A new pattern recognition methodology for classification of load profiles for ships electric consumers


GJ Tsekouras 1, IK Hatzilau 1, JM Prousalidis 1,2

1 Hellenic Naval Academy, Department of Electrical Engineering and Computer Science
2 National Technical University of Athens, School of Naval Architecture and Marine Engineering, Division of Marine Engineering

In this paper a new pattern recognition methodology is presented for the classification of the daily chronological load curves of ship electric consumers (equipment) and the determination of the respective typical load curves of each one of them. It is based on pattern recognition methods, such as k-means, adaptive vector quantisation, fuzzy k-means, self-organising maps and hierarchical clustering, which are theoretically described and properly adapted. The parameters of each clustering method are selected by an optimisation process, which is applied separately for each of six adequacy measures: the error function, the mean index adequacy, the clustering dispersion indicator, the similarity matrix indicator, the Davies-Bouldin indicator and the ratio of within cluster sum of squares to between cluster variation. As a case study, the methodology is applied to a set of consumers of Hellenic Navy MEKO type frigates.

AUTHORS' BIOGRAPHIES

Dr GJ Tsekouras (Electrical & Computer Engineer from NTUA/1999, Civil Engineer from NTUA/2004, PhD on Electrical Engineering from NTUA/2006). Adjunct lecturer at the Hellenic Naval Academy, dealing with power system analysis, forecasting and pattern recognition methodologies.

Prof Dr-Ing IK Hatzilau (Electrical & Mechanical Engineer from NTUA/1965, Dr-Ing from University of Stuttgart/1969). After a few years in industry, he joined the academic staff of the Department of Electrical Engineering and Computer Science of the Hellenic Naval Academy, dealing with marine electrical and control engineering issues.
He is the representative of the Hellenic Navy in NATO AC/141(MCG/6)SG/4, dealing with warship electric systems.

Dr J Prousalidis (Electrical Engineer from NTUA/1991, PhD from NTUA/1997). Assistant Professor at the School of Naval Architecture and Marine Engineering of the National Technical University of Athens, dealing with electric energy systems and electric propulsion schemes on shipboard installations.

INTRODUCTION

The load demand profile, ie, the chronological energy demand curve, is a characteristic parameter of an electric consumer (equipment) of a ship electric network, the values of which vary depending on several factors such as ship type, state, operating modes and mission. In Fig 1 an indicative segment of a daily chronological curve of the refrigeration plant in ship condition at SHORE of an HN MEKO type frigate is shown, based on records taken every 1 min. 1 The load demand profile is of primary importance in performing several studies such as load estimation, power sources selection, power cable rating, short circuit analysis,

No. A Journal of Marine Engineering and Technology 45

load shedding, harmonics, modulation etc.

Fig 1: Indicative chronological energy demand curve of an electric consumer of an HN MEKO type frigate

The representative load demand profiles of each consumer can be obtained from a classification procedure of its chronological load curves in typical time intervals (also called 'typical days'). This classification can be conducted by pattern recognition techniques 2-6, such as:
- the modified follow-the-leader 2,4
- the k-means
- the adaptive vector quantisation 3
- the fuzzy k-means
- the self-organising map 4
- the hierarchical methods 3-4

Regarding adequacy measures commonly used, these can be:
- the mean square error 3,5
- the mean index adequacy
- the clustering dispersion indicator
- the similarity matrix indicator 3
- the Davies-Bouldin indicator
- the modified Dunn index 4
- the scatter index 4
- the ratio of within cluster sum of squares to between cluster variation 3

Alternatively, the load curves classification of consumers can be performed by data mining 7, wavelet packet transformation 8 and Fourier analysis. The last ones are also used with simplified clustering models. Conventional tools, like statistical techniques, 9 need the knowledge of the typical days, which can be defined by experienced ship's personnel, eg, chief engineers. Evidently, one of the major issues of this classification is the definition of the typical day. In continental power systems the load curve's time interval is usually a day, for a study period ranging from a few weeks 2 to one year. 3 In ships' power systems the corresponding time interval is not defined by any standards, while the total study period cannot but be fairly limited (varying from a few days to one month). In this paper the typical time interval is assumed to be a day.

In a previous paper, 11 the authors have thoroughly discussed and highlighted the significance of developing such a methodology in terms of the exploitation of the results yielded to a series of applications and studies of ship systems. In brief, these results, ie, the typical chronological load curves of each consumer, can be used as input information in:
- the formalisation of the consumer's behaviour and the corresponding clustering using the representative load curve of each consumer;
- the design of the ship's electric power system, estimating the respective total load demand more accurately and, hence, selecting the generators properly;
- the operation of the ship's power system, achieving more precise short-term load forecasting, increasing the respective reliability and decreasing the respective operation cost;
- the improvement of the power factor, taking into consideration the respective reactive load curves;
- the load estimation after the application of demand side management programs for each specific consumer, as well as the feasibility studies of energy efficiency, which normalise the total load demand and improve the total load factor;
- the improvement of the operation of the automatic battle management and load shedding systems, because the automatic supply of the critical consumers in each operating mode is facilitated in case of a power system fault, based on the available generators, the healthy part of the power distribution system and the load demand of each consumer.

The objective of this paper is to present a new methodology for the classification of the daily chronological load curves of ship electric consumers. This method is based on the so-called unsupervised neural networks. More specifically, for each consumer the corresponding set of load curves is organised into well-defined and separate classes, in order to successfully describe the respective demand behaviour without any intervention by the user in the classification procedure.
The proposed methodology compares the results obtained from certain clustering techniques (k-means, adaptive vector quantisation (AVQ), fuzzy k-means, self-organising maps (SOM) and hierarchical methods) using six adequacy measures (mean square error, mean index adequacy, clustering dispersion indicator, similarity matrix indicator, Davies-Bouldin indicator, ratio of within cluster sum of squares to between cluster variation). The main points of this methodology are:
- the estimation of the representative typical daily load profiles for each consumer;
- the modification of the clustering techniques for this kind of classification problem, such as the appropriate weights initialisation for the k-means and fuzzy k-means;
- the proper calibration of parameters, such as the amount of fuzziness for the fuzzy k-means, in order to fit the classification needs;
- the comparison of the clustering algorithms' performance for each one of the adequacy measures;
- the selection of the proper adequacy measure.
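The compare-and-select loop described above can be sketched as follows. This is a minimal sketch, not the paper's full implementation: a plain k-means with the deterministic initialisation stands in for the whole set of algorithms, the error function J stands in for all six adequacy measures, the data are random stand-ins for the measured curves, and all names are illustrative.

```python
import numpy as np

def kmeans(X, M, a=0.3, b=0.4, epochs=50):
    """Plain k-means with deterministic initialisation w_ji = a + b*(j-1)/(M-1)."""
    N, d = X.shape
    W = (a + b * np.arange(M) / (M - 1))[:, None] * np.ones((1, d))
    for _ in range(epochs):
        # assign each curve to the nearest centre (Euclidean distance)
        labels = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(M):
            if np.any(labels == j):          # dead clusters keep their initial weights
                W[j] = X[labels == j].mean(axis=0)
    return W, labels

def error_function_J(X, W, labels):
    """Mean square error J: mean dimension-normalised squared distance to the centre."""
    d = X.shape[1]
    return np.mean(((X - W[labels]) ** 2).sum(axis=1) / d)

# illustrative data: 11 normalised daily curves sampled every minute (d = 1440)
rng = np.random.default_rng(0)
X = rng.random((11, 1440))

# compare candidate numbers of clusters by the adequacy measure, keep the best
results = {M: error_function_J(X, *kmeans(X, M)) for M in range(2, 6)}
best_M = min(results, key=results.get)
```

In the full methodology the inner loop also sweeps the algorithm's own parameters (eg, the pairs (a, b)) separately for each adequacy measure.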

Finally, the results of the application of the developed methodology are thoroughly presented for the case study of six electric consumers of Hellenic Navy MEKO type frigates, the chronological load curves of which have been measured within the framework of a research project. 1

PROPOSED PATTERN RECOGNITION METHODOLOGY FOR LOAD CURVES CLASSIFICATION OF SHIPS ELECTRIC CONSUMERS

The classification of daily chronological load curves of each ship electric consumer is achieved by applying the pattern recognition methodology shown in Fig 2.

Fig 2: Flow diagram of the pattern recognition methodology for the classification of daily chronological load curves of ships electric consumers

The main steps are:

1. Data and features selection: Using electronic meters or the ship energy management system for the main consumers, the active and reactive power values are recorded (in kWh and kVArh) for each period in steps of 1 min, 15 min, etc. The chronological load curves for each individual consumer are determined for each study period (week, month).

2. Consumers clustering using a priori indices: Consumers can be characterised by their voltage level (high, medium, low), installed power, power factor, load factor, criticality according to the ship's operating mode, etc. These indices are not necessarily related to the load curves. They can be used, however, for the pre-classification of consumers. It is noted that the load curves of each consumer are normalised using the respective minimum and maximum loads of the period under study.

3. Data preprocessing: The load curves of each consumer are examined for normality, in order to modify or delete the values that are obviously wrong (noise suppression). If necessary, a preliminary execution of a pattern recognition algorithm is carried out, in order to trace erroneous measurements or network faults,

which, if uncorrected, will reduce the number of the useful typical time intervals for a constant number of clusters.

4. Typical load curves clustering for each consumer: For each consumer, a number of clustering algorithms (k-means, adaptive vector quantisation, fuzzy k-means, self-organising maps and hierarchical methods) is applied. Each algorithm is trained on the set of load diagrams and evaluated according to six adequacy measures. The parameters of the algorithms are optimised, if necessary. The developed methodology uses the clustering methods providing the most satisfactory results. This process is repeated for the total set of consumers under study. Special consumers, such as seasonal or emergency ones (eg, machine tools, fire pumps, etc) are identified. These results can be combined with the ship's operating mode. At this stage, the size of the time interval (1, 2, 3, 4, 6 hours, half or one day) could be investigated, too. The typical load curves of the consumers are selected by choosing the type of typical time interval (such as the most populated one, the time interval with the peak demand load, etc).

MATHEMATICAL MODELLING OF CLUSTERING METHODS AND CLUSTERING VALIDITY ASSESSMENT

General

In the case of the chronological typical load curves of ship electric consumers, a number N of analytical daily load curves is given. The main target is to determine the respective sets of days and load patterns. Generally, N is defined as the population of the input (or training) vectors, which are going to be clustered. $\tilde{x}_j = (x_{j1}, x_{j2}, \ldots, x_{ji}, \ldots, x_{jd})^T$ symbolises the j-th input vector and d its dimension, which equals 1440 (the load measurements are taken every minute). The corresponding set is given by $X = \{\tilde{x}_j : j = 1, \ldots, N\}$. It is worth mentioning that the $x_{ji}$ are normalised using the highest and lowest values of all elements of the original input patterns set, in order to obtain better results from the application of the clustering methods.
Each classification process makes a partition of the initial N input vectors into M clusters. The j-th cluster has a representative, which is the respective load profile and is represented by the vector $\tilde{w}_j = (w_{j1}, w_{j2}, \ldots, w_{ji}, \ldots, w_{jd})^T$ of dimension d. Vector $\tilde{w}_j$ expresses the cluster centre. The corresponding set is the classes set, which is defined by $W = \{\tilde{w}_k, k = 1, \ldots, M\}$. The subset of input vectors $\tilde{x}_j$ which belong to the j-th cluster is $\Omega_j$ and the respective population of load diagrams is $N_j$.

For the study and evaluation of the classification algorithms the following distance forms are defined:

1. the Euclidean distance between input vectors $\tilde{x}_{j1}$, $\tilde{x}_{j2}$ of the set X:

$$d(\tilde{x}_{j1}, \tilde{x}_{j2}) = \sqrt{\frac{1}{d}\sum_{i=1}^{d}\left(x_{j1,i} - x_{j2,i}\right)^2} \qquad (1)$$

2. the distance between the representative vector $\tilde{w}_j$ of the j-th cluster and the subset $\Omega_j$, calculated as the geometric mean of the Euclidean distances between $\tilde{w}_j$ and each member of $\Omega_j$:

$$d(\tilde{w}_j, \Omega_j) = \sqrt{\frac{1}{N_j}\sum_{\tilde{x}_j \in \Omega_j} d^2(\tilde{w}_j, \tilde{x}_j)} \qquad (2)$$

3. the infra-set mean distance of a set, defined as the geometric mean of the inter-distances between the members of the set, ie, for the subset $\Omega_j$:

$$\hat{d}(\Omega_j) = \sqrt{\frac{1}{2N_j}\sum_{\tilde{x}_j \in \Omega_j} d^2(\tilde{x}_j, \Omega_j)} \qquad (3)$$

The basic characteristics of the five clustering methods being used are the following.

k-means model

The k-means method is the simplest hard clustering method, which gives satisfactory results for compact clusters. 11 The k-means clustering method groups the set of the N input vectors into M clusters using an iterative procedure. The respective steps of the algorithm are the following:

1. Initialisation of the weights of the M clusters is determined. In the classic model a random choice among the input vectors is used. 4
In the developed algorithm the $w_{ji}$ of the j-th centre is initialised as:

$$w_{ji}^{(0)} = a + b\,(j-1)/(M-1) \qquad (4)$$

where a and b are properly calibrated parameters.

2. During training iteration t (called epoch t, hereinafter) for each training vector $\tilde{x}_j$ its Euclidean distances $d(\tilde{x}_j, \tilde{w}_k)$ are calculated for all centres. The j-th input vector is put in the set $\Omega_k^{(t)}$ for which the distance between $\tilde{x}_j$ and the respective centre is minimum, which means:

$$d(\tilde{x}_j, \tilde{w}_k) = \min_{\forall m} d(\tilde{x}_j, \tilde{w}_m) \qquad (5)$$

3. When the entire training set is formed, the new weights of each centre are calculated as:

$$\tilde{w}_k^{(t+1)} = \frac{1}{N_k^{(t)}}\sum_{\tilde{x}_j \in \Omega_k^{(t)}} \tilde{x}_j \qquad (6)$$

where $N_k^{(t)}$ is the population of the respective set $\Omega_k^{(t)}$ during epoch t.

4. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until the maximum number of epochs is used or the weights do not significantly change, ie, $\|\tilde{w}_k^{(t)} - \tilde{w}_k^{(t+1)}\| < \varepsilon$, where $\varepsilon$ is the upper limit of weight change between sequential iterations.

The algorithm's main purpose is to minimise the error function J:

$$J = \frac{1}{N}\sum_{j=1}^{N} d^2\left(\tilde{x}_j, \tilde{w}_k : \tilde{x}_j \in \Omega_k\right) \qquad (7)$$

The main difference with the classic model is that the process is repeated for various pairs of (a, b). The best results for each adequacy measure are recorded over the various pairs (a, b).

Kohonen adaptive vector quantisation (AVQ)

This algorithm is a variation of the k-means method, which belongs to the unsupervised competitive one-layer neural networks. It classifies input vectors into clusters by using a competitive layer with a constant number of neurons. Practically, in each step all clusters compete with each other for the winning of a pattern. The winning cluster moves its centre in the direction of the pattern, while the rest of the clusters move their centres in the opposite direction (supervised classification) or remain stable (unsupervised classification). Here, the latter unsupervised classification algorithm is used. The respective steps are the following:

1. Initialisation of the weights of the M clusters is determined, where the weights of all clusters are equal to 0.5, that is $w_{ji}^{(0)} = 0.5, \ \forall j, i$.

2. During epoch t each input vector $\tilde{x}_j$ is randomly presented and its respective Euclidean distances from every neuron are calculated. In the case of existence of a bias factor $\lambda$, the respective minimisation function is:

$$f_{winner\_neuron}(\tilde{x}_j) = \min_{\forall k}\left\{ d(\tilde{x}_j, \tilde{w}_k) + \lambda\, N_k / N \right\} \qquad (8)$$

where $N_k$ is the population of the respective set $\Omega_k$ during epoch t−1. The weights of the winning neuron (with the smallest distance) are updated as:

$$\tilde{w}^{(t)}(n+1) = \tilde{w}^{(t)}(n) + \eta(t)\left(\tilde{x}_j - \tilde{w}^{(t)}(n)\right) \qquad (9)$$

where n is the number of input vectors which have been presented during the current epoch, and $\eta(t)$ is the learning rate according to:

$$\eta(t) = \max\left\{\eta_0 \exp\left(-t/T_{\eta 0}\right),\ \eta_{min}\right\} \qquad (10)$$

where $\eta_0$, $\eta_{min}$ and $T_{\eta 0}$ are the initial value, the minimum value and the time parameter respectively. The remaining neurons are unchanged for $\tilde{x}_j$, as introduced by the Kohonen winner-take-all learning rule. 13,14

3. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until either the maximum number of epochs is reached, or the weights converge, or the error function J does not improve, which means:

$$\frac{J(t) - J(t+1)}{J(t)} < \varepsilon' \quad \text{for } t > T_{in} \qquad (11)$$

where $\varepsilon'$ is the upper limit of error function change between sequential iterations and the respective criterion is activated after $T_{in}$ epochs. The algorithm core is executed for a specific number of neurons and the respective parameters $\eta_0$, $\eta_{min}$ and $T_{\eta 0}$ are optimised for each adequacy measure separately. This process is repeated from $M_1$ to $M_2$ neurons.

Fuzzy k-means

During the application of the k-means or the adaptive vector quantisation algorithm, each pattern is assumed to be in exactly one cluster (hard clustering). In many cases the areas of two neighbouring clusters overlap, so that there are no valid qualitative results. If the condition of exclusive partition of an input pattern to one cluster is to be relaxed, fuzzy clustering techniques should be used. More specifically, each input vector $\tilde{x}_j$ does not belong to only one cluster, but participates in every k-th cluster by a membership factor $u_{jk}$, where:

$$\sum_{k=1}^{M} u_{jk} = 1 \quad \text{and} \quad 0 < u_{jk} < 1, \ \forall j \qquad (12)$$

Theoretically, the membership factor gives more flexibility in the vectors' distribution. During the iterations the following objective function is minimised:

$$J_{fuzzy} = \frac{1}{N}\sum_{k=1}^{M}\sum_{j=1}^{N} u_{jk}^{q}\, d^2(\tilde{x}_j, \tilde{w}_k) \qquad (13)$$

The simplest algorithm is the fuzzy k-means clustering one, in which the respective steps are the following:

1. Initialisation of the weights of the M clusters is determined. In the classic model a random choice among the input vectors is used. 5 In the developed algorithm the $w_{ji}$ of the j-th centre is initialised by equation (4).
2. During epoch t for each training vector $\tilde{x}_j$ the membership factors are calculated for every cluster:

$$u_{jk}^{(t+1)} = 1 \Big/ \sum_{m=1}^{M} \left( \frac{d(\tilde{x}_j, \tilde{w}_k^{(t)})}{d(\tilde{x}_j, \tilde{w}_m^{(t)})} \right)^{2/(q-1)} \qquad (14)$$

3. Afterwards the new weights of each centre are calculated as:

$$\tilde{w}_k^{(t+1)} = \frac{\displaystyle\sum_{j=1}^{N} \left(u_{jk}^{(t+1)}\right)^{q} \tilde{x}_j}{\displaystyle\sum_{j=1}^{N} \left(u_{jk}^{(t+1)}\right)^{q}} \qquad (15)$$

where q is the amount of fuzziness, in the range (1, ∞); the partition becomes fuzzier as q increases.

4. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until the maximum number of epochs is used or the weights do not significantly change.
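The fuzzy k-means steps above can be sketched as follows. This is a minimal sketch assuming the deterministic initialisation of equation (4) and the standard membership exponent 2/(q−1) in the update of equation (14); the data and all variable names are illustrative.

```python
import numpy as np

def fuzzy_kmeans(X, M, q=2.0, a=0.3, b=0.4, epochs=50, tol=1e-6):
    """Fuzzy k-means: memberships per eq (14), centre update per eq (15)."""
    N, d = X.shape
    # step 1: deterministic initialisation w_ki = a + b*(k-1)/(M-1), eq (4)
    W = (a + b * np.arange(M) / (M - 1))[:, None] * np.ones((1, d))
    for _ in range(epochs):
        # step 2: distances of every curve to every centre (tiny floor avoids /0)
        D = np.sqrt(((X[:, None, :] - W[None, :, :]) ** 2).mean(axis=2)) + 1e-12
        # memberships: u_jk = 1 / sum_m (d_jk / d_jm)^(2/(q-1))
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (q - 1))).sum(axis=2)
        # step 3: membership-weighted centre update, eq (15)
        Uq = U ** q
        W_new = (Uq.T @ X) / Uq.sum(axis=0)[:, None]
        # step 4: stop when the weights no longer change significantly
        if np.abs(W_new - W).max() < tol:
            W = W_new
            break
        W = W_new
    return W, U

rng = np.random.default_rng(1)
X = rng.random((11, 1440))           # 11 normalised daily curves, 1-min samples
W, U = fuzzy_kmeans(X, M=4, q=6.0)   # q = 6, as in the chiller case study
```

Each row of U sums to 1, so every curve distributes its full membership across the M clusters.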

This process is repeated for different pairs of (a, b) and for different amounts of fuzziness. The best results for each adequacy measure are recorded over the different pairs (a, b) and values of q.

Self-organising maps (SOM)

The Kohonen SOM is a topologically unsupervised neural network that projects a d-dimensional input data set onto a reduced-dimension space (usually a mono-dimensional or bi-dimensional map). It is composed of a predefined grid containing M × 1 or M₁ × M₂ d-dimensional neurons $\tilde{w}_k$ for the mono-dimensional or bi-dimensional map respectively, which are calculated by a competitive learning algorithm updating not only the weights of the winning neuron, but also the weights of its neighbouring units, in inverse proportion to their distance. The neighbourhood size of each neuron shrinks progressively during the training process, starting with nearly the whole map and ending with the single neuron. The training of the SOM is divided into two phases:

- rough ordering, with high initial learning rate, large radius and a small number of epochs, so that the neurons are arranged into a structure which approximately displays the inherent characteristics of the input data;
- fine tuning, with small initial learning rate, small radius and a higher number of training epochs, in order to tune the final structure of the SOM.

The transition from the rough ordering phase to the fine tuning one takes place after $T_{s0}$ epochs. It is mentioned that, in the case of the bi-dimensional map, the immediate exploitation of the respective clusters is not a simple problem. The map can be exploited either by inspection or by applying a second simple clustering method, such as the simple k-means method. 16 Here, only the case of the mono-dimensional map is examined. More specifically, the respective steps of the mono-dimensional SOM algorithm are the following:

1. The number of neurons of the SOM's grid is defined and the initialisation of the respective weights is determined.
Thus, the weights can be given by (a) $w_{ki} = 0.5, \ \forall k, i$, (b) the random initialisation of each neuron's weights, or (c) a random choice among the input vectors for each neuron.

2. The SOM training commences by first choosing an input vector $\tilde{x}_j$, at epoch t, randomly from the input vectors set. The Euclidean distances between the n-th presented input pattern $\tilde{x}_j$ and all $\tilde{w}_k$ are calculated, so as to determine the winning neuron i′ that is closest to $\tilde{x}_j$ (competition stage). The k-th reference vector is updated (weights adaptation stage) according to:

$$\tilde{w}_k^{(t)}(n+1) = \tilde{w}_k^{(t)}(n) + \eta(t)\, h_{k,i'}(t)\left(\tilde{x}_j - \tilde{w}_k^{(t)}(n)\right) \qquad (16)$$

where $\eta(t)$ is the learning rate according to equation (10). During the rough ordering phase, $\eta_r$, $T_{\eta 0}$ are the initial value and the time parameter respectively, while during the fine tuning phase the respective values are $\eta_f$, $T_{\eta 0}$. The $h_{k,i'}(t)$ is the symmetrical neighbourhood function that activates the neurons which are topologically close to the winning neuron i′, according to their geometrical distance, and which will learn from the same $\tilde{x}_j$ (collaboration stage). In this case the Gauss function is proposed:

$$h_{k,i'}(t) = \exp\left(-\frac{d_{k,i'}^2}{2\,\sigma^2(t)}\right) \qquad (17)$$

where $d_{k,i'} = \|\tilde{r}_{i'} - \tilde{r}_k\|$ is the respective distance between the i′ and k neurons, $\tilde{r}_k = (x_k, y_k)$ are the respective co-ordinates in the grid, and $\sigma(t) = \sigma_0 \exp(-t/T_{\sigma 0})$ is the decreasing neighbourhood radius function, where $\sigma_0$ and $T_{\sigma 0}$ are the respective initial value and time parameter of the radius.

3. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until either the maximum number of epochs is reached or the index $I_s$ reaches its minimum value: 10

$$I_s(t) = J(t) + ADM(t) + TE(t) \qquad (18)$$

where the quality measures of the optimum SOM are based on the quantisation error J given by equation (7), the topographic error TE and the average distortion measure ADM.
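The competition, collaboration and adaptation stages of equations (16)-(17) can be sketched for a mono-dimensional map as follows. This is a minimal sketch under the Gaussian-neighbourhood assumption; the single-phase schedule, the parameter values and all names are illustrative stand-ins for the two-phase (rough ordering / fine tuning) training described above.

```python
import numpy as np

def som_epoch(X, W, t, eta0=0.5, T_eta0=1000.0, sigma0=3.0, T_sigma0=1000.0, rng=None):
    """One training epoch of a mono-dimensional SOM with M neurons on a line."""
    rng = rng or np.random.default_rng()
    M = W.shape[0]
    grid = np.arange(M, dtype=float)        # neuron co-ordinates on the 1-D grid
    eta = eta0 * np.exp(-t / T_eta0)        # learning rate, eq (10) without the floor
    sigma = sigma0 * np.exp(-t / T_sigma0)  # shrinking neighbourhood radius sigma(t)
    for j in rng.permutation(len(X)):       # present the inputs in random order
        x = X[j]
        # competition: the winning neuron i' is the closest in Euclidean distance
        i_win = np.argmin(((W - x) ** 2).sum(axis=1))
        # collaboration: Gaussian neighbourhood around i', eq (17)
        h = np.exp(-((grid - grid[i_win]) ** 2) / (2.0 * sigma ** 2))
        # adaptation: move every neuron towards x in proportion to h, eq (16)
        W += eta * h[:, None] * (x - W)
    return W

rng = np.random.default_rng(2)
X = rng.random((11, 1440))                  # 11 normalised daily load curves
W = np.full((5, 1440), 0.5)                 # initialisation (a): w_ki = 0.5
for t in range(20):
    W = som_epoch(X, W, t, rng=rng)
```

Because eta·h never exceeds 1 here, every update is a convex step towards the presented curve, so the weights remain inside the normalised [0, 1] range of the data.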
The topographic error measures the distortion of the map as the percentage of input vectors for which the first ($i'_1$) and second ($i'_2$) winning neurons are not neighbouring map units:

$$TE = \sum_{j=1}^{N} neighb(i'_1, i'_2)\,\Big/\,N \qquad (19)$$

where, for each input vector, $neighb(i'_1, i'_2)$ equals 1 if the $i'_1$ and $i'_2$ neurons are not neighbours, and 0 otherwise. The average distortion measure is given for the t epoch by:

$$ADM(t) = \sum_{j=1}^{N}\sum_{k=1}^{M} h_{k,i'}(t)\, d^2(\tilde{x}_j, \tilde{w}_k)\,\Big/\,N \qquad (20)$$

This process is repeated for different parameters $\sigma_0$, $\eta_f$, $\eta_r$, $T_{\eta 0}$, $T_{\sigma 0}$ and $T_{s0}$. Alternatively, the multiplication factors $\varphi$ and $\xi$ are introduced without decreasing the generalisation ability of the parameters calibration:

$$T_{s0} = \varphi\, T_{\eta 0} \qquad (21)$$

$$T_{\sigma 0} = \xi\, T_{\eta 0} / \ln \sigma_0 \qquad (22)$$

The best results for each adequacy measure are recorded over all parameters $\sigma_0$, $\eta_f$, $\eta_r$, $T_{\eta 0}$, $\varphi$ and $\xi$.

Hierarchical agglomerative algorithms

Agglomerative algorithms are based on matrix theory. 6 The input is the N × N dissimilarity matrix $P_0$. At each level t, when two clusters are merged into one, the size of the dissimilarity matrix $P_t$ becomes (N − t) × (N − t). Matrix $P_t$ is obtained from $P_{t-1}$ by deleting the two rows and columns that correspond to the merged clusters and adding

a new row and a new column containing the distances between the newly formed cluster and the old ones. The distance between the newly formed cluster $C_q$ (the result of merging $C_i$ and $C_j$) and an old cluster $C_s$ is determined as:

$$d(C_q, C_s) = a_i\, d(C_i, C_s) + a_j\, d(C_j, C_s) + b\, d(C_i, C_j) + c\, \left| d(C_i, C_s) - d(C_j, C_s) \right| \qquad (23)$$

where $a_i$, $a_j$, b and c correspond to different choices of the dissimilarity measure. It is noted that at each level t the respective representative vectors are calculated by equation (6). The basic algorithms, which are going to be used in our case, are:

- the single link algorithm (SL), obtained from (23) for $a_i = a_j = 0.5$, $b = 0$ and $c = -0.5$:

$$d(C_q, C_s) = \min\{d(C_i, C_s),\, d(C_j, C_s)\} = \tfrac{1}{2} d(C_i, C_s) + \tfrac{1}{2} d(C_j, C_s) - \tfrac{1}{2}\left| d(C_i, C_s) - d(C_j, C_s) \right| \qquad (24)$$

- the complete link algorithm (CL), obtained from (23) for $a_i = a_j = 0.5$, $b = 0$ and $c = 0.5$:

$$d(C_q, C_s) = \max\{d(C_i, C_s),\, d(C_j, C_s)\} = \tfrac{1}{2} d(C_i, C_s) + \tfrac{1}{2} d(C_j, C_s) + \tfrac{1}{2}\left| d(C_i, C_s) - d(C_j, C_s) \right| \qquad (25)$$

- the unweighted pair group method average algorithm (UPGMA):

$$d(C_q, C_s) = \frac{n_i\, d(C_i, C_s) + n_j\, d(C_j, C_s)}{n_i + n_j} \qquad (26)$$

where $n_i$ and $n_j$ are the respective member populations of clusters $C_i$ and $C_j$;

- the weighted pair group method average algorithm (WPGMA):

$$d(C_q, C_s) = \tfrac{1}{2}\left( d(C_i, C_s) + d(C_j, C_s) \right) \qquad (27)$$

- the unweighted pair group method centroid algorithm (UPGMC):

$$d^{(1)}(C_q, C_s) = \frac{n_i\, d^{(1)}(C_i, C_s) + n_j\, d^{(1)}(C_j, C_s)}{n_i + n_j} - \frac{n_i\, n_j\, d^{(1)}(C_i, C_j)}{(n_i + n_j)^2} \qquad (28)$$

where $d^{(1)}(C_q, C_s) = \|\tilde{w}_q - \tilde{w}_s\|^2$ and $\tilde{w}_q$ is the representative centre of the q-th cluster according to equation (6).
- the weighted pair group method centroid algorithm (WPGMC):

$$d^{(1)}(C_q, C_s) = \tfrac{1}{2}\left( d^{(1)}(C_i, C_s) + d^{(1)}(C_j, C_s) \right) - \tfrac{1}{4}\, d^{(1)}(C_i, C_j) \qquad (29)$$

- the Ward or minimum variance algorithm (WARD):

$$d^{(2)}(C_q, C_s) = \frac{(n_i + n_s)\, d^{(2)}(C_i, C_s) + (n_j + n_s)\, d^{(2)}(C_j, C_s) - n_s\, d^{(2)}(C_i, C_j)}{n_i + n_j + n_s} \qquad (30)$$

where:

$$d^{(2)}(C_i, C_j) = \frac{n_i\, n_j}{n_i + n_j}\, d^{(1)}(C_i, C_j) \qquad (31)$$

The respective steps of these algorithms are the following:

1. Initialisation: The set of the remaining patterns $R_0$ for the zero level (t = 0) is the set of the input vectors X. The dissimilarity matrix $P_0 = P(X)$ is determined. Afterwards t increases by one (t = t + 1).

2. During level t the clusters $C_i$ and $C_j$ are found for which the minimisation criterion $d(C_i, C_j) = \min_{r,s=1,\ldots,N,\, r \neq s} d(C_r, C_s)$ is satisfied.

3. Then clusters $C_i$ and $C_j$ are merged into a single cluster $C_q$ and the set of the remaining patterns $R_t$ is formed as $R_t = (R_{t-1} - \{C_i, C_j\}) \cup \{C_q\}$.

4. The construction of the dissimilarity matrix $P_t$ from $P_{t-1}$ is realised by applying equation (23).

5. Next, the number of levels is increased by one. This process is repeated (return to step 2) until the set of remaining patterns $R_{N-1}$ is formed and all input vectors are in the same and unique cluster. It is mentioned that the number of iterations is determined from the beginning and equals the number of input vectors decreased by 1 (N − 1).

Adequacy measures

In order to evaluate the performance of the clustering algorithms and to compare them with each other, six different adequacy measures are applied. Their purpose is to obtain well-separated and compact clusters, in order to make the load curves self-explanatory. The definitions of these measures are the following:

1. Mean square error or error function (J), given by equation (7).

2.
Mean index adequacy (MIA), 2-3 which is defined as the average of the distances between the input vectors assigned to each cluster and its centre:

$$MIA = \sqrt{\frac{1}{M}\sum_{k=1}^{M} d^2(\tilde{w}_k, \Omega_k)} \qquad (32)$$

3. Clustering dispersion indicator (CDI), 2-4 which depends on the mean infra-set distance between the input vectors in the same cluster and inversely on the infra-set distance between the class representative load curves:

$$CDI = \sqrt{\frac{1}{M}\sum_{k=1}^{M} \hat{d}^2(\Omega_k)} \,\Big/\, \hat{d}(W) \qquad (33)$$

4. Similarity matrix indicator (SMI), 3 which is defined as the maximum off-diagonal element of the symmetrical similarity matrix, whose terms are calculated using a logarithmic function of the Euclidean distance between any pair of class representative load curves:

$$SMI = \max_{p \neq q}\left\{ \left( 1 - 1/\ln d(\tilde{w}_p, \tilde{w}_q) \right)^{-1} \right\}, \quad p, q = 1, \ldots, M \qquad (34)$$

5. Davies-Bouldin indicator (DBI), 3,5 which represents the system-wide average of the similarity measures of each cluster with its most similar cluster:

$$DBI = \frac{1}{M}\sum_{p=1}^{M} \max_{q \neq p}\left\{ \frac{\hat{d}(\Omega_p) + \hat{d}(\Omega_q)}{d(\tilde{w}_p, \tilde{w}_q)} \right\}, \quad p, q = 1, \ldots, M \qquad (35)$$

6. Ratio of within cluster sum of squares to between cluster variation (WCBCR), 3 which depends on the sum of the squared distances between each input vector and its cluster's representative vector, as well as on the similarity of the clusters' centres:

$$WCBCR = \sum_{k=1}^{M} \sum_{\tilde{x}_j \in \Omega_k} d^2(\tilde{w}_k, \tilde{x}_j) \,\Big/ \sum_{1 \le q < p \le M} d^2(\tilde{w}_p, \tilde{w}_q) \qquad (36)$$

The success of the various algorithms for the same final number of clusters is expressed by small values of the adequacy measures. By increasing the number of clusters, all measures decrease, except for the similarity matrix indicator. An additional adequacy measure could be the number of dead clusters, for which the sets are empty; it is intended to minimise this number. It is noted that in equations (7), (32)-(36), M is the number of clusters without the dead ones.
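Two of the adequacy measures above, MIA (equation (32)) and WCBCR (equation (36)), can be sketched as follows. This is a minimal sketch under the paper's definitions, using the dimension-normalised distance of equation (1); the partition is an illustrative stand-in and all helper names are hypothetical.

```python
import numpy as np

def dist(u, v):
    """Dimension-normalised Euclidean distance, eq (1)."""
    return np.sqrt(((u - v) ** 2).mean())

def mia(X, W, labels):
    """Mean index adequacy, eq (32): RMS over clusters of d^2(w_k, Omega_k), eq (2)."""
    terms = []
    for k in range(len(W)):
        members = X[labels == k]
        if len(members):   # dead clusters are excluded from M, as noted above
            terms.append(np.mean([dist(W[k], x) ** 2 for x in members]))
    return np.sqrt(np.mean(terms))

def wcbcr(X, W, labels):
    """Eq (36): within-cluster sum of squares over between-centre variation."""
    within = sum(dist(W[labels[j]], X[j]) ** 2 for j in range(len(X)))
    between = sum(dist(W[p], W[q]) ** 2 for p in range(len(W)) for q in range(p))
    return within / between

rng = np.random.default_rng(3)
X = rng.random((11, 1440))
labels = rng.integers(0, 4, size=11)    # an illustrative 4-cluster partition
W = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
              else np.full(1440, 0.5) for k in range(4)])
scores = (mia(X, W, labels), wcbcr(X, W, labels))
```

Both measures are "smaller is better", so they can drive the same parameter-selection loop used for the other four indicators.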
APPLICATION OF THE PROPOSED METHODOLOGY TO A SET OF CONSUMERS OF HELLENIC NAVY MEKO FRIGATES

Analysis of the case study

The developed methodology is applied to the measured data of six consumers of the HN MEKO type frigates' electric system, the maximum peak load of which ranges between 3.5 kW and 60 kW. The respective data are the 1-minute ON/OFF normalised load values for each individual consumer for a period of eleven days during November 1997 and January. The respective consumers are: chiller, HP compressor FWD/AFT, refrigeration plant, LP compressor FWD/AFT, sanitary plant, boiler. Initially, the analysis of the chiller is presented in detail, while the additional consumers can be analysed in a similar way. It is supposed that the time interval is a day. The representative consumer's typical day has been chosen to be the most populated one. The respective set of daily chronological curves has 11 members. No curves are rejected during data pre-processing.

Application of k-means

The modified model of the k-means method is executed for different pairs (a, b) from 2 to 10 clusters, where a = {0.1, 0.11, ..., 0.45} and a+b = {0.54, 0.55, ..., 0.9}. For each cluster number, 1332 different pairs (a, b) are examined. The best results for the six adequacy measures do not refer to the same pair (a, b). From the results of Table 1, it is obvious that the developed k-means is superior to the classic one with the random choice of the input vectors during the centres' initialisation. For the classic k-means model 100 executions are carried out and the best results for each adequacy measure are recorded. The superiority of the developed model applies in all cases of neurons. A second advantage is the convergence to the same results for the respective pairs (a, b), which cannot be achieved using the classic model.

Application of adaptive vector quantisation

The initial value $\eta_0$, the minimum value $\eta_{min}$ and the time parameter $T_{\eta 0}$ of the learning rate must be properly calibrated.
For example, Fig 3 presents the sensitivity of the ratio of within cluster sum of squares to between cluster variation (WCBCR) to the η0 and Tη0 parameters over 90 experiments (nine values of η0 by ten of Tη0). According to the results of Table 1, the best results for the adequacy measures are not given by the same η0 and Tη0. The ηmin value does not practically improve the neural network's behaviour, provided that it is kept small (of the order of 10^-5).

Application of fuzzy k-means
The fuzzy k-means model is executed for different amounts of fuzziness, q = {2, 4, 6}. As an example, the WCBCR is presented in Fig 4 for different numbers of clusters for the three cases q = {2, 4, 6}. The best results are given by q = 6, while the respective values for q = 4 are quite similar to those of q = 6.

Journal of Marine Engineering and Technology No. A
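A fuzzy k-means run with an explicit fuzziness exponent q can be sketched with the standard fuzzy c-means update rules below; the update formulas are the textbook ones, and the data are synthetic, so this is an illustration of the role of q rather than the paper's exact implementation.

```python
import numpy as np

def fuzzy_kmeans(X, M, q, rng, iters=50, eps=1e-12):
    # Standard fuzzy c-means: memberships U (M x N) and centres W
    # are updated alternately; q > 1 is the amount of fuzziness.
    U = rng.random((M, len(X)))
    U /= U.sum(axis=0)                        # memberships sum to 1
    for _ in range(iters):
        Uq = U ** q
        W = (Uq @ X) / Uq.sum(axis=1, keepdims=True)   # weighted centres
        D = np.array([[np.linalg.norm(x - w) + eps for x in X]
                      for w in W])
        U = D ** (-2.0 / (q - 1))             # u_kj proportional to d_kj^(-2/(q-1))
        U /= U.sum(axis=0)
    return W, U

# Sensitivity to the amount of fuzziness, as in the text
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.1, 0.02, (6, 24)),
               rng.normal(0.9, 0.02, (5, 24))])
for q in (2, 4, 6):
    W, U = fuzzy_kmeans(X, M=2, q=q, rng=rng)
```

Larger q spreads the memberships more evenly across clusters before they sharpen, which is why the measure values for q = 4 and q = 6 can end up close to each other.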

[Table 1: Comparison of the best clustering models for 4 clusters for the chiller. Rows: proposed k-means (a, b parameters); classical k-means; AVQ (η0, ηmin, Tη0 parameters); fuzzy k-means (q, a, b parameters; no convergence); all hierarchical algorithms (in this case only); mono-dimensional SOM (σ0, φ, ξ, ηf, ηr, Tη0 parameters). Columns: adequacy measures J, MIA, CDI, SMI, DBI, WCBCR. The numeric entries are not recoverable from this transcription.]

Fig 3: WCBCR measure of the AVQ method for 4 clusters for the set of 11 training patterns of the chiller with η0 = {0.1, 0.2,..., 0.9}, Tη0 = {500, 1000,..., 5000}

Application of mono-dimensional self-organising maps
Although the mono-dimensional SOM is theoretically well defined, several issues must be resolved for its effective training. The major problems are:
- how to stop the training process of the optimum SOM. In this case the target is to minimise the index Is (equation (18)). Generally, it is noticed that convergence is completed after Tη0 epochs of the fine tuning phase, when Tη0 takes large values (>1000 epochs);
- the proper calibration of (a) the initial value of the neighbourhood radius σ0, (b) the multiplicative factor φ between Ts0 (epochs of the rough ordering phase) and Tη0 (time parameter of the learning rate), (c) the multiplicative factor ξ between Tσ0 (time parameter of the neighbourhood radius) and Tη0, and (d) the proper initial values of the learning rate ηr and ηf during the rough ordering phase and the fine tuning phase respectively. The optimisation process for the mono-dimensional SOM parameters is similar to that of the adaptive vector quantisation for η0 and Tη0. The best results for each adequacy measure are presented in Table 1 for the case of 4 clusters;
- the initialisation of the weights of the neurons, where the three cases of the theoretical analysis of self-organising maps are examined; the best training behaviour is presented by case (a) (for w_ki = 0.5, ∀k, i).

Fig 4: WCBCR measure of the fuzzy k-means method for 2 to 10 clusters for the set of 11 training patterns of the chiller with q = 2, 4, 6

Application of hierarchical agglomerative algorithms
In the case of the seven hierarchical models, the best results for two and three clusters are given by different models, while for four clusters or more all models give the same results, as shown in Fig 5. It should be mentioned that there are no other parameters to calibrate, such as a maximum number of iterations.

Comparison of clustering models for the chiller
The best results for each clustering method (modified k-means, adaptive vector quantisation, fuzzy k-means, self-organising maps and hierarchical methods) are presented in Fig 6. The optimised AVQ and Ward methods present the best behaviour for the mean square error J; the optimised AVQ and the unweighted pair group method average algorithm (UPGMA) for the MIA; the optimised AVQ and the modified k-means for the CDI, DBI and WCBCR; and the modified k-means for the SMI. All measures (except the DBI) improve as the number of clusters increases. The number of dead clusters for the WCBCR indicator for all clustering techniques, and for all adequacy measures for the AVQ method, are shown in Figs 6g and 6h respectively. Here it is not obvious which measure is the best one, because of the extremely small set of training patterns.
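The two-phase SOM training discussed above (a rough ordering phase with learning rate ηr, then a fine tuning phase with a smaller ηf and shrinking neighbourhood) can be sketched as follows. The exponential decay schedules and default parameter values here are illustrative assumptions, not the paper's calibrated settings; the weight initialisation follows case (a), w_ki = 0.5.

```python
import numpy as np

def train_som_1d(X, M, sigma0=2.0, eta_r=0.5, eta_f=0.05,
                 T_rough=200, T_fine=1000, rng=None):
    # Mono-dimensional SOM: rough ordering phase (fixed large radius
    # sigma0, rate eta_r), then fine tuning (shrinking radius,
    # smaller rate eta_f). Weights start at w_ki = 0.5 for all k, i.
    rng = rng or np.random.default_rng(0)
    W = np.full((M, X.shape[1]), 0.5)
    idx = np.arange(M)
    for eta, T, shrink in ((eta_r, T_rough, False),
                           (eta_f, T_fine, True)):
        for t in range(T):
            x = X[rng.integers(len(X))]                     # random curve
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            sigma = sigma0 * np.exp(-t / T) if shrink else sigma0
            h = np.exp(-(idx - winner) ** 2 / (2 * sigma ** 2))
            W += eta * h[:, None] * (x - W)                 # pull neighbourhood
    return W

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.1, 0.02, (6, 24)),
               rng.normal(0.9, 0.02, (5, 24))])             # synthetic curves
W = train_som_1d(X, M=4, rng=rng)
```

Because the neighbourhood function h couples adjacent neurons on the one-dimensional chain, neighbouring representative curves end up similar, unlike in plain AVQ.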
However, taking into consideration the results of earlier studies,3,10 and noting that the basic theoretical advantage of the WCBCR measure is that it combines the distances of the input vectors from their cluster representatives with the distances between the clusters (thus also covering the characteristics of J and CDI), the application of the WCBCR is proposed.

Fig 5: Adequacy measures of the 7 hierarchical clustering algorithms for 2 to 10 clusters for the set of 11 training patterns of the chiller

Observing Fig 6f, the improvement of the WCBCR is significant up to 4 clusters. After this value, the behaviour

Fig 6: Adequacy measures and dead clusters of each clustering method for 2 to 10 clusters for the set of 11 training patterns of the chiller

Type | Clusters | Population of representative cluster | Best clustering technique | Cluster per day (November 1997 / January)
a    | 4 | 7 | AVQ              | 2nd, 3rd, 3rd, 1st, 4th, 4th, 4th, 4th, 4th, 4th, 4th
b    | 3 | 9 | CL - UPGMC       | 2nd, 1st, 1st, 1st, 1st, 1st, 1st, 1st, 3rd, 1st, 1st
c    | 4 | 7 | k-means          | 4th, 3rd, 4th, 4th, 4th, 4th, 4th, 4th, 2nd, 1st, 1st
d    | 4 | 6 | UPGMC            | 4th, 3rd, 4th, 4th, 1st, 1st, 1st, 1st, 1st, 2nd, 1st
e    | 3 | 8 | All hierarchical | 1st, 2nd, 1st, 1st, 1st, 1st, 1st, 3rd, 1st, 2nd, 1st
f    | 4 | 8 | k-means          | 1st, 2nd, 1st, 1st, 1st, 1st, 1st, 1st, 4th, 3rd, 1st

Table 2: Results of the methodology for the 6 electrical consumers of HN MEKO type frigates, where type a is the chiller, type b the HP compressor FWD/AFT, type c the refrigeration plant, type d the LP compressor FWD/AFT, type e the sanitary plant and type f the boiler respectively

of the adequacy measure is gradually stabilised. It can also be estimated graphically using the knee-rule, which gives values between 3 and 4 clusters (see Fig 7). In Fig 8 the typical daily chronological ON/OFF load curves of the chiller are presented, obtained with the AVQ method and the best WCBCR measure for 4 clusters. The training time of the methods under study has the ratio 0.05 : 1 : 22 : 24 : 36 (hierarchical : proposed k-means : optimised adaptive vector quantisation : mono-dimensional self-organising map : fuzzy k-means for q = 6). Therefore, the k-means, AVQ and hierarchical models have been selected.

Fig 7: WCBCR measure of the AVQ model for 2 to 10 clusters for the set of 11 training patterns of the chiller and the use of the tangents for the estimation of the knee

Fig 8: The typical daily chronological ON/OFF load curves for the chiller for four clusters using the AVQ method

Application of the clustering models for the other electric consumers
The same methodology is applied to the other five electric consumers. In Table 2, the total number of clusters, the population of the representative cluster, the respective clustering technique and the clusters' calendar are registered for each consumer. It is obvious that the modified k-means, the AVQ and the hierarchical (especially UPGMC) clustering techniques compete with each other, while the fuzzy k-means and self-organising map techniques give poor results in this kind of classification. It is reminded that the representative

cluster is the most populated one. In Fig 9, the respective representative typical load diagrams are presented.

CONCLUSIONS
This paper presents an efficient pattern recognition methodology for the study of the load demand behaviour of the electrical consumers of ships. Unsupervised clustering methods can be applied, such as k-means, adaptive vector quantisation (AVQ), fuzzy k-means, mono-dimensional self-organising maps (SOM) and hierarchical methods. The performance of these methods is evaluated by six adequacy measures: the mean square error, the mean index adequacy, the clustering dispersion indicator, the similarity matrix indicator, the Davies-Bouldin indicator and the ratio of within cluster sum of squares to between cluster variation. Finally, the representative daily load diagrams, along with the respective populations, are calculated for each consumer. This information is valuable for ship engineers, because it facilitates the formalisation of the electrical consumers' load behaviour; the design, operation and reliability assessment of the ship's power system; and the improvement of the operation of the automatic battle management system of a battleship. From the respective application to six consumers of the HN MEKO type frigates' electric system with a small dataset (only 11 training patterns), it is concluded that three to four clusters suffice for a satisfactory description of the daily load curves of each consumer. It is also concluded that the best clustering techniques are the modified k-means, the AVQ and the hierarchical methods, while the most suitable adequacy measure is the ratio of within cluster sum of squares to between cluster variation. These results surely depend on the population of the dataset, the ship operating mode ('anchor', 'shore', 'at sea', 'general quarters') and the load curve's time interval (in this case study, a day).
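The final step described above, extracting a consumer's typical day as the representative curve of the most populated cluster, can be sketched as follows; the label assignment below is a hypothetical example, not the measured chiller data.

```python
import numpy as np

def typical_load_curve(labels, W):
    # The representative cluster is the most populated one; its
    # representative vector is the consumer's typical daily curve.
    counts = np.bincount(labels, minlength=len(W))
    rep = int(np.argmax(counts))
    return rep, int(counts[rep]), W[rep]

# Example: 11 daily curves assigned to 4 clusters (hypothetical labels)
labels = np.array([1, 2, 2, 0, 3, 3, 3, 3, 3, 3, 3])
W = np.zeros((4, 24))                     # placeholder representatives
rep, population, curve = typical_load_curve(labels, W)
```

Reporting the population alongside the curve, as in Table 2, tells the engineer how dominant the typical day actually is within the observation period.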
In the future, the methodology should be investigated on larger datasets, for longer study periods and with different time intervals (from a few hours to several days).

Fig 9: The representative (most populated) chronological ON/OFF load curves for each one of the six consumers

REFERENCES
1. Hatzilau IK, Perros S, Karamolegos G, Galanis K, Dalakos A, Anastasopoulos K, Kavousanos A and Enotiadis X. Load estimation of MEKO class frigates: field measurements, results. Proceedings of the International Conference on Naval Technology for the 21st Century, June 1998, Piraeus, Greece.
2. Chicco G, Napoli R, Postolache P, Scutariu M and Toader C. 2003. Customer characterisation for improving the tariff offer. IEEE Transactions on Power Systems, 18-1.
3. Tsekouras GJ, Hatziargyriou ND and Dialynas EN. Two-stage pattern recognition of load curves for classification of electricity customers. IEEE Transactions on Power Systems, 22-3.
4. Gerbec D, Gasperic S, Smon I and Gubina F. Allocation of the load profiles to consumers using probabilistic neural networks. IEEE Transactions on Power Systems, 20-2.
5. Chicco G, Napoli R and Piglione F. Comparisons among clustering techniques for electricity customer classification. IEEE Transactions on Power Systems, 21-2.
6. Theodoridis S and Koutroumbas K. Pattern Recognition, 1st ed. New York: Academic Press.
7. Figueiredo V, Rodrigues F, Vale Z and Gouveia JB. An electric energy consumer characterization framework based on data mining techniques. IEEE Transactions on Power Systems, 20-2.
8. Petrescu M and Scutariu M. Load diagram characterisation by means of wavelet packet transformation. 2nd Balkan Conference, Belgrade, Yugoslavia, June.
9. Chen CS, Hwang JC and Huang CW. Application of load survey systems to proper tariff design. IEEE Transactions on Power Systems, 12-3.
10. Tsekouras GJ, Kotoulas PB, Tsirekis CD, Dialynas EN and Hatziargyriou ND. A pattern recognition methodology for evaluation of load profiles and typical days of large electricity customers. Electric Power Systems Research, 78-9.
11. Hatzilau IK, Tsekouras GJ, Prousalidis J and Gyparis IK. On electric load characterization and categorization in ship electric installations. INEC-2008, 1-3 April 2008, Hamburg, Germany.
12. Duda RO, Hart PE and Stork DG. Pattern Classification.
1st edition, A Wiley-Interscience Publication.
13. Haykin S. Neural Networks: A Comprehensive Foundation. 2nd edition, Prentice Hall, NJ.
14. Kohonen T. Self-Organization and Associative Memory. 2nd edition, Springer-Verlag, NY.
15. SOM Toolbox for MATLAB. Helsinki, Finland: Helsinki University of Technology.
16. Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P and Toader C. Options to classify electricity customers. Med Power 2002, 4-6 November, Athens, Greece.
17. Hand D, Mannila H and Smyth P. Principles of Data Mining. The MIT Press, Cambridge, Massachusetts.


More information

Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data

Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

On Clustering Validation Techniques

On Clustering Validation Techniques Journal of Intelligent Information Systems, 17:2/3, 107 145, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques MARIA HALKIDI [email protected] YANNIS

More information

Customer Relationship Management using Adaptive Resonance Theory

Customer Relationship Management using Adaptive Resonance Theory Customer Relationship Management using Adaptive Resonance Theory Manjari Anand M.Tech.Scholar Zubair Khan Associate Professor Ravi S. Shukla Associate Professor ABSTRACT CRM is a kind of implemented model

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,

More information

Data mining and statistical models in marketing campaigns of BT Retail

Data mining and statistical models in marketing campaigns of BT Retail Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

Learning Vector Quantization: generalization ability and dynamics of competing prototypes

Learning Vector Quantization: generalization ability and dynamics of competing prototypes Learning Vector Quantization: generalization ability and dynamics of competing prototypes Aree Witoelar 1, Michael Biehl 1, and Barbara Hammer 2 1 University of Groningen, Mathematics and Computing Science

More information

Statistical Databases and Registers with some datamining

Statistical Databases and Registers with some datamining Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS

UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS TW72 UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS MODULE NO: EEM7010 Date: Monday 11 th January 2016

More information

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING ABSTRACT: IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING Hakan Wiman Department of Photogrammetry, Royal Institute of Technology S - 100 44 Stockholm, Sweden (e-mail [email protected]) ISPRS Commission

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]

More information

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501 CLUSTER ANALYSIS Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 January 2006 Introduction Cluster analysis includes a broad suite of techniques designed to find groups

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL G. Maria Priscilla 1 and C. P. Sumathi 2 1 S.N.R. Sons College (Autonomous), Coimbatore, India 2 SDNB Vaishnav College

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Nonlinear Iterative Partial Least Squares Method

Nonlinear Iterative Partial Least Squares Method Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering

More information

IMPLEMENTING INTRANET/EXTRANET Estimate of cost Feasibility check

IMPLEMENTING INTRANET/EXTRANET Estimate of cost Feasibility check IMPLEMENTING INTRANET/EXTRANET Estimate of cost Feasibility check Synopsis Whether Firms/Organisations decide to install Intranet, they perceive it as appropriate stimulus to improve their production process

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 [email protected] What is Learning? "Learning denotes changes in a system that enable

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

More information