The application of cluster analysis in geophysical data interpretation
|
|
|
- Winifred Conley
- 10 years ago
- Views:
Transcription
1 Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title The application of cluster analysis in geophysical data interpretation Author(s) Song, Yu-Chen; Meng, Hai-Dong; O'Grady, Michael J.; O'Hare, G. M. P. (Greg M. P.) Publication Date Publication information Computational Geosciences, 14 (2): Publisher Springer This item's record/more information DOI Downloaded T20:57:51Z Some rights reserved. For more information, please see the item record link above.
2 ORIGINAL PAPER The Application of Cluster Analysis in Geophysical Data Interpretation Yu-Chen Song 1,*, Hai-Dong Meng 1, M.J. O'Grady 2, G.M.P. O'Hare 2 1 Inner Mongolia University of Science and Technology, Baotou, China. 2 School of Computer Science & Informatics, University College Dublin (UCD), Belfield, Dublin 4, Ireland. Abstract: A clustering algorithm which is based on density and adaptive density-reachable is developed and presented for arbitrary data point distributions in some real world applications, especially in geophysical data interpretation. Through comparisons of the new algorithm and other algorithms, it is shown that the new algorithm can reduce the dependency of domain knowledge and the sensitivity of abnormal data points, that it can improve the effectiveness of clustering results in which data are distributed in different shapes and different density, and that it can get a better clustering efficiency. The application of the new clustering algorithm demonstrates that data mining techniques can be used in geophysical data interpretation and can get meaningful and useful results, and that the new clustering algorithm can be used in other real world applications. Keywords: cluster analysis; geophysical data interpretation; data mining *Corresponding author: Yu-Chen Song Tex: address: [email protected] (Yu-Chen Song), [email protected]. Address: Inner Mongolia University of Science and Technology, 7#, Aerding Street, Kunqu District, Baotou, Inner Mongolia, China. Post code:
3 The Application of Cluster Analysis in Geophysical Data Interpretation Abstract: A clustering algorithm which is based on density and adaptive density-reachable is developed and presented for arbitrary data point distributions in some real world applications, especially in geophysical data interpretation. Through comparisons of the new algorithm and other algorithms, it is shown that the new algorithm can reduce the dependency of domain knowledge and the sensitivity of abnormal data points, that it can improve the effectiveness of clustering results in which data are distributed in different shapes and different density, and that it can get a better clustering efficiency. The application of the new clustering algorithm demonstrates that data mining techniques can be used in geophysical data interpretation and can get meaningful and useful results, and that the new clustering algorithm can be used in other real world applications. Keywords: cluster analysis; geophysical data interpretation; data mining 1. Introduction Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large amounts of data. It has also opened up exciting opportunities for exploring and analyzing new types of data and for analyzing old types of data in new ways. Clustering is one of data mining techniques. Cluster analysis seeks to find groups of closely related observations so that observations that belong to the same cluster are more similar to each other than observations that belong to other clusters. Cluster analysis has played an important role in a wide variety of fields. Many different clustering algorithms have been developed for meeting different applications, such as K-means algorithm. In this paper we develop a cluster algorithm CADD (Clustering Algorithm based on Density and adaptive Density-reachable) and try to use it to cluster a set 2
4 of geophysical data from Ningxia Autonomous Region in China. The two different algorithms, K-means algorithm and the CADD algorithm, are applied in the real word application (geophysical data interpretation) so that it will be shown that if the new algorithm is good for geophysical prospecting application. By comparing the two pictures, the original geomorphological map in Ningxia and the clustering result using the CADD algorithm, it will be shown that whether the new algorithm is effective in this particular area application. The rest of the paper is organized as fallows: In the section 2, Background and Related Research. In the section 3, the development of the new algorithm will be introduced after some main problems are analyzed. In the section 4 is the design and implementation of the algorithm which includes the concepts and description of the clustering algorithm. In the section 5, performances of the new algorithm will be tested so that the new algorithm would be suited for some real word applications, such as arbitrary data point distributions and time and space complexity. In the section 6, the new algorithm is applied to a real world application, geophysical prospecting. The conclusion and future works will be presented in section Background and Related Research Cluster analysis is an active research area in data mining technology. In general, the major clustering algorithms can be divided into followings: the prototype-based clustering method, the grid-based clustering method, the partitioning method, the density-based clustering method, etc. In the prototype-based method [34], a cluster is a set of objects in which an object is closer or similar more to the prototype that defines the cluster than to the prototype of any other cluster. 3
5 The grid-based clustering methods [16, 18, 25] first quantize the clustering space into a finite number of cells (hyper-rectangles) and then perform the required operations on the quantized space. Cells that contain more than a certain number of points are treated as dense and the dense cells are connected to form the clusters. Partitioning methods [34, 14, 21, 6] are divided into two major subcategories. One of the partitioning methods is the centroid algorithm [35, 14] which represents each cluster by using the gravity centre of the instances. The most wellknown centroid algorithm is the K-means algorithm [19, 1, 13, 34] which partitions the data set into k subsets such that all points in a given subset are closest to the same centre. K-means algorithm randomly selects k of the instances to represent the clusters and if k cannot be known ahead of time, varies values of k can be evaluated until the most suitable one is found. K-means algorithm is efficient in processing large data sets [12] and it handles spherical shapes well [34]. But K-means has the weaknesses [23, 24] that it often terminates at a local optimum, it is sensitive to noise, it cannot handle non-globular clusters and it cannot detect outliers. Density-based methods [4, 26], such as the DBSCAN algorithm [4, 26, 32, 2, 5, 36] and the DENCLUE algorithm [9, 28 ], cluster objects based on the notion of density. The DBSCAN algorithm locates regions of high density that are separated from one another by regions of low density. The DBSCAN algorithm is a typical and effective density-based clustering algorithm which can find different types of clusters, can identify outliers and noise, but it cannot handle the clusters of varying density. The DENCLUE algorithm has a solid theoretical foundation. It models the overall density of a set of points as the sum of influence functions associated with each point. The DENCLUE algorithm is good at handling noise 4
6 and outliers and it can find clusters of different shapes and size, but it has trouble with high-dimensional data and data that contains clusters of widely different densities, and it can be more computationally expensive than other density-based clustering techniques. Different clustering algorithms are selected for different application areas [26, 29, 30, 20, 22, 7, 33, 10, 31, 11, 17, 40, 3]. A variety of factors need to be considered when deciding which type of clustering techniques to use. Our goal is that clustering algorithm can be appropriate for a particular clustering task. Generally, the task of choosing the proper clustering algorithm involves considering these issues such as type of clustering, type of cluster, number of data objects and number of attributes, and domain-specific issues as well. 3. The development of the new algorithm According to the analysis above, some main problems in existing algorithms are as following: 3.1 Selection of cluster shapes Generally, shapes of original clusters are divided into several different types [34, 15]: 1Well-separated clusters, each point is closer to all of the points in its cluster than to any point in another cluster. 2Centre-based clusters, each point is closer to the centre of its cluster than to the centre of any other cluster. 3 Contiguity-based clusters, each point is closer to at least one point in its cluster than to any point in another cluster. 4Density-based clusters, clusters are regions of high density separated by regions of low density. Because the structures of data sets are complicated in some real world applications, and distributions of data 5
7 points are different and cluster shapes cannot be predicted, clustering algorithms are needed to handle different shapes of original clusters. 3.2 Dependency on domain knowledge For some algorithms it is necessary to input the parameters of the number of clusters and the initial centroids of clusters. This is difficult for unsupervised data mining when there is lack of relevant domain knowledge [29, 30]. At the same time, different random initializations of numbers and centroids of clusters produce different clustering results, which have an effect on the stability of clustering methods. 3.3 Sensitivity to noise or outliers There are large amounts of noise or outliers in some real word applications. Some algorithms are sensitive to noise or outliers, such as partitioning methods, prototype-based clustering methods, and grid-based methods. For example, if there are some maximum value existed, the data point distribution may be highly distorted. So a better clustering algorithm is needed to be less sensitive to noise or outliers, also it can handle noise or outliers effectively. From the above analyses, we develop CADD algorithm, especially for some real world applications, such as geophysics or geochemistry. The aim is that it can promote the ability of finding clusters of arbitrary distribution and handling noise or outliers in some real world applications, that it has better performances and scalabilities in high dimensional data sets, and that it can automatically get some parameters and make such parameters less dependent on domain knowledge. 6
8 4. Design and implementation of the algorithm 4.1 The concepts of the clustering algorithm Based on the notions of density and adaptive density-reachable, the CADD algorithm which is designed and implemented in this paper has ability to find clusters of arbitrary shapes and sizes, to handle clusters of varying densities, and to identify noise or outliers effectively. The concepts of the CADD algorithm are as follows: 1The density of data points: The density of data points models the overall density of a set of points in dataset D as the sum of influence function associated with each point: Density ( x ) i = n j= 1 e 2 ( x, x ) d i j 2 2σ (1) f Guass Where, Gaussian influence function ( xi x j ) 2 (, ) d x i x j 2 2σ, = e indicates the density influence of each data point to the density of point x i, and σ is density adjustment parameter which is analogous to the standard deviation, and governs how quickly the influence of a point drops off. 2Local density attractors: Local density attractors are the data points at which the values of density function Density(x) are locally maximum. 3 Density-reachable distance: Density-reachable distance is used to determine a circular area of data point x, labeled as δ={x 0<d(x i,x j ) R}, the data points in which are belong a same cluster. The definition formula is: mean( D ) R = coefr n (2) 7
9 Where, mean(d) is the mean distance between all data points in dataset D, and coefr (0<coefR<1) is named as the original adjustment coefficient of densityreachable distance. 4Density-reachable: Density-reachable means that if there is an object chain p 1, p 2,,p n =q, q is a local density attractor, and p n-1 is density-reachable from q, then for p i D,(1 i<n-1) and d(p i,p i +1) R, we define that object p i, (1 i<n-1) is density-reachable from q. 5Adaptive density-reachable distance: In handling the clusters of varying densities, it is important to adjust the density-reachable distance R step by step during clustering. The adaptive adjustment is carried out through multiplying the original density-reachable distance R with an adjustment coefficient α: R Adap = αr (3) Where, R Adap is adaptive density-reachable distance, and α is defined as: α = Density( Attractor ) i 1 Density( Attractor ) i (4) This is because that when the density value of local density attractor of a cluster is greater, the distance between objects in the cluster is smaller, and on the contrary, when the density value of local density attractor of a cluster is smaller, the distance between objects in the cluster is larger. When i=1, let Density(Attractor 0 )= Density(Attractor 1 ), and so α 1. It is necessary to note that adaptive adjustment coefficient α may be also other function. 8
10 4.2 Description of the clustering algorithm Local density attractors are used to determine centroids of clusters, and adaptive density-reachable distance to locate data objects that belong to the cluster. The descriptions of CADD algorithm are as follows: Algorithm Clustering Algorithm based on Density and adaptive Density-reachable Input:Adjustment coefficient of density-reachable distance CoefR, Density adjustment parameter σ. Output:Number of clusters, the members of each cluster, outliers or noise points. 1:Compute the densities of each data point. 2:i 1 3:repeat 4: Seek the maximum density attractor O DensityMaxi in the original data set of clustering objects as the cluster centroid of C i. 5: Assign the data objects which are density reachable within adaptive densityreachable distance from O DensityMaxi to cluster C i,and at the same time delete the clustered objects from original data set. 6: i i+1 7:until The original data set is empty. 8:Assign the clusters which have fewer objects (such as less 5 or 10) into outlier or noise group. 5. The performances of the algorithm In this section, the performances of the CADD algorithm are tested by comparing the CADD algorithm and other algorithms. The testing of the performances include comparing of the clustering results between the clusters of 9
11 different data point distributions, the comparing time and space complexities using different algorithms. 5.1 Arbitrary data point distributions According to data point distributions in some certain real world applications, there are different shapes of original clusters which are well-separated clusters, centre-based clusters, contiguity-based clusters and density-based clusters, etc. The following experiments will show that the CADD algorithm can handle arbitrary data point distributions, such as non-globular clusters of different shapes, different sizes and different densities, in some real world applications. Figure 1 (a) and Figure 1 (b) show the clustering results in unequal density data point distributions using the CADD algorithm and K-means algorithm, respectively. In figure 1(a), the density of Cluster 1 is different from that of Cluster 2 and Cluster 3, and the density of Cluster 2 is different from that of Cluster 3. The shapes of the clustering results accord with the shapes of natural data point distributions. When data set is the equal density of data point distribution, K-means algorithm handles globular clusters well. But when data set is the unequal density data point distribution, K-means algorithm can not get good result even the shapes are globular. As figure 1(b) shows: K-means algorithm assigns some data objects of Cluster 1 to Cluster 2 and to Cluster 3 wrongly, and some data objects of Cluster 2 to Cluster 3 wrongly. It makes the shapes of clustering results differ from the shapes of natural data point distribution. Through this experiment we can clearly get that the CADD algorithm can handle clusters in unequal densities data point distribution, while K-means algorithm can not handle clusters in unequal densities data point distribution well. 10
12 25 c1 c2 c3 25 c1 c2 c (a) Clustering results by CADD algorithm (b) Clustering results by K-means algorithm Figure 1 Comparing clustering results by different algorithms Figure 2 shows the clustering results of non-globular clusters by using the CADD algorithm and using K-means algorithm, respectively. In Figure 2 (a) using the CADD algorithm, Cluster 1 is a globular cluster, Cluster 2 is a nonglobular cluster, and the outliers scatter around the two clusters. The CADD algorithm handles non-globular clusters very well and can recognize outliers. In Figure 2 (b) using K-means algorithm, because one of the clusters is non-globular shape, so it makes the globular cluster divided into two parts and the non-globular cluster also into two parts. K-means cannot handle non-globular clusters and cannot recognize outliers. This experiment clearly shows that the CADD algorithm in handling non-globular clusters is better than K-means algorithm c1 c2 out c1 c (a) Non-globular clusters by CADD algorithm (b) Non-globular clusters by K-means algorithm Figure 2 Comparing Non-globular clusters by CADD and K-means 11
13 The two experiments show that the CADD algorithm has good property in handling both non-globular clusters and unequal density data distribution. 5.2 Time and space complexity The two experiments of the time and space complexities are carried out so that the time and space complexity of the CADD algorithm can be compared with that of K-means algorithm. The curves of experimental results are shown in Figure 3. Figure 3 shows the time and space complexities using the CADD algorithm and K-means algorithm. When the amount of data objects is less than 3000, the running time of the CADD algorithm is the same as that of K-means algorithm. As the amount of data objects increases greatly, more than 3000, the running time of the CADD algorithm is much less than that of K-means algorithm. It is mainly because that as the amount of data objects increases, the iteration of K-means algorithm increases, and the running time increases. Because the CADD algorithm only necessary to search the density-reachable objects in data set one time for each cluster, its running time is lower than that of K-means algorithm. The time complexity of the CADD algorithm is O(kn), while the time of complexity of K-means algorithm is O(knt), where n is the number of data objects, k is the number of clusters, and t is the number of iterations required for convergence. The space complexity of the CADD algorithm is O(n) and K-means algorithm is O((k+n)m). 12
14 CADD K-means Run tim e (s) Records Figure 3 The time and space complexity of CADD and K-means Through the previous experiment and comparing analysis, the performances of the CADD algorithm, which include the ability to handle the non-globular and unequal density of data distribution, the time and space efficiency, are better than that of K-means algorithm. Through the special design of the CADD algorithm, k value of the number of clusters and cluster centers are can be determined automatically. The CADD algorithm is insensitive to abnormal data and can be able to discover outliers. So we can conclude that the most performances of the CADD algorithm are good for some real world applications. 6. A real world application The geophysical data are automatically measured and recorded by geophysical instruments. Generally, the amount of data is very large and relatively standard. It is suitable to be processed and be analyzed by data mining techniques. Using clustering algorithm of data mining, we can process some real word data which are measured in geophysical prospecting [8, 37, 27], such as Electrical Method, Gravity Exploration, and Magnetic Exploration, etc. We can also process the real word data which are obtained in different geophysical prospecting measurements 13
15 comprehensively. For example, we can store the geophysical prospecting data in spatial database, and every data point in the spatial database has some attributes associated with geophysical parameters measured in field. This is a very new approach of the real word application in geophysical data interpretation. The apparent resistivity curves of electrical sounding are stored in spatial database, which were measured in the whole province region, Ningxia province of China (Figure 5(a)). We try to test and analyze the clustering results of the electrical sounding data by using the CADD algorithm. By comparing the clustering results obtained by using different clustering algorithms, we can know that 1whether the CADD algorithm is effective and reasonable in the real word application of geophysical prospecting; 2whether the clustering results can describe the characteristic of underground electrical distribution of the real word application objectively and accurately. 6.1 Data preparation In the measuring region, a whole province region [39, 38], there are about 1100 electrical sounding points. At every sounding point, apparent resistivity is measured at 14 different AB/2={3, 4.5, 7, 12, 20, 30, 45, 70, 120, 200, 300, 450, 700, 1000m}, and every sounding curve has 14 apparent resistivity values ρ = s ρ (3), ρ (4.5),, ρ (1000)}. The sounding curves are stored in spatial { s s s database, and every electrical sounding curve is a data point (tuple) which has 14 attributes ρ = ρ (3), ρ (4.5),, (1000)} associated with 14 different AB/2. s { s s ρ s The purpose of clustering analysis is to divide electrical sounding curves into different types at different part of the measuring regions, and the types of sounding curves reflect the geo-electrical features of the region. 14
16 6.2 The clustering results Figure 4 (a) shows the clustering result distribution of the apparent resistivity curves of electrical sounding by using K-means algorithm. As can be seen, there are two main clusters in the clustering result. The blue area distributes widely and continuously, which is Cluster 1. The orange area distributes in small area and continuously partly, which is Cluster 2. There are two other clusters which include only less than 5 points, so they are shown in the same color as no-data sample points, and also there is no meaning in practice. It can be seen in Figure 4 (a) that the clustering result distributions by using K-means algorithm is simple and it cannot reflect the real distribution of the apparent resistivity curves of electrical method in the measured area. (a) Clustering Result by K-means (b) Clustering Result by CADD Figure 4 Clustering Results of the Apparent Resistivity Curves Figure 4 (b) is the clustering result distribution of the apparent resistivity curves of electrical method by using the CADD algorithm. It can be seen from Figure 4 15
17 (b) that there are four clusters in the measured area. Cluster 1 is blue color which includes over six hundred measuring points. Cluster 2 is orange color which includes one hundred and seventy measuring points. Cluster 3 is dark green which includes about one hundred and thirty measuring points. Cluster 4 is light green color which includes about fifteen measuring points. The outliers are the points which the clusters include less than 5 measuring points. The no-data sample points and the outliers are shown in no cluster. Comparing Figure 5 (a) with Figure 5 (b), it is obvious that each cluster (b) is fitted well with each region in Geomorphological Map of Ningxia Plain (a). The distribution area of sounding points (blue points (b)) in Cluster 1 is fitted with the region of river-lake sedimentary plain (a). The distribution area of sounding points (orange points (b)) in Cluster 2 is fitted with the region of alluvial flood sedimentary plain and tectonic erosion hilly (a). The distribution area of sounding points (dark green points (b)) in Cluster 3 is fitted with the region of aeolian sand hilly (a). The distribution area of sounding points (light green points (b)) in Cluster 4 is seldom scattered in the region of tectonic erosion plain and alluvial lake sedimentary plain (a). The distribution area of outliers (no cluster (b)) reflects the region of tectonic erosion plain and alluvial flood sedimentary plain in the piedmont (a), because the variational range of electric sounding data is too large to cluster which is caused by tectonic erosion. The clustering result reflects the electric distribution characteristics in Ningxia Plain, and the result is meaningful, useful and effective. 16
18 tectonic erosion plain alluvial flood sedimentary plain river-lake sedimentary plain aeolian sand hilly tectonic erosion hilly (a) Geomorphological Map (b) Clustering Result by CADD Figure 5 Geological-Geomorphological Map and the Clustering Result in Ningxia Plain 7. Conclusion and future works In this paper, we have developed a new clustering algorithm for the real word application of geophysical prospecting. Firstly, the clustering analysis, one of the data mining techniques, has successfully been applied in geophysical data interpretation and the meaningful and useful results can be got by CADD algorithm. Secondly, the new algorithm was based on density and adaptive 17
19 density-reachable so that it could handle the data sets with arbitrary data point distributions. After it had been tested by using different data sets, the new algorithm could have many anticipative performances such as time and space complexity. Thirdly, the results for both test data sets and real data sets indicated that the new algorithm was effective and efficient and that it could eliminate the effect of abnormal data (noise or outliers), and suitable for large data sets and high-dimensional data sets. Since the CADD algorithm is especially designed for the applications of geophysical prospecting, future research will have to consider the applications of different geophysical prospecting in many different ways. Also, we may try to extend it to other domains such as wireless sensor network or even some social data sets. At the same time, the performances of the CADD algorithm can be tested in many different domains and it can be improved further. Acknowledgment The material is based on work supported by National Natural Science Foundation of China under Grant No Michael O'Grady thanks the support of the Irish Research Council for Science, Engineering & Technology (IRCSET) though the Embark Initiative postdoctoral fellowship programme. Gregory O'Hare thanks the support of Science Foundation Ireland under Grant No. 03/IN.3/1361. References 1. Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, New York. (1973) 2. Birant, D., Kut, A.: ST-DBSCAN: An algorithm for clustering spatial temporal data. Data & Knowledge Engineering. 60(1), (2007) 3. Chang, H., Yeung, D.Y.: Locally linear metric adaptation with application to semi-supervised clustering and image retrieval. Pattern Recognition. 39(7), (2006) 18
20 4. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd int. Conf. on Knowledge Discovery and Data Mining. pp Portland, Oregon (1996) 5. Ester, M., Kriegel, H.P., Sander, J., Wimmer, M., Xu, X.: Incremental Clustering for Mining in a Data Warehousing Environment. Proceedings of the 24th VLDB Conference. New York, USA (1998) 6. Event, G., Naor, J., Rao, S., Schieber, B.: Fast Approximate Graph Partitioning Algorithms. SIAM Journal on Computing. 28(6), (1999) 7. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: Knowledge discovery and data mining: towards a unifying framework. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (1996) 8. Fu, L.K.: Textbook of Electrical Imaging Surveys. The Geological Publishing House, Beijin, China (1983) 9. Hinneburg, A., Keim, D.A.: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. In Proc. Of th 4th Intl. Conf. on Knowledge Discovery and Data Mining, New York City. pp AAAI Press (1998) 10. Horovitz, O., Krishnaswamy, S., Gaber, M.M.: A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety. Intelligent Data Analysis. 11(1), (2007) 11. Hsieh, K.L., Tong, L.I., Wang, M.C.: The application of control chart for defects and defect clustering in IC manufacturing based on fuzzy theory. Expert Systems with Applications, 32(3), (2007) 12. Huang, Z.: Extensions to the K-Means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery. pp (1998) 13. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall Advanced Reference Series, Prentice Hall. (1988) 14. Jain, A.K., Murty, M.N., Flyn, P.J.: Data Clustering: A Review. ACM Computing Surveys. 31(3). (1999) 15. Kaufman, L., Rousseeuw, P. J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics. John Wiley and Sons, New York (1990) 19
21 16. Kotsiantis, S., Pintelas, P.: Recent Advances in Clustering: A Brief Survey. WSEAS Transactions on Information Science and Applications. 1(1), (2004) 17. Li, L.Q., Ji, H.B., Gao, X.B.: Maximum entropy fuzzy clustering with application to real-time target tracking. Signal Processing. 86(11), (2006) 18. Ma Eden, W.M., Chow, T.W.S.: A new shifting grid clustering algorithm. Pattern Recognition. 37(3), (2004) 19. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, pp University of California Press (1967) 20. Meng, H.D., Song, Y.C.: The implementation and application of data mining system based on campus network. Journal on communications, Beijing, China. 26, (2005) 21. Mitra, D., Ziedins, I.: Hierarchical virtual partitioning--algorithms for virtual private networking. Bell Labs Technical Journal. 2(2), (1997) 22. Peng,W.: The computer processing of remote sensing data and geography information system. Beijing Normal School Publishing House, Beijing, China. (2006) 23. Pham, D.T., Dimov, S.S., Nguyen, C.D.: A two-phase K-means algorithm for large datasets. Proceedings of the Institution of Mechanical Engineers -- Part C -- Journal of Mechanical Engineering Science. vol. 218 Issue 10, pp October (2004) 24. Pham, D.T., Dimov, S.S., Nguyen, C.D.: An Incremental K-means algorithm. Proceedings of the Institution of Mechanical Engineers -- Part C -- Journal of Mechanical Engineering Science. vol. 218 Issue 7, pp July (2004) 25. Pilevar, A.H., Sukumar, M.: A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognition Letters. 26(7), (2005) 26. Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Mining and Knowledge Discovery, Kluwer Academic Publishers. 2, 1 27 (1998) 27. Shen, Z.L.: Hydrogeology Textbook. Beijing Science and Technology Press, Beijing, China (1985) 28. Song, Y.C.: Application Research on Clustering Analysis of Data Mining in Geophysical- Geochemical Data Processing. Dissertation, China University of Geoscience, Beijing, China (2006) 20
22 29. Song, Y.C., Meng, H.D.: The design of expert system of market basket analysis based on data miming. Market modernization J, Beijing, China.7, (2005) 30. Song, Y.C., Meng, H.D.: The design of expert system of market basket analysis based on data miming. Market modernization J, Beijing, China. 6, (2005) 31. Soto, J., Aguiar, M.I.V., Flores-Sintas, A.: A fuzzy clustering application to precise orbit determination. Journal of Computational & Applied Mathematics. 204(1), (2007) 32. Stefanakis, E.: NET-DBSCAN: clustering the nodes of a dynamic linear network. International Journal of Geographical Information Science. 21(4), (2007) 33. Su, M.C., Liu, Y.C.: A new approach to clustering data with arbitrary shapes. Pattern Recognition. 38(11), (2005) 34. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Post & Telecom Press of P.R.China (China Edition), Beijing (2006) 35. Trepalin, S., Osadchiy, N.: The centroidal algorithm in molecular similarity and diversity calculations on confidential datasets. Computer-Aided Molecular Design J. 19(9), (2005) 36. Xu X., Jager J., Kriegel, H.P.: A Fast Parallel Clustering Algorithm for Large Spatial Data sets, Data Mining and Knowledge Discovery. 3, (1999) 37. Yang, J.: The Data-Processing and Interpretation Software System for IP Water Exploration and Its Application. Geophysical and Geochemical Exploration (China Edition). 23(5), (1999) 38. Yang, J., Meng, H.D., Fu, L.K.: The Data-Processing and Interpretation Software System for IP Water Exploration. Geophysical and Geochemical Exploration. 22(3), (1998) 39. Yin, B.X.: Integrated Study of Groundwater Recharge and Water Quality Distribution in Yin Chuan lain. Dissertation, China University of Geoscience, Beijing, China (2006) 40. Zhao, W.Z., Wu, C.Y., Yin, K., Young, T.Y., Ginsberg, M.D.: Pixel-based statistical analysis by a 3D clustering approach: Application to autoradiographic images. Computer Methods & Programs in Biomedicine. 83(1), (2006) 21
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
A Two-Step Method for Clustering Mixed Categroical and Numeric Data
Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
Identifying erroneous data using outlier detection techniques
Identifying erroneous data using outlier detection techniques Wei Zhuang 1, Yunqing Zhang 2 and J. Fred Grassle 2 1 Department of Computer Science, Rutgers, the State University of New Jersey, Piscataway,
A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets
A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets Preeti Baser, Assistant Professor, SJPIBMCA, Gandhinagar, Gujarat, India 382 007 Research Scholar, R. K. University,
Public Transportation BigData Clustering
Public Transportation BigData Clustering Preliminary Communication Tomislav Galba J.J. Strossmayer University of Osijek Faculty of Electrical Engineering Cara Hadriana 10b, 31000 Osijek, Croatia [email protected]
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Linköpings Universitet - ITN TNM033 2011-11-30 DBSCAN. A Density-Based Spatial Clustering of Application with Noise
DBSCAN A Density-Based Spatial Clustering of Application with Noise Henrik Bäcklund (henba892), Anders Hedblom (andh893), Niklas Neijman (nikne866) 1 1. Introduction Today data is received automatically
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
A comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
Data Mining Process Using Clustering: A Survey
Data Mining Process Using Clustering: A Survey Mohamad Saraee Department of Electrical and Computer Engineering Isfahan University of Techno1ogy, Isfahan, 84156-83111 [email protected] Najmeh Ahmadian
Clustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
Cluster Analysis: Basic Concepts and Algorithms
8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should
Chapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases
Published in the Proceedings of 14th International Conference on Data Engineering (ICDE 98) A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Xiaowei Xu, Martin Ester, Hans-Peter
A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory
A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory Yen-Jen Oyang, Chien-Yu Chen, and Tsui-Wei Yang Department of Computer Science and Information Engineering National Taiwan
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster
On Clustering Validation Techniques
Journal of Intelligent Information Systems, 17:2/3, 107 145, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques MARIA HALKIDI [email protected] YANNIS
Visual Data Mining with Pixel-oriented Visualization Techniques
Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 [email protected] Abstract Pixel-oriented visualization
A Survey on Outlier Detection Techniques for Credit Card Fraud Detection
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud
Quality Assessment in Spatial Clustering of Data Mining
Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering
A Comparative Study of clustering algorithms Using weka tools
A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering
Clustering Techniques: A Brief Survey of Different Clustering Algorithms
Clustering Techniques: A Brief Survey of Different Clustering Algorithms Deepti Sisodia Technocrates Institute of Technology, Bhopal, India Lokesh Singh Technocrates Institute of Technology, Bhopal, India
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva
382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent
Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool
Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between
Context-aware taxi demand hotspots prediction
Int. J. Business Intelligence and Data Mining, Vol. 5, No. 1, 2010 3 Context-aware taxi demand hotspots prediction Han-wen Chang, Yu-chin Tai and Jane Yung-jen Hsu* Department of Computer Science and Information
. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
Data Clustering Techniques Qualifying Oral Examination Paper
Data Clustering Techniques Qualifying Oral Examination Paper Periklis Andritsos University of Toronto Department of Computer Science [email protected] March 11, 2002 1 Introduction During a cholera
Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering
IV International Congress on Ultra Modern Telecommunications and Control Systems 22 Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering Antti Juvonen, Tuomo
A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining
A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining Binu Thomas and Rau G 2, Research Scholar, Mahatma Gandhi University,Kerala, India. [email protected] 2 SCMS School of Technology
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar
Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009
Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation
Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering
Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering
GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering Yu Qian Kang Zhang Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083-0688, USA {yxq012100,
Scalable Cluster Analysis of Spatial Events
International Workshop on Visual Analytics (2012) K. Matkovic and G. Santucci (Editors) Scalable Cluster Analysis of Spatial Events I. Peca 1, G. Fuchs 1, K. Vrotsou 1,2, N. Andrienko 1 & G. Andrienko
Data Clustering Using Data Mining Techniques
Data Clustering Using Data Mining Techniques S.R.Pande 1, Ms. S.S.Sambare 2, V.M.Thakre 3 Department of Computer Science, SSES Amti's Science College, Congressnagar, Nagpur, India 1 Department of Computer
CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW
CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW 1 XINQIN GAO, 2 MINGSHUN YANG, 3 YONG LIU, 4 XIAOLI HOU School of Mechanical and Precision Instrument Engineering, Xi'an University
Cluster Analysis: Basic Concepts and Methods
10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
On Data Clustering Analysis: Scalability, Constraints and Validation
On Data Clustering Analysis: Scalability, Constraints and Validation Osmar R. Zaïane, Andrew Foss, Chi-Hoon Lee, and Weinan Wang University of Alberta, Edmonton, Alberta, Canada Summary. Clustering is
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA
CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA 1 SADAF KHAN, 2 ROHIT SINGH 1,2 Career Point University Abstract- Big data if used properly can bring huge benefits to the
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Cluster Analysis: Basic Concepts and Algorithms
Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths
ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS
ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of
A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images
A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
Determining the Number of Clusters/Segments in. hierarchical clustering/segmentation algorithm.
Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms Stan Salvador and Philip Chan Dept. of Computer Sciences Florida Institute of Technology Melbourne, FL 32901
Clustering methods for Big data analysis
Clustering methods for Big data analysis Keshav Sanse, Meena Sharma Abstract Today s age is the age of data. Nowadays the data is being produced at a tremendous rate. In order to make use of this large-scale
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Data Mining K-Clustering Problem
Data Mining K-Clustering Problem Elham Karoussi Supervisor Associate Professor Noureddine Bouhmala This Master s Thesis is carried out as a part of the education at the University of Agder and is therefore
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Cluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Comparison the various clustering algorithms of weka tools
Comparison the various clustering algorithms of weka tools Narendra Sharma 1, Aman Bajpai 2, Mr. Ratnesh Litoriya 3 1,2,3 Department of computer science, Jaypee University of Engg. & Technology 1 [email protected]
Anomaly Detection using multidimensional reduction Principal Component Analysis
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. II (Jan. 2014), PP 86-90 Anomaly Detection using multidimensional reduction Principal Component
Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
An Introduction to Cluster Analysis for Data Mining
An Introduction to Cluster Analysis for Data Mining 10/02/2000 11:42 AM 1. INTRODUCTION... 4 1.1. Scope of This Paper... 4 1.2. What Cluster Analysis Is... 4 1.3. What Cluster Analysis Is Not... 5 2. OVERVIEW...
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/4 What is
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
SoSe 2014: M-TANI: Big Data Analytics
SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
A Novel Approach for Network Traffic Summarization
A Novel Approach for Network Traffic Summarization Mohiuddin Ahmed, Abdun Naser Mahmood, Michael J. Maher School of Engineering and Information Technology, UNSW Canberra, ACT 2600, Australia, [email protected],[email protected],M.Maher@unsw.
