Selection of the Suitable Parameter Value for ISOMAP
Li Jing and Chao Shao
School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, China

Abstract — As a promising dimensionality reduction and data visualization technique, ISOMAP is often used for data preprocessing, both to avoid the curse of dimensionality and to select more suitable algorithms or improve the performance of the algorithms used in the data mining process, in keeping with the No Free Lunch (NFL) Theorem. ISOMAP has only one parameter, i.e. the neighborhood size k, upon which its success depends greatly. However, it is an open problem how to select a suitable neighborhood size efficiently. Based on the unique feature of the shortcut edges that an unsuitable neighborhood size introduces into the neighborhood graph, this paper presents an efficient method to select a suitable neighborhood size according to the decrement of the sum of all the shortest path distances. In contrast with the straightforward method based on residual variance, our method only requires running the former part of ISOMAP (shortest path computation) incrementally, which makes it less time-consuming while yielding the same results. Finally, the feasibility and robustness of this method are verified well by experimental results.

Index Terms — data visualization, ISOMAP, geodesic distance, shortest path distance, neighborhood size, residual variance

I. INTRODUCTION

Nowadays, the explosive growth in the amount of data and their dimensionality makes data visualization more and more important in the data mining process. According to the No Free Lunch (NFL) Theorem, information about the data should be taken into account to select more suitable data analysis/processing algorithms for data mining. For high-dimensional data, the useful distribution and structure information cannot be seen directly, but it can be obtained easily by data visualization approaches. Over the years, many approaches to visualizing high-dimensional data have emerged, most of which fall into the following five categories: 1) use several sub-windows to visually represent different subsets of the dimensions respectively, such as scatterplot matrices[1] and pixel-oriented techniques[2]; 2) rearrange the dimension axes in the low-dimensional space, such as parallel coordinates[3] and star coordinates[4]; 3) embed the dimensions to partition the low-dimensional space hierarchically, such as dimensional stacking[5] and treemaps[6]; 4) use certain objects with several visual features, each of which stands for one dimension, such as Chernoff faces[7] and stick figures[8]; 5) reduce the dimensionality of the data to two or three dimensions using dimensionality reduction techniques, such as PCA (Principal Component Analysis)[9], MDS (Multidimensional Scaling)[10], SOM (Self-Organizing Map)[11], ISOMAP (Isometric Mapping)[12][13][14], LLE (Locally Linear Embedding)[15][16] and Laplacian Eigenmap[17], etc. Unlike other approaches, dimensionality reduction techniques try to preserve the high-dimensional relationships between the data in the low-dimensional space, which can visually represent the distribution and structure of the data very well. In addition, dimensionality reduction techniques can avoid the curse of dimensionality and improve the performance of the algorithms used in the data mining process.
As a nonlinear dimensionality reduction technique and a variant of MDS[12], ISOMAP preserves the global geodesic distances between the data in the low-dimensional embedded space, and thus can nicely visualize convex but intrinsically flat manifolds such as the Swiss-roll and S-curve data sets. Its ability to preserve the global geometric structure of manifolds in a non-iterative way makes it more and more attractive for data preprocessing[16]; in addition, one of its main advantages is that it requires only one parameter, i.e. the neighborhood size k, upon which the success of ISOMAP depends greatly[18]. As with other manifold learning techniques such as LLE[15] and Laplacian Eigenmap[17], it is very important for ISOMAP to select a suitable neighborhood size, which means a good trade-off between locality and globality; however, this is an open problem. It is well known that the neighborhood size should be neither so large that the neighborhood graph contains shortcut edges (which connect two data points with a rather large distance along the manifold but a relatively small Euclidean distance, and thus do not lie on the manifold) and fails to represent the neighborhood structure of the data correctly, nor so small that the graph becomes disconnected or sparse and fails to approximate the geodesic distances between the data accurately[18]. A straightforward method is to select a suitable neighborhood size by estimating the
quality of the corresponding mapping, measured by residual variance[12]. This method requires running the whole ISOMAP algorithm with every possible neighborhood size, which makes it very time-consuming. In this paper, we present an efficient method to select a suitable neighborhood size, in which only the former part of the ISOMAP algorithm, i.e. shortest path computation, is required to run incrementally. This paper is organized as follows: in Section II, ISOMAP and the straightforward method with residual variance are recalled briefly; in Section III, our method is described in detail; in Section IV and Section V, experimental results and conclusions are given respectively.

II. ISOMAP AND THE STRAIGHTFORWARD METHOD WITH RESIDUAL VARIANCE

A. ISOMAP

When the global geometric structure of high-dimensional data is unknown, we cannot be sure that the Euclidean distance metric is suitable to represent the dissimilarity relationships between the data. Fortunately, the Euclidean distance metric is trustworthy enough to represent the dissimilarity relationships between the data within a small enough neighborhood, which is also called the local Euclidean nature of the manifold. So the global geometric structure can be approximated using the local Euclidean distance information, as ISOMAP[12] does. If the data lie on a single well-sampled manifold, it has been proved that the unknown global geodesic distances between the data can be well approximated by the corresponding shortest path distances in a neighborhood graph that represents the right neighborhood structure of the data[20]. After substituting the geodesic distance metric for the Euclidean distance metric, ISOMAP uses the classical MDS algorithm to map the data into the low-dimensional embedded space. So ISOMAP can be described briefly as follows[12][13]: 1) for very large data sets, select n representative data points randomly or using vector quantization (with better results[21]) to keep the subsequent computation tractable; 2) construct a neighborhood graph (connected for data visualization) using the k nearest neighbors method with a neighborhood size k (k is more natural to choose than ε [22]); 3) compute all the shortest path distances in the neighborhood graph; 4) use the classical MDS algorithm to map the data into the low-dimensional embedded space.

B. The Straightforward Method with Residual Variance

Given data lying on a single well-sampled convex but intrinsically flat manifold such as the Swiss-roll and S-curve data sets, the success of ISOMAP depends upon selecting a suitable neighborhood size, with which the neighborhood graph can represent the right neighborhood structure of the data, so that the geodesic distances can be approximated accurately by the corresponding shortest path distances[18]. A straightforward method is to find a suitable neighborhood size by estimating the quality of the corresponding mapping, i.e. how well the high-dimensional structure is represented in the low-dimensional embedded space[19], measured by residual variance[12]: $1 - \rho^2_{\hat{D}_X(k) D_Y}$, where $\rho$ is the standard linear correlation coefficient taken over all the entries of $\hat{D}_X(k)$ and $D_Y$, and where $\hat{D}_X(k)$ and $D_Y$ are, respectively, the matrix of geodesic distances approximated by the shortest path distances in the high-dimensional data space X, which is a function of the neighborhood size k, and the matrix of Euclidean distances in the low-dimensional embedded space Y, which is the low-dimensional mapping of ISOMAP.
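As a concrete illustration of this quantity (a minimal sketch, not the authors' code), the snippet below computes the residual variance from a shortest-path distance matrix and an embedding; it assumes scikit-learn's Isomap, whose dist_matrix_ attribute stores the graph shortest-path distances, as a stand-in for the pipeline described above. All function names and parameter values here are illustrative.

```python
# Hedged sketch: residual variance 1 - rho^2 between the shortest path
# distances D_X(k) and the embedding distances D_Y.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

def residual_variance(d_geo, Y):
    dx = d_geo[np.triu_indices_from(d_geo, k=1)]  # all distinct pairs, row-major
    dy = pdist(Y)                                 # Euclidean distances in Y, same order
    rho = np.corrcoef(dx, dy)[0, 1]               # standard linear correlation
    return 1.0 - rho ** 2

X, _ = make_swiss_roll(n_samples=500, random_state=0)
iso = Isomap(n_neighbors=7, n_components=2)
Y = iso.fit_transform(X)                          # runs the whole ISOMAP pipeline
print(residual_variance(iso.dist_matrix_, Y))
```

Because the embedding Y must exist before the residual variance can be evaluated, any selection scheme built on it has to pay for the full pipeline at every candidate k, which is exactly the cost the paper's method avoids.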
The lower the residual variance is, the better the high-dimensional structure is represented in the low-dimensional embedded space, and thus the more suitable the neighborhood size k is. So the optimal k can be defined as follows[19]:

$$k_{opt} = \arg\min_{k} \left( 1 - \rho^{2}_{\hat{D}_X(k) D_Y} \right) \qquad (1)$$

Because residual variance uses Y, i.e. the low-dimensional mapping of ISOMAP, and because it is multimodal, we must run the whole ISOMAP algorithm with every possible $k \in [k_{min}, k_{max}]$ ($k_{min}$ is the minimal k that makes the corresponding neighborhood graph connected, and $k_{max}$ is a predefined maximal k, which will be specified further in the next section), and select the optimal k as the one with the minimal residual variance, which makes this method very time-consuming. So residual variance can only be used to estimate the intrinsic dimensionality of the data and to compare the relative goodness of two neighborhood sizes, not to select a suitable neighborhood size practically.

III. OUR METHOD

A suitable neighborhood size should be one with which the neighborhood graph can represent the right neighborhood structure of the data, so that the geodesic distances can be approximated accurately by the corresponding shortest path distances. To this end, the neighborhood graph should be connected and dense, but it must not contain shortcut edges. Under these restrictions, the more edges the neighborhood graph contains (that is, the denser the neighborhood graph is), the more accurately the geodesic distances are approximated by the corresponding shortest path distances, which is verified by the experimental results in the next section. So a suitable neighborhood size should be as large as possible without introducing shortcut edges into the neighborhood graph. Obviously, under the restriction that the neighborhood graph is connected, as the neighborhood size k increases and new edges are added into the neighborhood graph, the involved shortest path distances decrease
monotonically, due to the triangular inequality of the Euclidean distance metric and the definition of the shortest path distance as in (2), and they are bounded below by the corresponding geodesic distances before shortcut edges are introduced into the neighborhood graph.

$$\hat{d}_{ij} = \min( \hat{d}_{ij}, \; \hat{d}_{ik} + \hat{d}_{kj} ), \quad k = 1, 2, \dots, n \qquad (2)$$

So the sum of all the shortest path distances is a monotonically decreasing function of the neighborhood size k:

$$f(k) = \sum_{\text{all the entries}} \hat{D}_X(k) \qquad (3)$$

Before shortcut edges are introduced into the neighborhood graph (when the neighborhood size k is suitable), as the neighborhood size increases, the neighborhood graph becomes denser along the manifold, the new edges added into the neighborhood graph become longer, and their influence on the involved shortest path distances becomes weaker; thus the decrement of the sum of all the shortest path distances f(k) decreases gradually. Once shortcut edges are introduced into the neighborhood graph (when the neighborhood size k is unsuitable), the sum of all the shortest path distances f(k) drops sharply in contrast with the former gradually decreasing downtrend, because shortcut edges influence many more shortest paths than other edges in the neighborhood graph do and make those paths no longer run along the manifold.

Consequently, to select a suitable neighborhood size, we can compute the sum of all the shortest path distances f(k), which only requires running the former part of ISOMAP (shortest path computation) incrementally beginning with $k_{min}$, and select the first k at which f(k) begins to drop sharply in contrast with the former gradually decreasing downtrend as the suitable neighborhood size (denoted $k_s$) to be used in the subsequent ISOMAP algorithm. For example, we can select $k_s$ as follows:

IF f(k) - f(k+1) > 2*(f(k-1) - f(k)) AND first == TRUE
    k_s = k;                                                  (4)
    first = FALSE;
END

where first is a boolean variable whose initial value is TRUE. So our method is less time-consuming than the straightforward method with residual variance. In addition, shortcut edges will be introduced into the neighborhood graph if even a slightly larger k is used, so the selected $k_s$ is also the maximal suitable k, which can also be used as the upper boundary of the k's, i.e. $k_{max}$, for the straightforward method with residual variance.

Although our method doesn't require running the latter part of ISOMAP, i.e. the classical MDS algorithm, which includes the very time-consuming eigenvalue decomposition, it still requires running the former part of ISOMAP, i.e. shortest path computation, which is itself time-consuming. Note that the weights of the edges in the neighborhood graph are specified as the corresponding Euclidean distances, and that the Euclidean distance metric satisfies the symmetry and triangular inequality conditions, so we can greatly quicken Dijkstra's shortest path algorithm based on these two characteristics of the Euclidean distance metric. The time costs of the two shortest path algorithms with different neighborhood sizes k over the well-known Swiss-roll and S-curve data sets are given in Fig. 1(a) and Fig. 1(b) respectively (different neighborhood sizes mean different neighborhood graphs).
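Before turning to the quickened shortest path algorithm, rule (4) can be illustrated with a minimal sketch (not the authors' implementation): it recomputes f(k) from scratch with SciPy's batch shortest-path routine rather than incrementally, and scans for the first k whose next decrement of f more than doubles the previous one. The function and parameter names are ours.

```python
# Hedged sketch of selection rule (4). f(k) is the sum of all shortest
# path distances in the k-nearest-neighbor graph; we return the first k
# whose next decrement of f is more than twice the previous decrement.
from scipy.sparse.csgraph import connected_components, shortest_path
from sklearn.neighbors import kneighbors_graph

def f_of_k(X, k):
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    G = G.maximum(G.T)   # edge if j is a neighbor of i OR i of j, as in Eq. (5) below
    if connected_components(G, directed=False)[0] > 1:
        return None      # graph not yet connected: k is below k_min
    return shortest_path(G, method='D', directed=False).sum()

def select_k(X, k_max=30):
    f = {}
    for k in range(1, k_max + 1):
        f[k] = f_of_k(X, k)
        ks = k - 1       # candidate, tested once f(ks-1), f(ks), f(ks+1) all exist
        if ks >= 2 and all(f.get(j) is not None for j in (ks - 1, ks, ks + 1)):
            if f[ks] - f[ks + 1] > 2 * (f[ks - 1] - f[ks]):
                return ks
    return None          # no sharp drop found up to k_max
```

This version recomputes all shortest paths for every k, so it is slower than the incremental Dijkstra-like computation described next, but the selection logic is the same.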
As a Dijkstra-like shortest path algorithm, its differences from Dijkstra's shortest path algorithm are as follows: 1) the shortest paths found previously need not be computed again and can also be used to find other shortest paths (using the symmetry condition); 2) for each vertex, its edges in the neighborhood graph are shortest paths themselves, need not be computed, and can also be used to find other shortest paths (using the triangular inequality condition).

Figure 1. Time costs of the two shortest path algorithms (Dijkstra's shortest path algorithm vs. our Dijkstra-like shortest path algorithm) over the neighborhood graphs of (a) the Swiss-roll data set and (b) the S-curve data set.

We represent the i-th data point and its k nearest neighbors by $x_i$ and $N_k(i)$ respectively; then the
adjacency matrix of the corresponding neighborhood graph can be represented by $P = (p_{ij})_{n \times n}$, where

$$p_{ij} = \begin{cases} 0, & j = i \\ \| x_i - x_j \|, & j \in N_k(i) \text{ or } i \in N_k(j) \\ +\infty, & \text{else} \end{cases} \qquad (5)$$

Consequently, our Dijkstra-like shortest path algorithm can be described as follows:

Initialize $\hat{D}_X(k) = (\hat{d}_{ij})_{n \times n}$ to be identical with $P = (p_{ij})_{n \times n}$;
For each data point $i$
    $S = \{ j \mid \hat{d}_{ij} < +\infty \}$;   ($S = \{i\}$ in Dijkstra's shortest path algorithm)
    $T = \{1, \dots, n\} - S$;
    For each data point $j \in T$
        $d_j = \min_{l \in S} \{ \hat{d}_{il} + \hat{d}_{lj} \}$;   ($d_j = \hat{d}_{ij}$ in Dijkstra's shortest path algorithm)
    While $T \neq \Phi$
        $j = \arg\min_{l \in T} \{ d_l \}$;
        $\hat{d}_{ij} = d_j$; $\hat{d}_{ji} = d_j$;   (these two shortest paths can also be used to find other shortest paths)
        $S = S + \{ j \}$; $T = T - \{ j \}$;
        For each data point $l \in T$
            If $d_l > \hat{d}_{ij} + \hat{d}_{jl}$
                $d_l = \hat{d}_{ij} + \hat{d}_{jl}$;

IV. EXPERIMENTAL RESULTS

To contrast our method with the straightforward method with residual variance, we run ISOMAP, using our faster Dijkstra-like shortest path algorithm, with different neighborhood sizes k over two widely-used data sets, Swiss-roll and S-curve, each with 2000 data points sampled uniformly from the corresponding intrinsic manifold, shown in Fig. 2(a) and Fig. 2(b) respectively. In the experiments, we use the k-means algorithm from the Matlab v6.5 toolboxes to select n=500 representative data points from each of these two data sets. Residual variance and our function f(k), i.e. the sum of all the shortest path distances obtained by our faster Dijkstra-like shortest path algorithm, with different k over these two data sets are given in Fig. 3 and Fig. 4 respectively.

Figure 2. The two manifolds used in the experiments: (a) the intrinsic manifold of the Swiss-roll data set; (b) the intrinsic manifold of the S-curve data set.

In the experiments, we specify the first k at which the next decrement of our function is one times larger than the last one as the suitable k to be used in the subsequent ISOMAP algorithm, as in (4). From Fig. 3 and Fig. 4, we can see that our method selects the same neighborhood size as the straightforward method with residual variance, which means that what our method selects, i.e. $k_s$, is also optimal: for example, $k_s = 7$ over the Swiss-roll data set, represented by five-pointed stars in Fig. 3(a) and Fig. 3(b), and $k_s = 18$ over the S-curve data set, represented by five-pointed stars in Fig. 4(a) and Fig. 4(b), but taking less time in our method than in the straightforward method with residual variance. To verify the suitability of the selected k, the neighborhood graphs and the low-dimensional mappings of ISOMAP with the corresponding $k_s$ over these two data sets are given in Fig. 5 and Fig. 6 respectively, from which we can see that the intrinsic manifold structures of these two data sets are recovered nicely, which means that the selected k is suitable.
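For concreteness, the Dijkstra-like algorithm above can be rendered directly in Python as follows. This is a sketch of the pseudocode, not the authors' implementation; it assumes the dense adjacency matrix P of Eq. (5) with +inf for missing edges and a connected neighborhood graph.

```python
# Hedged sketch of the Dijkstra-like shortest path algorithm above.
import numpy as np

def dijkstra_like(P):
    """Shortest-path matrix from the adjacency matrix P of Eq. (5):
    0 on the diagonal, edge length for neighbors, +inf elsewhere."""
    n = P.shape[0]
    D = P.astype(float)                  # working copy of the distance matrix
    for i in range(n):
        # Seed S with every already-known finite distance from i: the point
        # itself, its incident edges, and rows finished in earlier iterations.
        S = set(np.flatnonzero(np.isfinite(D[i])).tolist())
        T = set(range(n)) - S
        d = {j: min(D[i, l] + D[l, j] for l in S) for j in T}
        while T:
            j = min(T, key=d.get)        # settle the closest unsettled vertex
            D[i, j] = D[j, i] = d[j]     # store both directions (symmetry)
            S.add(j)
            T.remove(j)
            for l in T:                  # relax via current entries of row j
                if d[l] > D[i, j] + D[j, l]:
                    d[l] = D[i, j] + D[j, l]
    return D
```

Because each finished row is written back symmetrically, later iterations start with a larger settled set S and fewer vertices in T, which is where the speed-up over plain Dijkstra comes from.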
Figure 3. The contrast of the two methods over the Swiss-roll data set: (a) residual variance with different k; (b) the sum of all the shortest path distances f(k) with different k ($k_s$ is represented by the five-pointed star).

Figure 4. The contrast of the two methods over the S-curve data set: (a) residual variance with different k; (b) the sum of all the shortest path distances f(k) with different k ($k_s$ is represented by the five-pointed star).

Figure 5. The neighborhood graphs with the corresponding $k_s$ over the two data sets: (a) the Swiss-roll data set, k=7; (b) the S-curve data set, k=18.

Figure 6. The low-dimensional mappings of ISOMAP with the corresponding $k_s$ over the two data sets: (a) the Swiss-roll data set, k=7; (b) the S-curve data set, k=18.
When we increase $k_s$ by one, shortcut edges (represented by dashed lines in Fig. 7(a) and Fig. 7(b) respectively) are introduced into the corresponding neighborhood graphs, and thus the corresponding low-dimensional mappings of ISOMAP worsen badly, as given in Fig. 8, which means that what our method selects, i.e. $k_s$, is also the maximal suitable k.

Figure 7. The neighborhood graphs with the corresponding $k_s + 1$ over the two data sets: (a) the Swiss-roll data set, k=8; (b) the S-curve data set, k=19 (shortcut edges are represented by dashed lines).

Figure 8. The low-dimensional mappings of ISOMAP with the corresponding $k_s + 1$ over the two data sets: (a) the Swiss-roll data set, k=8; (b) the S-curve data set, k=19.

To verify the robustness of our method, we run ISOMAP with different neighborhood sizes k over the noisy Swiss-roll and noisy S-curve data sets, with zero-mean normally distributed noise added to each data point of the corresponding data set, where the standard deviation of the noise is chosen to be 2% of the smallest dimension of the bounding box enclosing the data. Residual variance and our function f(k) with different k over these two noisy data sets are given in Fig. 9 and Fig. 10 respectively.

Figure 9. The contrast of the two methods over the noisy Swiss-roll data set: (a) residual variance with different k; (b) the sum of all the shortest path distances f(k) with different k ($k_s$ is represented by the five-pointed star).
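The noise model described above is easy to reproduce; the sketch below is a minimal reading of it (the function name, noise fraction default, and random seed are ours, not from the paper):

```python
# Hedged sketch of the noise model: zero-mean Gaussian noise whose standard
# deviation is 2% of the smallest dimension of the data's bounding box.
import numpy as np

def add_noise(X, frac=0.02, seed=0):
    rng = np.random.default_rng(seed)
    extent = X.max(axis=0) - X.min(axis=0)   # bounding-box side lengths
    sigma = frac * extent.min()              # 2% of the smallest side
    return X + rng.normal(0.0, sigma, size=X.shape)
```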
Figure 10. The contrast of the two methods over the noisy S-curve data set: (a) residual variance with different k; (b) the sum of all the shortest path distances f(k) with different k ($k_s$ is represented by the five-pointed star).

TABLE I. THE RESIDUAL VARIANCE AND THE NUMBER OF EDGES IN THE CORRESPONDING NEIGHBORHOOD GRAPH OBTAINED BY THREE METHODS WITH k=6 AND k=7 OVER THE SWISS-ROLL DATA SET (rows: ISOMAP with the k nearest neighbors method, ISOMAP with the k-edge connected neighborhood graph, and ISOMAP with the mutual neighborhood graph; columns: the residual variance and the number of edges in the corresponding neighborhood graph at k=6 and at k=7).

From Fig. 9 and Fig. 10, we can see that our method still selects the same neighborhood size as the straightforward method with residual variance over the noisy data sets; that is, what our method selects is still the maximal suitable k.

From Fig. 3(a) and Fig. 4(a), under the restriction that shortcut edges are not introduced into the neighborhood graph (that is, k is less than or equal to the selected $k_s$), as more edges are introduced into the neighborhood graph, indicated by a larger k, the geodesic distances can be approximated more accurately, indicated by a smaller residual variance. This fact can also be verified by comparing three methods of constructing the neighborhood graph in ISOMAP: ISOMAP with the k nearest neighbors method[12], ISOMAP with the k-edge connected neighborhood graph[23], and ISOMAP with the mutual neighborhood graph[24]. With the same neighborhood sizes k=6 and k=7 over the Swiss-roll data set, ISOMAP with the k-edge connected neighborhood graph has the most edges in the corresponding neighborhood graph and the smallest residual variance, while ISOMAP with the mutual neighborhood graph has the fewest edges and the largest residual variance, as given in Table I. In this sense, what our method selects, i.e. the maximal suitable k, is optimal, which can also be proved by the fact that what our method selects is the same as the k with the minimal residual variance, as given in Fig. 3-4 and Fig. 9-10.

In addition, from Fig. 3-4 and Fig. 9-10, we can see that the sum of all the shortest path distances f(k) decreases monotonically and its decrement decreases gradually until a sharp drop emerges, at which point shortcut edges begin to appear in the neighborhood graph and the corresponding low-dimensional mapping of ISOMAP begins to worsen badly, as shown in Fig. 5-8; this verifies the feasibility and effectiveness of our method.

V. CONCLUSIONS

It's an open problem for manifold learning techniques such as ISOMAP and LLE to select a suitable neighborhood size, with which the right local connectivity can be constructed and thus these techniques can be applied successfully. In this paper, we present an efficient and robust method to select a suitable neighborhood size according to the decrement of the sum of all the shortest path distances. Our method is less time-consuming than the straightforward method with residual variance, while yielding the same results. What's more, what our method selects is also the maximal suitable k, which can be provided as the upper boundary of the k's, i.e. $k_{max}$, for the straightforward method with residual variance.

ACKNOWLEDGMENT

This work was supported in part by the Research Programme of Henan Fundamental and Advanced Technology of China (No. ) and the Key Technologies R&D Programme of Henan Province, China (No. ).
REFERENCES
[1] W. S. Cleveland, Visualizing Data. New Jersey, U.S.A.: Hobart Press.
[2] D. A. Keim, Designing pixel-oriented visualization techniques: theory and applications, IEEE Transactions on Visualization and Computer Graphics, vol. 6, no. 1.
[3] A. Inselberg and B. Dimsdale, Parallel coordinates: a tool for visualizing multidimensional geometry, in Proc. IEEE Visualization 1990, San Francisco, California, October 23-25, 1990.
[4] E. Kandogan, Visualizing multi-dimensional clusters, trends, and outliers using star coordinates, in Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, August 26-29, 2001.
[5] J. LeBlanc, M. O. Ward, and N. Wittels, Exploring n-dimensional databases, in Proc. IEEE Visualization 1990, San Francisco, California, October 23-25, 1990.
[6] B. Johnson, Visualizing hierarchical and categorical data, Ph.D. Thesis, Department of Computer Science, University of Maryland.
[7] E. R. Tufte, The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
[8] R. M. Pickett and G. G. Grinstein, Iconographic displays for visualizing multidimensional data, in Proc. IEEE Conference on Systems, Man and Cybernetics, Beijing and Shenyang, China, August 8-12, 1988.
[9] J. C. Mao and A. K. Jain, Artificial neural networks for feature extraction and multivariate data projection, IEEE Transactions on Neural Networks, vol. 6, no. 2.
[10] A. Naud and W. Duch, Interactive data exploration using MDS mapping, in Proc. 5th Conference on Neural Networks and Soft Computing, Zakopane, Poland, June 5-10, 2000.
[11] C. Shao and H. K. Huang, Improvement of data visualization based on SOM, in Proc. International Symposium on Neural Networks, Dalian, China, August 19-21, 2004.
[12] J. B. Tenenbaum, V. de Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, vol. 290.
[13] V. de Silva and J. B. Tenenbaum, Unsupervised learning of curved manifolds, in Proc. MSRI Workshop on Nonlinear Estimation and Classification, Berkeley, California, March 19-29, 2001.
[14] M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas, Non-linear dimensionality reduction techniques for classification and visualization, in Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26, 2002.
[15] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol. 290.
[16] A. Hadid and M. Pietikäinen, Efficient locally linear embeddings of imperfect manifolds, in Proc. 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 03), Leipzig, Germany, July 5-7, 2003.
[17] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, vol. 15, no. 6.
[18] M. Balasubramanian, E. L. Schwartz, J. B. Tenenbaum, V. de Silva, and J. C. Langford, The Isomap algorithm and topological stability, Science, vol. 295, no. 5552, p. 7a.
[19] O. Kouropteva, O. Okun, and M. Pietikäinen, Selection of the optimal parameter value for the locally linear embedding algorithm, in Proc. 1st International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 02), Singapore, November 18-22, 2002.
[20] M. Bernstein, V. de Silva, J. C. Langford, and J. B.
Tenenbaum, Graph approximations to geodesics on embedded manifolds, Technical Report, Department of Psychology, Stanford University.
[21] J. A. Lee, A. Lendasse, and M. Verleysen, Nonlinear projection with curvilinear distances: Isomap versus curvilinear distance analysis, Neurocomputing, vol. 57.
[22] O. Kouropteva, O. Okun, A. Hadid, M. Soriano, S. Marcos, and M. Pietikäinen, Beyond locally linear embedding algorithm, Technical Report MVG, University of Oulu, Machine Vision Group, Information Processing Laboratory.
[23] Y. Li, K-edge connected neighborhood graph for geodesic distance estimation and nonlinear data projection, in Proc. 17th International Conference on Pattern Recognition (ICPR 04), Cambridge, UK, August 23-26, 2004.
[24] Y. Qian and K. Zhang, Discovering spatial patterns accurately with effective noise removal, in Proc. 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 04), Paris, France, June 13, 2004.

Li Jing was born in Henan Province, China, on February 20. She received her Ph.D. degree in computer software theory from Information Engineering University, Zhengzhou, China. She is now an associate professor in the School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, Henan Province, China. Her research fields include artificial neural networks, data mining and image processing, etc. Dr. Jing is also a member of the China Computer Federation.

Chao Shao was born in Henan Province, China, on December 12. He received his Ph.D. degree in computer application technology from Beijing Jiaotong University, Beijing, China. He is currently an associate professor in the School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, Henan Province, China. His research fields include artificial neural networks, machine learning, data mining and data visualization, etc. Dr. Shao is also a member of the China Computer Federation.
