A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images
|
|
|
- Brittany Harrington
- 10 years ago
- Views:
Transcription
1 A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods based on kernel density estimation have been successfully applied for various data mining techniques. Their natural interpretation together with consistency properties make them an attractive tool in clustering problems. In this paper, the complete gradient clustering algorithm, based on the density of the data, is presented. The proposed method has been applied to a real data set of grains and compared with K-means clustering algorithm. The wheat varieties, Kama, Rosa and Canadian, characterized by measurements of main grain geometric features obtained by X-ray technique, have been analyzed. Results indicate that the proposed method is expected to be an effective method for recognizing wheat varieties. Moreover, it outperforms the K-means analysis if the nature of the grouping structures among the data is unknown before processing. 1 Introduction Clustering is a major technique for data mining, used mostly as an unsupervised learning method. The main aim of cluster analysis is to partition a given population into groups or clusters with common characteristics, since similar objects are grouped together, while dissimilar objects belong to different clusters. As a result, a new set of categories of interest, characterizing the population, is discovered. The clustering methods are divided into six groups: hierarchical, partitioning, densitybased, grid-based, and soft-computing methods [5]. These numerous concepts of M. Charytanowicz, and J. Niewczas Institute of Mathematics and Computer Science, The John Paul II Catholic University of Lublin, Konstantynów 1 H, PL Lublin {mchmat,jniewczas}@kul.lublin.pl P.A. Kowalski, P. Kulczycki, S. Łukasik, and S. Żak System Research Institute, Polish Academy of Sciences, Newelska 6, PL Warsaw {pakowal,kulczycki,slukasik,slzak}@ibspan.waw.pl 1
2 2 M. Charytanowicz et al. clustering are implied by different techniques of determination of the similarity and dissimilarity between two objects. A classical partitioning K-means algorithm is concentrated on measuring and comparing the distances among objects. It is computationally attractive and easy to interpret and implement in comparison to other methods. On the other hand, the number of clusters is assumed by user in advance and therefore the nature of the obtained groups may be unreliable for the nature of the data, usually unknown before processing. The rigidity of arbitrary assumptions concerning the number or shape of clusters among data can be overcome by density-based methods that let the data detect inherent data structures. In this paper, a complete gradient clustering algorithm is presented. The main idea of this algorithm assumes that each cluster is identified by humps of the kernel density estimator of the data. The number of clusters is determined by number of local maxima of the estimator. Each data point is moved in an ascending gradient direction to accomplish the clustering. The procedure does not need any assumptions concerning the data and may be applied to a wide range of topics and areas of cluster analysis [2, 4]. The algorithm can be also used for outlier detection. Moreover, an appropriate change in values of kernel estimator parameters allows elimination of clusters in sparse areas [4]. The main purpose of this work is to present the application of the complete gradient clustering algorithm to a real data set of grains, characterized by their geometric features. A comparison between the results obtained from the proposed algorithm and the K-means algorithm is reported. 2 The Complete Gradient Clustering Algorithm In this study, a complete gradient clustering algorithm, for short the CGCA, based on kernel density estimation is presented. The principle of the proposed algorithm is based on the density of the data, since the implementation of the CGCA needs to estimate the probability density function. Each cluster is characterized by local mode or maxima of the kernel density estimator. Regions of high densities of objects are recognized as clusters, while areas with rare distributions of objects divide one group from another. The local maxima are searched by using an ascending gradient method. The algorithm works in an iterative manner until a termination criterion has been satisfied. 2.1 Kernel Density Estimation Suppose that x 1, x 2,..., x m is a random sample of m independent points in n- dimensional space from an unknown distribution with probability density function f. The kernel estimator of f can be defined as:
3 A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images 3 ˆf (x) = 1 m ( ) x xi mh n K, i=1 h (1) where the positive coefficient h is called the smoothing parameter, while the measurable function K : R n [0, ) of unit integral Rn K(x)dx = 1, unimodal and symmetrical with respect to zero, takes the name of a kernel [6]. It is generally accepted, that the choice of the kernel K is not as important as the choice of the coefficient h and thank to this, it is possible to take into account the primarily properties of the estimator obtained. Most often the standard normal kernel given by a formula: K(x) = 1 xtx e 2 (2) n/2 2π is used. The practical implementation of the kernel density estimators requires a proper choice of the bandwidth h. The best value of h is taken as the value that minimizes the mean integrated square error MISE( ˆf ) = E ( ˆf (x) f (x)) 2 dx. (3) A frequently used bandwidth selection technique is based on the approach of least-squares cross validation [6]. The value of h is chosen to minimize the function M : (0, ) R given by the rule: M(h) = 1 m 2 h n m i=1 m j=1 ( ) x j x i K + 2 K(0), (4) h mhn where K(x) = K 2 (x) 2K(x) and K 2 is the convolution square of the function K; for the standard normal kernel (2): K 2 1 xtx (x) = e 4. (5) n/2 (4π). Additional procedures improving the quality of the estimator obtained are found in [6]. Practical applications are presented in the publication [3]. For further purposes, the kernel density estimator with the standard normal kernel (2) is used. 2.2 Procedures of the CGCA Consider the data set containing m samples x 1, x 2,..., x m in n-dimensional space. Using the methodology introduced in section 2.1, the kernel density estimator ˆf may be constructed. The idea of the CGCA is based on the approach proposed by Fukunaga and Hostetler [1]. Thus given the start points:
4 4 M. Charytanowicz et al. x 0 j = x j for j = 1,2,...,m, (6) each point is moved in an uphill gradient direction using the following iterative formula: x k+1 j = x k j + b ˆf (x k j ) ˆf (x k for j = 1,2,...,m, and k = 0,1,..., (7) j ) where ˆf denotes the gradient of kernel estimator and parameter b = h 2 /(n + 2) [1, 4]. To complete the algorithm the following two aspects need to be specified: a termination criterion of the algorithm and procedure of creating clusters. The algorithm will be stopped when the following condition is fulfilled: D k D k 1 αd 0, (8) where D 0 and D k 1, D k denote sums of distances between particular elements of set x 1, x 2,..., x m before starting the algorithm as well as after the (k 1)-th and k-th step, respectively. The positive parameter α is taken arbitrary and the value is recommended. This k-th step is the last one and will be denoted by k. Finally, after the k -th step of algorithm (6)-(7) the set: x k 1,xk 2,...,xk m, (9) considered as the new representation of all samples x 1, x 2,..., x m, is obtained. Following this, the set of mutual distances of the above elements: { } d(xi k,x k j ) (10) i=1,2,...,m 1 j=i+1,i+2,...,m is defined. Using the methodology presented in section 2.1, the auxiliary kernel estimator ˆf d of the elements of set (10), treated as a sample of a one-dimensional random variable, is created. Next, the first (i.e. for the smallest value of an argument) local minimum of the function ˆf d belonging to the interval (0,D), where D means the maximum value of the set (10), is found. This local minimum will be denoted as x d, and it can be interpreted as half the distance between centers of potential clusters lying closest together. Finally, the clusters will be created. First, the element of set (9) will be taken; it initially create a one-element cluster containing it. An element of set (9) is added to the cluster if the distance between it and any element belonging to the cluster is less than x d. Of course, this element is removed from the set (9). If there are no more elements belonging to the cluster, the new cluster is created. The procedure of assigning elements to clusters is repeated as long as the set (9) is not empty. Additional information on the CGCA procedures, especially analysis of influence of the values of kernel estimator parameters on results obtained, is described in details in [4].
5 A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images 5 3 Materials and methods The proposed algorithm has been tested on a real data set of grains. Studies were conducted using combine harvested wheat grain originating from experimental fields, explored at the Institute of Agrophysics of the Polish Academy of Sciences in Lublin. The examined group comprised kernels belonging to three different varieties of wheat: Kama, Rosa and Canadian, 70 elements each, randomly selected for the experiment. Visualization of the internal kernel structure was detected using a soft X-ray technique. The images were recorded on 13x18 cm X-ray KODAK plates. Figure 1 presents the X-ray images of these kernels. Fig. 1 X-ray photograms (18x13cm) of kernels The X-ray photograms were scanned using the Epson Perfection V700 table photo-scanner with a built-in transparency adapter, 600 dpi resolution and 8 bit gray scale levels. Analysis procedures of obtained bitmap graphics files were based on the computer software package GRAINS, specially developed for X-ray diagnostic of wheat kernels [7]. To construct the data, seven geometric parameters of grains: area A [mm 2 ], perimeter P [m], compactness C = 4πA/P 2, length of kernel [mm], width of kernel [mm], asymmetry coefficient and length of kernel groove [mm] were measured from a total of 210 samples (see Figure 2). In our study, the data was reduced to be two-dimensional after applying the principal component analysis to validate the results visually. All calculations were carried out on an IBM/PC compatible computer with the Windows XP operation system, and all programs were developed in Borland C++ Builder Environment.
6 6 M. Charytanowicz et al. Fig. 2 Document window with geometric parameters of a kernel and statistical parameters of its image 4 Results and discussion The dataset consists of 210 kernels belonging to three wheat varieties, 70 elements each, characterized by 7 geometric features, all of these real-valued continuous. The data s projection on the axes of the two greatest principal components is presented in Figure 3 with wheat varieties being distincted symbolically. Fig. 3 Wheat varieties data set on the axes of the two greatest principal components: ( ) the Rosa wheat variety, ( ) the Kama wheat variety, ( ) the Canadian wheat variety
7 A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images 7 All samples are labeled by numbers: 1-70 for the Kama wheat variety, for the Rosa wheat variety, and for the Canadian wheat variety. The CGCA created three clusters corresponding to Rosa, Kama, and Canadian varieties, containing 69, 65, and 76 grains respectively. Thus the samples 9, 38, which belong to the Kama wheat variety are incorrectly grouped into the cluster associated with the Rosa wheat variety. What is more, the samples 125, 136, 139, which belong to the Rosa wheat variety, and the samples 166, 200, 202, which belong to the Canadian wheat variety are wrongly classified into the cluster associated with the Kama wheat variety. In addition, the samples 20, 27, 28, 30, 40, 60, 61, 64, 70, which belong to the Kama wheat variety are wrongly classified into the cluster associated with the Canadian wheat variety. Clustering results, containing numbers of grains classified properly and numbers of grains classified wrongly into clusters associated with Rosa, Kama, and Canadian varieties, are shown in Table 1. Table 1 Clustering results of the Complete Gradient Clustering Algorithm and K-Means Algorithm for the wheat varieties data set Complete Gradient Clustering Algorithm K-Means Clustering Algorithm Clusters Correct Incorrect Total Correct Incorrect Total Rosa Kama Canadian According to the results of the CGCA, out of 70 kernels of the Rosa wheat variety, 67 was classified properly. Only 2 of the Kama variety were classified wrongly as the Rosa variety. These frequencies are equal 66 and 2 respectively if K-means algorithm is used. The Rosa variety is best recognized using both techniques. For the other two varieties, the CGCA created clusters containing 65 elements (Kama) and 76 elements (Canadian). In regard to the Kama variety, 59 kernels were classified correctly, while 6 of the other varieties were incorrectable identified as the Kama variety. For the Canadian variety, 67 kernels were correctly identified and 9 kernels of the Kama variety were wrongly identified as the Canadian variety. Results obtained using K-means algorithm were the following: the Kama variety: out of 77 kernels, 65 were correctly identified and 12 incorrectly identified; the Canadian variety: our of 65 kernels, 62 were correctly identified and 3 incorrectly identified. This implies, that Kama and Canadian varieties are similar with respect to geometric parameters. The percentages of correctness of both methods are summarized in Table 2. The proposed algorithm achieved accuracy of about 96% for the Rosa wheat variety, 84% for the Kama wheat variety, and 96% for the Canadian wheat variety. In comparison, K-means algorithm achieved accuracy of about 94%, 93%, and 89% respectively.
8 8 M. Charytanowicz et al. Table 2 Comparison of performance of the Complete Gradient Clustering Algorithm and K-Means Algorithm for the wheat varieties data set, correctness percentages are reported Complete Gradient Clustering Algorithm K-Means Clustering Algorithm Wheat Varieties Correctness % Correctness % Rosa Kama Canadian Conclusions The proposed clustering algorithm, based on kernel estimator methodology, is expected to be an effective technique for forming proper categories of wheat. It behaves equally or better than the classical K-means algorithm. Moreover, the data reduced after applying the principal component analysis, contained apparent clustering structures according to their classes. Both algorithms performed well and the percentages of correctness were comparable. The amount of 193 grains, giving almost 92% of the total, was classified properly. The wheat varieties used in the study showed differences in geometric parameters. The Rosa variety is better recognized, whilst Kama variety and Canadian variety are less successfully differentiated. Further research is needed on grain geometric parameters and these ability to identify wheat kernels. References 1. Fukunaga K, Hostetler L D (1975) The estimation of the gradient of a density function, with applications in Pattern Recognition. IEEE Transactions on Information Theory 21: Kowalski P, Łukasik S, Charytanowicz M, Kulczycki P (2008) Data-Driven Fuzzy Modeling and Control with Kernel Density Based Clustering Technique. Polish Journal of Environmental Studies 17: Kulczycki P (2008) Kernel estimators in industrial applications. In Prasad B (ed) Soft Computing Applications in Industry. Springer-Verlag, Berlin 4. Kulczycki P, Charytanowicz M (2010) A Complete gradient clustering algorithm formed with kernel estimators. Int. J. Appl. Comput. Sci., in press 5. Mirkin B (2005) Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, New York 6. Silverman B W (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, London 7. Strumiłło A, Niewczas J, Szczypiński P, Makowski P, Woźniak W (1999) Computer system for analysis of X-ray imges of wheat grains. Int. Agrophysics 13:
Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images
Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr Kulczycki, Piotr A. Kowalski, Szymon Łukasik, and Sławomir Żak Abstract Methods
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009
Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
Parallel Computing of Kernel Density Estimates with MPI
Parallel Computing of Kernel Density Estimates with MPI Szymon Lukasik Department of Automatic Control, Cracow University of Technology, ul. Warszawska 24, 31-155 Cracow, Poland [email protected]
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
Clustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability
Classification of Fingerprints Sarat C. Dass Department of Statistics & Probability Fingerprint Classification Fingerprint classification is a coarse level partitioning of a fingerprint database into smaller
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Descriptive statistics parameters: Measures of centrality
Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between
SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK
SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,
Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009
Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation
Unsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
CLUSTERING FOR FORENSIC ANALYSIS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Segmentation of stock trading customers according to potential value
Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research
Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification
29 Ninth IEEE International Conference on Data Mining Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification Fei Wang 1,JimengSun 2,TaoLi 1, Nikos Anerousis
Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool
Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
University of Ostrava. Fuzzy Transforms
University of Ostrava Institute for Research and Applications of Fuzzy Modeling Fuzzy Transforms Irina Perfilieva Research report No. 58 2004 Submitted/to appear: Fuzzy Sets and Systems Supported by: Grant
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
Cluster Analysis: Basic Concepts and Methods
10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all
EXPERIMENTAL ERROR AND DATA ANALYSIS
EXPERIMENTAL ERROR AND DATA ANALYSIS 1. INTRODUCTION: Laboratory experiments involve taking measurements of physical quantities. No measurement of any physical quantity is ever perfectly accurate, except
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS Cativa Tolosa, S. and Marajofsky, A. Comisión Nacional de Energía Atómica Abstract In the manufacturing control of Fuel
The Big Data mining to improve medical diagnostics quality
The Big Data mining to improve medical diagnostics quality Ilyasova N.Yu., Kupriyanov A.V. Samara State Aerospace University, Image Processing Systems Institute, Russian Academy of Sciences Abstract. The
Cluster Analysis: Basic Concepts and Algorithms
8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
K-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
AP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
Comparing large datasets structures through unsupervised learning
Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France [email protected]
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
Quality Assessment in Spatial Clustering of Data Mining
Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering
Means, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
A comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Determining optimal window size for texture feature extraction methods
IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Clustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India [email protected] 2 School of Computer
Cluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Neural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm [email protected] Rome, 29
. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
Chapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based
Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
Mathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
E3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239
STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
How To Find Local Affinity Patterns In Big Data
Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class
Information Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
Introduction to Clustering
Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi
Estimating the Average Value of a Function
Estimating the Average Value of a Function Problem: Determine the average value of the function f(x) over the interval [a, b]. Strategy: Choose sample points a = x 0 < x 1 < x 2 < < x n 1 < x n = b and
3: Summary Statistics
3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Introduction to the Finite Element Method (FEM)
Introduction to the Finite Element Method (FEM) ecture First and Second Order One Dimensional Shape Functions Dr. J. Dean Discretisation Consider the temperature distribution along the one-dimensional
How To Solve The Cluster Algorithm
Cluster Algorithms Adriano Cruz [email protected] 28 de outubro de 2013 Adriano Cruz [email protected] () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz [email protected]
Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel
Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel Institut für Statistik LMU München Sommersemester 2013 Outline 1 Setting the scene 2 Methods for fuzzy clustering 3 The assessment
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
Visualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar
Local outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
The influence of teacher support on national standardized student assessment.
The influence of teacher support on national standardized student assessment. A fuzzy clustering approach to improve the accuracy of Italian students data Claudio Quintano Rosalia Castellano Sergio Longobardi
Identifying erroneous data using outlier detection techniques
Identifying erroneous data using outlier detection techniques Wei Zhuang 1, Yunqing Zhang 2 and J. Fred Grassle 2 1 Department of Computer Science, Rutgers, the State University of New Jersey, Piscataway,
A Relevant Document Information Clustering Algorithm for Web Search Engine
A Relevant Document Information Clustering Algorithm for Web Search Engine Y.SureshBabu, K.Venkat Mutyalu, Y.A.Siva Prasad Abstract Search engines are the Hub of Information, The advances in computing
Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
For example, estimate the population of the United States as 3 times 10⁸ and the
CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number
CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA
We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
A Survey of Clustering Techniques
A Survey of Clustering Techniques Pradeep Rai Asst. Prof., CSE Department, Kanpur Institute of Technology, Kanpur-0800 (India) Shubha Singh Asst. Prof., MCA Department, Kanpur Institute of Technology,
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
