A New Approach for Clustering of X-ray Images

Similar documents
Content-based image retrieval in medical applications for picture archiving and communication systems

Colour Image Segmentation Technique for Screen Printing

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

ISSN: A Review: Image Retrieval Using Web Multimedia Mining

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Norbert Schuff Professor of Radiology VA Medical Center and UCSF

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

The Scientific Data Mining Process

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Clustering UE 141 Spring 2013

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

EHR CURATION FOR MEDICAL MINING

Chapter 7. Cluster Analysis

RUN-LENGTH ENCODING FOR VOLUMETRIC TEXTURE

A comparison of various clustering methods and algorithms in data mining

Chapter ML:XI (continued)

Environmental Remote Sensing GEOG 2021

Probabilistic Latent Semantic Analysis (plsa)

Local features and matching. Image classification & object localization

Data, Measurements, Features

Using Data Mining for Mobile Communication Clustering and Characterization

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

A Hierarchical SVG Image Abstraction Layer for Medical Imaging

A review of content-based image retrieval systems in medical applications clinical benefits and future directions

Comparison of K-means and Backpropagation Data Mining Algorithms

Movie Classification Using k-means and Hierarchical Clustering

Image Segmentation and Registration

CONTENT-BASED IMAGE RETRIEVAL FOR ASSET MANAGEMENT BASED ON WEIGHTED FEATURE AND K-MEANS CLUSTERING

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Standardization and Its Effects on K-Means Clustering Algorithm

Machine Learning using MapReduce

Efficient Attendance Management: A Face Recognition Approach

Unsupervised learning: Clustering

How To Cluster

Social Media Mining. Data Mining Essentials

Statistical Databases and Registers with some datamining

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Exploration and Visualization of Post-Market Data

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Cluster Analysis: Advanced Concepts

Hierarchical Cluster Analysis Some Basics and Algorithms

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

A Review of Content Based Image Retrieval Systems in Medical Applications Clinical Benefits and Future Directions

Euler Vector: A Combinatorial Signature for Gray-Tone Images

Human behavior analysis from videos using optical flow

COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION

Clustering Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Natural Language Querying for Content Based Image Retrieval System

Segmentation & Clustering

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Information Retrieval and Web Search Engines

Image analysis. Daniel Smutek

International Journal of Advance Research in Computer Science and Management Studies

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

ScienceDirect. Brain Image Classification using Learning Machine Approach and Brain Structure Analysis

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Integrating Images to Patient Electronic Medical Records through Content-based Retrieval Techniques

An Automatic and Accurate Segmentation for High Resolution Satellite Image S.Saumya 1, D.V.Jiji Thanka Ligoshia 2

SIGNATURE VERIFICATION

Footwear Print Retrieval System for Real Crime Scene Marks

SPATIAL DATA CLASSIFICATION AND DATA MINING

Music Mood Classification

A new similarity measure for image segmentation

Unsupervised Data Mining (Clustering)

SYMMETRIC EIGENFACES MILI I. SHAH

Visualization by Linear Projections as Information Retrieval

Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

Clustering & Visualization

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Object Recognition. Selim Aksoy. Bilkent University

The Data Mining Process

Machine Learning for Data Science (CS4786) Lecture 1

Super-resolution method based on edge feature for high resolution imaging

Image Compression and Decompression using Adaptive Interpolation

The Big Data mining to improve medical diagnostics quality

Supervised and unsupervised learning - 1

Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

FACE RECOGNITION BASED ATTENDANCE MARKING SYSTEM

An Overview of Knowledge Discovery Database and Data mining Techniques

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

LOCAL SURFACE PATCH BASED TIME ATTENDANCE SYSTEM USING FACE.

SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS. Nitin Khanna and Edward J. Delp

Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye MSRC

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

CHAPTER 2 LITERATURE REVIEW

Transcription:

ISSN (Print): 1694-0814 22 A New Approach for Clustering of X-ray Images Chhanda Ray 1, Krishnendu Sasmal 2 1 Department of Computer Application RCC Institute of Information Technology Kolkata, West Bengal INDIA 2 Department of Computer Application RCC Institute of Information Technology Kolkata, West Bengal INDIA ABSTRACT Medical image databases are a key component in future diagnosis and preventive medicine. Automatic clustering of medical images plays an important role for structuring of a given medical database as well as for searching and retrieval of biomedical images. This paper introduces a new approach for efficient clustering of x-ray images based on various levels of image features. Initially, for each given x-ray image, features have been extracted from three different levels, namely, global, local and pixel. Finally, a new approach, a combination of k-means and hierarchical clustering techniques, has been applied in order to cluster x-ray images. The experimental results are also presented at the end of this paper. KEYWORDS: Biomedical Images, Image Clustering, Image Databases, X-ray Images. 1. Introduction Medical image databases are a key component in future diagnosis and preventive medicine. In last several years, there is an emerging trend towards the digitization of medical images and the formation of adequate archives. In a statistics, it has been stated that the Radiology Department of the University Hospital of Geneva alone produces more than 12,000 new x-ray images per day in 2002 [11]. In order to extract the relevant information from this huge amount of x-ray images, it is necessary to organize and store them properly. Moreover, these x-ray images must be archived in the patient s personal file in such a way that allows easy access when needed. Further, with the growing size of medical images, there is a need for efficient tools that can analyze medical images content and represent it in a way that can be efficiently searched and compared. One of the main goals of medical information systems is to deliver the needed information at the right time at right place in order to improve the quality and efficiency of care processes [14]. In conventional databases, textual searching can be done using the DICOM header, but information contained in x-ray images differs considerably from that residing in alphanumerical format. A single x-ray image may contain a large number of regions of interest, each of which may be the focus of attention for the medical expert, per diagnosis task. However, automatic categorization processes are required for fast archiving of secondary digitized x-ray images that were acquired by flim-based modalities or non-dicom digital devices. Medical image content representation and retrieval is playing an increasing role in a wide spectrum of applications within the clinical process [11]. For the clinical decision-making it can be useful to refer x-ray images of the same modality or the same anatomic region for the identification of certain pathologies. In this context, Content Based Image Retrieval (CBIR) has emerged as a powerful tool to efficiently retrieve x-ray images visually similar to a query image during last several years. In CBIR, the main idea is to represent each image as a feature vector and to measure the similarity between images with distance between their corresponding feature vectors according to some metric. Finding the correct features to represent images, as well as the similarity metric depend on the image domain and the goal of the retrieval system. In [1], a new technique for image representation, blobworld, has been introduced that provides a transformation from the raw pixel data to a small set of image regions, which are coherent in color and texture space. A retrieval system based on this representation has also been presented in [1]. Computer vision techniques for content-based image retrieval from

ISSN (Print): 1694-0814 23 large medical databases have been discussed in [6]. However, current medical CBIR systems often propose solutions that are limited to images with specific organ and modality or diagnostic study, and are usually not directly transferable to other medical applications. In this context, automated clustering of medical images is a key solution for structuring of a given medical database as well as for searching of a medical image. In recent times, clustering and classification of x-ray images is the focus of attention of many researchers [2, 3, 4, 5, 7, 8, 10, 13]. [2] deals with an approach for edge detection and classification of x-ray images that can be applied to interventional 3D vertebra shape reconstruction. An approach for unsupervised image-set clustering using an information theoretic framework has been proposed in [3]. In [3], each image or image set is represented as a Gaussian mixture distribution (GMM) and images are compared and matched via a probabilistic measure of similarity between distributions known as the Kullback-Leibler (KL) distance. Using the same GMM-KL framework, a technique for medical image categorization and retrieval for Picture- Archiving and Communication Systems (PACS) has been described in [5]. [4] focuses on a general framework for image categorization, classification and retrieval for medical image archives. An algorithm for image clustering and compression is focused in [7]. Medical image classification that involves the automatic classification of x-ray images in 57 predefined classes with large intra-class variability has been introduced in [8]. In [8], a generic method for image classification based on ensemble of decision trees and random subwindows is represented. A new image classification method by using multi-level image features and state-of-the-art machine learning method, Support Vector Machine (SVM) is focused in [10]. To support the automated classification of medical images, a tree-based detailed code is developed in [13]. The objective of this work is to design and implementation of a new efficient and simplified clustering technique for x- ray images. The approach presented in this paper is different from other approaches in few aspects, image segmentation, image representation & feature extraction, and clustering is done by applying a new approach which is a combination of k-means and hierarchical clustering techniques. The organization of the paper is as follows. Section 2 represents an image clustering framework for automatic categorization of x-ray images. In Section 3, the details of feature extraction are provided while the clustering technique of x- ray images has been introduced in Section 4. Experimental results are shown in Section 5 and Section 6 provides a conclusion to the work. 2. Image Clustering Framework Clustering of x-ray images (radiographs) is a non-trivial task, due to the complex nature of the information in images. X-ray images of the same class share a strong visual similarity. Moreover, there is a great variation within a class, caused by different doses of x-ray, varying orientation, alignment and pathology. In addition to content variation, the quality of the x-ray images may vary considerably. In this paper, the image clustering framework is consisting of two different phases, namely, image feature extraction, and image clustering as shown in the following figure. Testing Images Feature Extraction Pixel Level Local Level Global Level Figure 1: Image Clustering Framework Feature extraction phase deals with the extraction of features from given set of x-ray images while in image clustering phase, clusters of x-ray images are generated based on image features. The detailed descriptions of these phases are described in the following Sections. 3. Image Feature Extraction Combined Features Clustering Clustering Result The accuracy of image classification mainly depends on image feature extraction. More discriminated features lead to better clustering result. In general, there are three different levels of feature extraction, global, local and pixel. The global feature can be extracted to represent the whole image in an average fashion. Local features are extracted from small sub-images that are generated by partitioning the original image into a number of segments. The simplest image features can be directly extracted based on the pixel values of the given image.

ISSN (Print): 1694-0814 24 In case of general digital x-ray images, there are sharp variations of features in terms of color, texture and shape. The x-ray images are characterized with contrast variation and non-uniform intensity background, weak signal-to-noise ratio, digitized x-ray projections noise, and high frequency noise. Moreover, in many images there are cloths, jewels, artificial-implants and medical instruments. In this paper, for each x-ray image, features have been extracted from three different levels, namely, global, local and pixel level and are combined into a large feature vector for image clustering. The low level features extracted from x-ray images at global level and local patches are texture and shape. Since we are only concerned about x-ray images, color features are not considered in this work. For the sake of simplicity, hence all given x-ray images are scaled down to (100 * 100) pixels before feature extraction. global level. An example of chest x-ray image is represented below. 3.1 Global Level Feature Selection Texture feature contains important information regarding underlying structural arrangement of surfaces of an image. One of the well-known approaches for describing texture of an image is by using Gray Level Co-occurrence Matrix (GLCM) originally introduced by Haralick et al. [1973]. The GLCM is a tabulation of how often different combinations of pixel brightness values (gray levels) occur in an image. GLCM texture considers the spatial relationship between two pixels at a time, namely, the reference pixel and the neighbor pixel. A different co-occurrence matrix exists for each spatial relationship such as above, next to, diagonal, south, east, and west. However, for texture calculations the GLCM is need to be symmetric. In this paper, the co-occurrence matrix is constructed by getting information about the orientation and distance between the pixels. Hence, at global level, for each given x- ray image eight gray-level co-occurrence matrix is constructed for eight different orientations; 0 0, 45 0, 90 0, 135 0, -45 0, -90 0, -135 0, and -180 0. The texture measures that are computed from each of the eight GLCM for a given x- ray image includes Contrast, Energy, Homogeneity, Entropy, Mean, Variance, and Correlation. Initially, for each given x-ray image, a co-occurrence matrix for degree zero is created and required seven texture features are extracted at global level. Therefore, for each given x-ray image total 56 texture features are extracted at global level. Shape provides geometrical information of an object in image, which do not change even when the location, scale and orientation of the object are changed. In this work, Canny edge operator [Canny, 1986] is used to generate edge histograms. Hence, for each x-ray image the shape information is extracted after every ten degrees by canny edge detector and thus resulting into 37shape features at Figure 2: An Example of Chest X-ray Image Hence, at global level 56 texture features and 37 shape features is extracted for each x-ray image. 3.2 Local Level Feature Selection At local level, each input x-ray image is partitioned into four non-overlapping patches for feature extraction. The partition of chest x-ray image of figure 2 is shown in the following figure. (a) Part 1 (b) Part 2 (c) Part 3 (d) Part 4 Figure 3: Feature Selection at Local Level

ISSN (Print): 1694-0814 25 At local level, for each of the four non-overlapping image segment, 56 texture features and 37 shape features is computed in the same way as global level and thus, total 372 features is obtained. 3.3 Pixel Level Feature Selection At pixel level, images are resizing into (10 * 10) pixels in order to obtained pixel information directly as shown in the figure below. After resizing the image, a feature set of size 100 is obtained at pixel level. Hierarchical clustering technique creates a hierarchical decomposition of the given set of data objects. A hierarchical method can be either agglomerative or divisive. In case of agglomerative approach, clustering starts with each object in a separate group and successively merges the objects or groups that are close to one another until all the groups are merged into one or a termination condition holds. On the other hand, the divisive approach starts with all the objects in the same cluster and successively splits into smaller clusters until eventually each object is in one cluster or a termination condition holds. However, the hierarchical method suffers from the fact that once a step is merged or split, it can never be undone. In order to produce good clustering result, a combination of k-means and hierarchical clustering techniques has been applied in this work for image clustering. 5. Experimental Result and Analysis Figure 4: Feature Selection at Pixel Level In this work, the total dimensionality of feature vector for each given x-ray image is 565 among which 93 features has been extracted at global level, 372 features has been computed at local level, and 100 features has been extracted at pixel level. Finally, the combined image features from three different levels for each x-ray image are stored into a big feature vector and clustering is performed. In order to produce good clustering result no dimensionality reduction techniques like PCA has been applied here which may lead to loss of some information. However, more memory is required in order to store a large number of features. 4. Image Clustering The k-means clustering technique is straightforward in concept, but yields good clustering accuracy. The k-means algorithm takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. The cluster similarity is measured in terms of the mean value of the objects in a cluster known as cluster centroid or center of gravity. For a given unknown objects, k-means technique randomly selects k of the objects, each of which initially represents a cluster mean or center. For each of the remaining objects, an object is assigned to the cluster to which it is closest. Hence, closeness is defined in terms of a distance metric between the object and the cluster mean, namely, Euclidean distance. It then computes the new mean for each cluster and the process repeated until the criterion function converges. In this paper, experiment has been made on a image set of 150 x-ray images consists of five different classes of x-ray images such as knee, hand, skull, backbone, and chest. In order to produce final clustering result, initially clustering has been done by using k-means method on large feature vector and then hierarchical technique is applied on k-means clustering result. In this context, it has been identified from experimental results that image clustering performance degraded if hierarchical clustering method is applied before k-means technique on feature set. The experimental results presented in the following table indicate the percentage of clustering accuracy for different clustering techniques: k-means clustering Hierarchical clustering K-means + Hierarchical Hand Skull Chest Backbone Knee 67% 83% 85% 92% 80% 42% 80% 65% 45% 51% 72% 89% 87% 92% 81% Table 1: Experimental Results 6. Conclusion & Future Work This paper presents a new method directed towards the automatic clustering of x-ray images. The clustering has been done based on multi-level feature of given x-ray images such as global level, local level and pixel level. A new efficient approach, a combination of k-means and hierarchical clustering techniques has been applied for clustering of x-ray images which produces a very high level of accuracy. However, since the dimensionality of feature

ISSN (Print): 1694-0814 26 space is huge and no dimensionality reduction is used, space complexity increases. The clustering technique proposed in this work can be efficiently used in a number of applications such as face recognization, thumb impression identification, bacterial film identification and so on. Moreover, the same clustering technique can be applied for automatic clustering of biomedical images such as CT-scan, MRI, PET, and X-ray. In future scope, other clustering techniques can be applied in order to identify the best suitable clustering technique for x- ray images in terms of more improved clustering results and complexity. Acknowledgments This work is supported by grants from BRNS Health-Grid Project, Department of Atomic Energy, Govt. of India. References [1] S. Belongie, C. Carson, H. Greenspan, J. Malik, Color- and Texture-Based Image Segmentation Using EM and Its Application to Content-Based Image Retrieval, Proceedings of the 6 th International Conference on Computer Vision (ICCV 98 ), Bombay, INDIA, January 1998, pp 675-682. [2] A. Bilgot, O. Le Cadet, V. Perrier, L. Desbat, Edge Detection and Classification in X-ray Image: Applications to Interventional 3D Vertebra Shape Reconstruction, http://ljk.imag.fr/membres/valerie.perrier/publi/surgetica_ bilgot.pdf. [3] J. Goldberger, S. Gordon, H. Greenspan, Unsupervised Image-Set Clustering Using an Information Theoretic Framework, IEEE Transactions on Image Processing, Vol. 15, No. 2, February 2006, pp 449-458. [4] J. Goldberger, H. Greenspan, J. Dreyfuss, An Optimal Reduced Representation of a MoG with Applications to Medical Image Database Classification, IEEE Transactions on Image Processing, Vol. 15, No. 2, February 2006, pp 449-458. [5] H. Greenspan, Adi T. Pinhas, Medical Image Categorization and Retrieval For PACS Using the GMM-KL Framework, IEEE Transactions on Information Technology in BioMedicine, 2005, pp 1-15, TITB-00012-2005. [6] A. Kak, C. Pavlopoulou, Computer Vision Techniques for Content-Based Image Retrieval from Large Medical Databases, Proceedings of IAPR Workshop on Machine Vision Applications, The University of Tokyo, Japan, November 2000, pp 395-404. [7] M. Kaya, An Algorithm for Image Clustering and Compression, Turk Journal on Electrical Engineering, Vol. 13, No. 1, 2005, pp 79-91. [8] R. Maree, P. Geurts, J. Piater, L. Wehenkel, Biomedical Image Classification with Random Subwindows and Decision Trees, Proceedings of International Workshop on Computer Vision for Biomedical Image Applications: Current Techniques and Future Trends (ICCV 2005), Spriner-Verlag. [9] K. Meena, M. Durairaj, K.R. Subramanian, IRNNS: A Hybrid Prediction System for Medical Database, National Journal of System and Information Technology, Vol. 1, No. 2, December 2008, pp 57-67, ISSN: 0974-3308. [10] A. Mueen, M. sapiyan Baba, R. Zainuddin, Multilevel Feature Extraction and X-ray Image Classification, Journal of Applied Sciences, Vol. 7, No. 8, 2007, pp 1224-1229, ISSN 1812-5654. [11] H. Muller, N. Michoux, D. Bandon, A. Geissbuhler, A review of content-based image retrieval systems in medical application- clinical benefits and future directions, International Journal of Medical Informatics, vol.73, No. 1, February 2004, pp 1-23. [12] M. Ramachandra, V. Jain, N. Udupa, Learning Algorithms for Enterprise Data Mining, National Journal of System and Information Technology, Vol. 1, No. 2, December 2008, pp 68-78, ISSN: 0974-3308. [13] B. B. Wein, T. Lehmann, D. Keysers, H. Schubert, M. Kohnen, Detailed Image Classification Code for Image Retrieval of medical Images (IRMA), Proceedings of Computer-Assisted Radiology and Surgery(CARS 2002), H.U. Lemke, M.W. Vannier, K. Inamura, A.G. Farman, K.Doi & J.H.C. Reiber (Editors). [14] A. Winter, R. Haux, A three-level graph-based model for the management of hospital information systems, Journal of Methods of Information in Medicine, vol. 34, No. 4, September 1995, pp 378-396.