CHAPTER 5 SOFT CLUSTERING BASED MULTIPLE DICTIONARY BAG OF WORDS FOR IMAGE RETRIEVAL


Object classification is a highly important area of computer vision and has many applications, including robotics, image search, face recognition, aiding visually impaired people, censoring images and many more (Viola and Jones 2001, Ferrari et al 2010). Most current state-of-the-art image classification systems are based on the Bag-of-Words image representation (Csurka et al 2004). In this method, a codebook of visual words is created using various clustering methods. To further increase performance, the MDBoW method, which uses more visual words from different independent dictionaries instead of adding more words to the same dictionary, was implemented using a hard clustering method (Aly et al 2011). Nearest-neighbor assignments are used in the clustering of features. A given feature may be nearly the same distance from two cluster centers. With hard clustering, only the slightly nearer center is selected to represent that feature, so such ambiguous features are not well represented by the visual vocabulary. To address this problem, a soft clustering based Multiple Dictionary Bag of Visual Words model for image classification is used.

5.1 MDBoW WITH HARD CLUSTERING

One of the most important and challenging problems in machine vision is retrieving images from a large and highly varied image data set

based on visual contents. MDBoW (Aly et al 2011), which uses more visual words, has significantly improved large-scale classification of images. In this method, more words from different independent dictionaries are used instead of adding more words to the same dictionary. The storage grows linearly with the number of dictionaries used. In comparison with recent techniques for compact BoW (Jégou et al 2009, Perronnin et al 2010), there is improved performance with comparable storage. The performance also increases considerably when the dictionary is built using all available features.

The two ways of implementing multiple dictionaries are the Unified and Separate methods. In single dictionary generation, which is the baseline method, a single dictionary of visual words is generated from the pool of features and used to generate the histogram for the image. In multiple dictionary generation, each dictionary D_n is generated with a different subset of the image features. In the Separate implementation, the image gets a histogram h_n from every dictionary D_n, and these are concatenated to form a single histogram h; every feature gets N entries in the histogram h, one from every dictionary. In the Unified implementation, a single unified dictionary is built from the concatenation of visual words from the dictionaries 1 to N and the image gets a single histogram h; every feature gets only one entry in the histogram h. In this approach, more words are taken from different independent dictionaries, whereas in the baseline method more words would be taken from the same dictionary. Thus, the multiple dictionary method has less storage than the baseline approach.

Steps for Unified Dictionary Generation

1. Generate N random, possibly overlapping subsets of the image features {S_n}, n = 1 to N.
2. Compute a dictionary D_n independently for each subset S_n. Each dictionary has a set of K_n visual words.

3. Every image feature gets its visual word from every dictionary D_n. Then concatenate all the dictionaries into a single dictionary. The final histogram is the histogram over the obtained single unified dictionary.

Steps for Separate Dictionary Generation

1. Generate N random, possibly overlapping subsets of the image features {S_n}, n = 1 to N.
2. Compute a dictionary D_n independently for each subset S_n. Each dictionary has a set of K_n visual words.
3. Compute the histogram. Every image feature gets its visual word from every dictionary D_n. Accumulate these visual words as individual words into individual histograms h_n for each dictionary. The final histogram is the concatenation of the individual histograms.

The Separate dictionary method is flexible and can be used with any kind of dictionary; dictionaries formed by different clustering methods with varying sizes can be combined (Nister and Stewenius 2006). The drawback is an increase in the storage requirements of the inverted file, as well as in the time to generate the visual words, since each feature in the image has N entries in the final histogram. Unified dictionaries have the advantage of requiring less memory than Separate dictionaries. Separate dictionary storage increases linearly with the number of dictionaries, as the entries in the inverted file multiply, and its run time also grows linearly with the number of dictionaries, whereas in the Unified case each feature has only one entry in the final histogram.
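The two generation procedures above can be sketched as follows. This is a minimal NumPy illustration, not the thesis code: the plain k-means stand-in, the subset fraction and the toy sizes are my own choices, which the text permits since any dictionary-building method can be plugged in.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # plain k-means as a stand-in dictionary builder; any clustering method works here
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def build_dictionaries(features, n_dicts, words_per_dict, subset_frac=0.6):
    # Steps 1-2: N random, possibly overlapping feature subsets, one dictionary each
    dicts = []
    for _ in range(n_dicts):
        idx = rng.choice(len(features), size=int(subset_frac * len(features)), replace=False)
        dicts.append(kmeans(features[idx], words_per_dict))
    return dicts

def separate_histogram(img_features, dicts):
    # Separate step 3: one histogram per dictionary, concatenated (N entries per feature)
    hists = []
    for D in dicts:
        words = np.linalg.norm(img_features[:, None] - D[None], axis=2).argmin(axis=1)
        hists.append(np.bincount(words, minlength=len(D)))
    return np.concatenate(hists)

def unified_histogram(img_features, dicts):
    # Unified step 3: concatenate the dictionaries first (one entry per feature)
    D = np.vstack(dicts)
    words = np.linalg.norm(img_features[:, None] - D[None], axis=2).argmin(axis=1)
    return np.bincount(words, minlength=len(D))

features = rng.normal(size=(200, 8))         # toy stand-in for SIFT descriptors
dicts = build_dictionaries(features, n_dicts=3, words_per_dict=10)
h_sep = separate_histogram(features, dicts)  # 30 bins, entries sum to 3 * 200
h_uni = unified_histogram(features, dicts)   # 30 bins, entries sum to 200
```

The two sums make the storage trade-off concrete: the Separate histogram records N entries per feature, while the Unified histogram records only one.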

PROPOSED FUZZY BASED MDBoW

Base Line Method

In the baseline Bag of Words model implemented in this research, features are extracted using the Harris corner detector, and the SIFT descriptor is used for representing the extracted features. The extracted features of the image should be distinctive, should be easily detected under changes in pose and lighting, and there should be many features per object. Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters. The advantages of SIFT features are locality, distinctiveness, efficiency and extensibility (Lowe 2004). After feature extraction, clustering of the features is done by Fuzzy C-Means clustering. FCM (Bezdek 1981) is a data clustering technique in which a data set is grouped into clusters depending on the membership value; it is suited to identifying clusters of the same geometry or order, that is, homogeneous clusters. After clustering, a codebook with a predefined number of visual words is obtained. In the training phase, the input vectors from the feature pool are assigned to one or more classes; the decision rule divides the input space into decision regions separated by decision boundaries, and a histogram is built up. In the testing phase, the k closest points from the training data are found for the test data point and classification is done using the KNN classifier, which works well when the amount of data is large and the distance metric is suitable. The distance function used is the Euclidean distance.

Fuzzy C-Means

Fuzzy C-Means is a data clustering technique in which a data set is grouped into clusters depending on the membership value. Fuzzy C-Means is
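The testing-phase classification described above can be sketched as a minimal k-NN with Euclidean distance. The toy histograms and the function name are my own; this is an illustration, not the thesis implementation.

```python
import numpy as np

def knn_predict(train_hists, train_labels, test_hist, k=3):
    # Euclidean distance from the test histogram to every training histogram
    d = np.linalg.norm(train_hists - test_hist, axis=1)
    # take the labels of the k closest training points and majority-vote
    nearest = train_labels[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]

# toy visual-word histograms for two classes
train_hists = np.array([[5, 0, 1], [4, 1, 0], [0, 5, 1], [1, 4, 0]], dtype=float)
train_labels = np.array([0, 0, 1, 1])
print(knn_predict(train_hists, train_labels, np.array([5, 1, 0], dtype=float)))  # prints 0
```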

suited to identifying clusters of the same geometry or order, that is, homogeneous clusters. Given the data set X = {x_1, x_2, x_3, ..., x_N}, choose the number of clusters 1 < c < N, the weighting exponent m > 1, the termination tolerance \epsilon > 0 and the norm-inducing matrix A. The Fuzzy C-Means clustering algorithm is based on the minimization of an objective function called the C-means functional, given by Equation (5.1):

J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA}^2    (5.1)

D_{ikA}^2 = \| x_k - v_i \|_A^2 = (x_k - v_i)^T A (x_k - v_i)    (5.2)

where v_i is the cluster prototype or cluster centre and D_{ikA} corresponds to the distance of the k-th sample point from the i-th cluster centre. The parameter \mu_{ik} is interpreted as the value of the membership function of the i-th fuzzy subset for the k-th datum.

Steps for the Fuzzy C-Means algorithm

The following steps are followed for implementation of the algorithm. Initialize the partition matrix randomly, such that U^{(0)} \in M_{fc}. Then, for l = 1, 2, 3, ...

1. Compute the cluster prototypes (means):

v_i^{(l)} = \sum_{k=1}^{N} (\mu_{ik}^{(l-1)})^m x_k / \sum_{k=1}^{N} (\mu_{ik}^{(l-1)})^m,   1 \le i \le c    (5.3)

where v_i is the cluster centre calculated using the membership function.

2. Compute the distances:

D_{ikA}^2 = (x_k - v_i^{(l)})^T A (x_k - v_i^{(l)}),   1 \le i \le c,   1 \le k \le N    (5.4)

where A = I for the Euclidean norm, and D is the distance matrix containing the squared distances between data points and cluster centres.

3. Update the partition matrix:

\mu_{ik}^{(l)} = 1 / \sum_{j=1}^{c} ( D_{ikA} / D_{jkA} )^{2/(m-1)},   1 \le i \le c,   1 \le k \le N    (5.5)

until \| U^{(l)} - U^{(l-1)} \| < \epsilon. The result of the partition is collected in structure arrays. Here \epsilon is the maximum termination tolerance and m is the fuzziness weighting exponent. Use of the FCM algorithm requires determination of several parameters such as c, m, the inner-product norm and the matrix norm. In addition, the initial partition matrix U^{(0)} \in M_{fc} must be defined.

Implementation of MDBoW Model using FCM

In this thesis, the Separate dictionary and Unified dictionary concepts have been implemented with the Fuzzy C-Means algorithm. Fuzzy clustering is the process of assigning membership levels and then using these membership levels to assign data elements to one or more clusters. An advantage of soft clustering is that it is relatively insensitive to noise. In many real situations, fuzzy clustering is more natural than hard clustering, as objects on the boundaries between several classes are not forced to belong to one of the classes; instead, they are assigned membership degrees between 0 and 1 indicating their partial memberships. Features are extracted from the images using the Harris corner detector and represented using the SIFT descriptor. From the feature pool, N subsets of features are taken randomly and N dictionaries are generated using the Fuzzy C-Means algorithm. For each of the dictionaries generated, histograms are
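The steps above can be sketched in NumPy as follows. This is a minimal illustration assuming A = I (the Euclidean norm); the function name, default parameters and numerical guards are my own.

```python
import numpy as np

def fuzzy_c_means(X, c, m=1.7, tol=1e-5, max_iter=100, seed=0):
    """Minimal FCM sketch (Bezdek): returns centres V and partition matrix U."""
    rng = np.random.default_rng(seed)
    N = len(X)
    # initialize the partition matrix randomly, columns summing to 1
    U = rng.random((c, N))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        U_old = U.copy()
        Um = U ** m
        # step 1 (Eq. 5.3): cluster prototypes as membership-weighted means
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # step 2 (Eq. 5.4): squared distances with A = I (Euclidean norm)
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)                      # avoid division by zero
        # step 3 (Eq. 5.5): note (D_ik/D_jk)^(2/(m-1)) = (D2_ik/D2_jk)^(1/(m-1))
        U = 1.0 / ((D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        if np.abs(U - U_old).max() < tol:            # ||U(l) - U(l-1)|| < epsilon
            break
    return V, U
```

Each column of U holds a feature's membership degrees across all cluster centres; a soft codebook assignment reads off that whole column instead of a single nearest visual word.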

generated for each image in the dataset, and the final histogram is the concatenation of the individual histograms. This is done during the training phase of the algorithm. During the testing phase, features are extracted from each image and the same process as stated above generates the histogram for the image. The KNN classifier then finds the k closest indices and gives the classification result.

Experimental Results and Analysis for MDBoW Model using FCM

The Bag of Words model for visual categorization of images has been implemented using the Harris corner detector for extracting features and the 128-dimensional SIFT descriptor for representing the extracted features. The extracted features are clustered using the Fuzzy C-Means algorithm and a codebook is generated, with each vector in it being a visual word that serves as the basis for indexing the images. Images are then represented as histogram counts of these visual words, and the KNN algorithm is used to classify images. The performance of Bag of Words depends on the dictionary generation method, dictionary size, histogram weighting, normalization, and distance function. In the proposed method, the performance of the Multiple Dictionary Bag of Words model using Separate and Unified dictionaries is analysed by varying the words per dictionary and the number of dictionaries generated. The Fuzzy C-Means soft clustering algorithm is used to generate the dictionaries. This work is based on the assumption that fuzziness in the codebook creation step leads to more robust behaviour of the bag of visual words approach in terms of codebook size. The performance of the Multiple Dictionary Bag of Words model using Separate and Unified dictionaries is compared with the baseline method by varying the words per dictionary and also

by varying the number of individual dictionaries generated by taking features randomly. The sample images from the dataset are shown in Figure 5.1.

Figure 5.1 Sample images from dataset

Analysis of baseline method and MDBoW using separate dictionary concept

For the process under investigation, the best strategy for selecting m is experimental. For most data, 1.5 <= m <= 3.0 gives good results (Bezdek 1981). By analysis, for this experiment, the parameters are set to m = 1.7 with a fixed stop condition \epsilon, because these values gave distinct codewords with fewer iterations. The test data set includes eight different topics, each containing 50 images. 200 images per concept are used during the training phase to build the codebooks, and the classifier is trained with another 200 images from each topic. The number of dictionaries formed randomly is varied from 1 to 5 and the words per dictionary are varied from 80 to 200. In this analysis, the distance measure used is the Euclidean distance. Since the dataset is intended for a real-time application, visual recognition of objects by a humanoid robot used in a restaurant, it is created from Google images. The images in the dataset used can be categorised as tiny images. Figures 5.2 to 5.6 show the variation of accuracy rate with words per dictionary.

Figure 5.2 Accuracy vs. words per dictionary for dictionary1

The experiment is conducted by varying the number of dictionaries generated randomly from the feature pool from 1 to 5. These are named dictionary1, dictionary2, dictionary3, dictionary4 and dictionary5.

Figure 5.3 Accuracy vs. words per dictionary for dictionary2

Figure 5.4 Accuracy vs. words per dictionary for dictionary3

Figure 5.5 Accuracy vs. words per dictionary for dictionary4

It can be seen from the results obtained that the performance measure increases with the number of dictionaries and the codebook size and then reduces. This is because of redundancy in information beyond a particular codebook size and number of dictionaries.

Figure 5.6 Accuracy vs. words per dictionary for dictionary5

The results obtained are compared with those of the baseline method implemented. In both the baseline method and the Multiple Dictionary Bag of Words model, the clustering of words is done using the Fuzzy C-Means soft clustering algorithm. The algorithm is also applied to a dataset taken from the Caltech database that includes four different topics, each with 200 images, which can be considered a dataset with a smaller number of topics. It is found that the Multiple Dictionary Bag of Words model works for large-scale image search, where the number of topics and the number of images per topic are larger. From the results obtained, it can be seen that for separate dictionary MDBoW with FCM on the given dataset, the performance measure is maximum for a codebook size of 160 and three dictionaries.

Table 5.1 Accuracy rate for words per dictionary 160 for various numbers of dictionaries
(No. of Dictionaries | Accuracy Rate)

Table 5.2 Macro precision for different words per dictionary for baseline method and separate dictionary (MDBoW)
(No. of Words Per Dictionary | Base Line Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.3 Micro precision for different words per dictionary for baseline method and separate dictionary (MDBoW)
(No. of Words Per Dictionary | Base Line Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.4 Micro F1 for different words per dictionary for baseline method and separate dictionary (MDBoW)
(No. of Words Per Dictionary | Base Line Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.5 Macro F1 for different words per dictionary for baseline method and separate dictionary (MDBoW)
(No. of Words Per Dictionary | Base Line Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

The results projected in Tables 5.2 to 5.5 show that the Multiple Dictionary Bag of Words model using the Separate dictionary gives better performance than the baseline method. This is because more words are taken from different independent dictionaries, whereas in the baseline method more words would be taken from the same dictionary; every image feature gets its visual word from every dictionary D_n. It can be seen from the results that on average the method gives the maximum accuracy rate for words per dictionary of 160, and the accuracy rate increases as the number of dictionaries increases

from 1 to 5. The tabulation of this result is given in Table 5.1. The parameters Macro Precision, Micro Precision, Micro F1 and Macro F1 have better values for the Multiple Dictionary Bag of Words model than for the baseline method. For words per dictionary of 160, all these parameters increase as the number of dictionaries increases.

Analysis of baseline method and multiple dictionary using unified dictionary concept

In the Unified dictionary implementation, a single dictionary is built from the concatenation of visual words from the dictionaries 1 to N and the image gets a single histogram h. Every feature gets only one entry in the histogram h. Figures 5.7 to 5.11 show the variation of accuracy rate with words per dictionary, obtained by varying the number of dictionaries generated randomly from the feature pool from 1 to 5. They are named Dictionary 1, Dictionary 2, Dictionary 3, Dictionary 4 and Dictionary 5 for the Unified dictionary method.

Figure 5.7 Accuracy vs. words per dictionary for dictionary1

Figure 5.8 Accuracy vs. words per dictionary for dictionary2

Figure 5.9 Accuracy vs. words per dictionary for dictionary3

Figure 5.10 Accuracy vs. words per dictionary for dictionary4

Figure 5.11 Accuracy vs. words per dictionary for dictionary5

In both the baseline and the Multiple Dictionary Bag of Words model with the Unified dictionary, the clustering of words is done using the Fuzzy C-Means clustering algorithm. The algorithm is implemented for a dataset that includes eight different topics, where each topic has 200 images. From the results, it is seen that the variation of accuracy with words per dictionary is not consistent for the Multiple Dictionary Bag of Words model with the Unified dictionary when compared with the Separate dictionary. The Separate dictionary concept gave the same pattern of results obtained for the Multiple Dictionary Bag of Words model with hard clustering (Aly et al 2011). The results show that for lower values of words per dictionary, that is for 80, the accuracy increases and then decreases as the number of dictionaries is increased. It gives a maximum measure for dictionary 4, which is better than the baseline method for higher values of words per dictionary.

Table 5.6 Macro Precision for different words per dictionary for baseline method and unified dictionary (MDBoW)
(No. of Words Per Dictionary | Baseline Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.7 Micro Precision for different words per dictionary for baseline method and unified dictionary (MDBoW)
(No. of Words Per Dictionary | Baseline Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.8 Micro F1 for different words per dictionary for baseline method and unified dictionary (MDBoW)
(No. of Words Per Dictionary | Baseline Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

Table 5.9 Macro F1 for different words per dictionary for baseline method and unified dictionary (MDBoW)
(No. of Words Per Dictionary | Baseline Method | Dic 1 | Dic 2 | Dic 3 | Dic 4 | Dic 5)

The results projected in Tables 5.6 to 5.9 show that the Multiple Dictionary Bag of Words model using the Unified dictionary gives a performance that is not consistent and varies randomly. Therefore, the analysis of the multiple dictionary model with the proposed FCM is done only for the Separate dictionary concept.

Conclusion

The performance of the fuzzy clustering Multiple Dictionary Bag of Words model using Separate and Unified dictionaries for image classification is investigated by varying the words per dictionary and the number of dictionaries generated, and is compared with the baseline method. In this approach, more words are taken from different independent dictionaries, whereas in the baseline method more words would be taken from the same dictionary; thus the multiple dictionary method has less storage than the baseline approach. It is seen that the method works better when the number of topics and the number of images per topic are larger. The results obtained indicate that the Multiple Dictionary Bag of Words model using fuzzy clustering improves recognition performance over the baseline method, which uses a fuzzy codebook in the Bag of Words method. The performance measures used for evaluation increase as the number of dictionaries is increased for a particular value of words per dictionary. The results also show that the Multiple Dictionary Bag of Words model using the Unified dictionary gives a performance that is not consistent and varies randomly. The proposed multiple dictionary BoW with FCM using the Separate dictionary concept performs better than the baseline method, whereas the Unified dictionary concept does not show consistent results.

PROPOSED FUZZY BASED MDBoW WITH MODIFIED FCM

Base Line Method

In the Bag of Words model implemented in this research, features are extracted using the Harris corner detector and the SIFT descriptor is used for representing the extracted features. The extracted feature pool is then clustered using the modified FCM to get a codebook with a predefined number of visual words. Features extracted from training images are assigned to the nearest code in the codebook, and the image is reduced to the set of codes it contains, represented as a histogram. The normalized histogram of codes is the same as the normalized histogram of visual words. In the testing phase, the k closest points from the training data are found for the test data point and classification is done using the KNN classifier.

Modified Fuzzy C-Means

In the existing Fuzzy C-Means algorithm, the objective function is defined in terms of the mean squared error. In the proposed method, instead of taking the mean squared error, the objective function is defined in terms of the root mean squared error using the R_1-norm. The root mean squared error is more sensitive than other measures to the occasional large error, as the squaring process gives disproportionate weight to very large errors. The R_1-norm is defined as

\| X \|_{R_1} = \sum_{k=1}^{N} ( \sum_{j=1}^{d} x_{jk}^2 )^{1/2}    (5.6)

It has been shown that R_1 K-Means performs slightly better than standard K-Means (Ding et al 2006).

Steps for proposed modified FCM algorithm

The following steps are followed for implementation of the algorithm. Given the data set X, choose the number of clusters 1 < c < N, the weighting exponent m > 1, the termination tolerance \epsilon > 0 and the norm-inducing matrix A. Initialize the partition matrix randomly, such that U^{(0)} \in M_{fc}. Then, for l = 1, 2, ...

1. Compute the cluster prototypes (means):

v_i^{(l)} = \sum_{k=1}^{N} (\mu_{ik}^{(l-1)})^m x_k / \sum_{k=1}^{N} (\mu_{ik}^{(l-1)})^m,   1 \le i \le c    (5.7)

where v_i is the cluster centre calculated using the membership function.

2. Compute the distances:

D_{ikA}^2 = (x_k - v_i^{(l)})^T A (x_k - v_i^{(l)}),   1 \le i \le c,   1 \le k \le N    (5.8)

where A = I for the Euclidean norm and D is the distance matrix containing the squared distances between data points and cluster centres.

3. Update the partition matrix:

\mu_{ik}^{(l)} = 1 / \sum_{j=1}^{c} ( D_{ikA} / D_{jkA} )^{2/(m-1)},   1 \le i \le c,   1 \le k \le N    (5.9)

Implementation of MDBoW Model using Modified FCM

Multiple Dictionaries for BoW, which use more visual words, have significantly increased the performance of classification of images from
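Since the update steps above mirror the standard FCM (Equations 5.7 to 5.9 match 5.3 to 5.5), the difference lies in the objective. The sketch below only contrasts the standard squared-error C-means functional with an R_1-style one, using my own function names and toy data; it illustrates the outlier-damping argument, not the thesis implementation.

```python
import numpy as np

def fcm_objective_ms(X, V, U, m=1.7):
    # standard C-means functional (Eq. 5.1): membership-weighted *squared* distances
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return ((U ** m) * D2).sum()

def fcm_objective_r1(X, V, U, m=1.7):
    # modified functional: membership-weighted Euclidean (R1-style) distances,
    # so one large error contributes linearly rather than quadratically
    D = np.sqrt(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2))
    return ((U ** m) * D).sum()

# one cluster, full memberships, and an outlier at distance 100
X = np.array([[0.0], [1.0], [100.0]])
V = np.array([[0.0]])
U = np.ones((1, 3))
print(fcm_objective_ms(X, V, U))  # 10001.0: the outlier contributes 10000
print(fcm_objective_r1(X, V, U))  # 101.0: the outlier contributes only 100
```

The linear growth of the second objective in the outlier's distance is what the text means by reducing the effect of outliers in the image dataset.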

a large and highly varied image data set. In the MDBoW model implemented in this section, features are extracted using the Harris corner detector and the SIFT descriptor is used for representing the extracted features. In this thesis, only the Separate dictionary implementation is realised using modified FCM, since the results obtained for the Unified dictionary concept in the earlier experiment were erratic and not consistent.

Results and Analysis

The effect of the variation of different parameters and the performance evaluation of the MDBoW approach for image classification are reported in terms of Micro Precision, Macro Precision, Micro F1-measure, Macro F1-measure and Accuracy rate (Bassam Al-Salemi and Mohd Juzaiddin Ab Aziz 2011) for eight different topics, namely burger, spaghetti, egg, spoon, bottle, can, coffee pot and mug, from a dataset created from Google images. The dataset is created for a real-time application, visual recognition of objects by a humanoid robot used in a restaurant environment, and the images in it can be categorised as tiny images. For the modified Fuzzy C-Means, the parameter m = 1.7 with a fixed stop condition \epsilon. The test data set includes eight different topics, each containing 50 images. 200 images per concept were used to build the codebooks, and the classifier is trained with another 200 images from each topic. The number of dictionaries formed randomly is varied from 1 to 5 and the words per dictionary are varied from 80 to 200. The distance measure used is the Euclidean distance. The sample images from the dataset are shown in Figure 5.1. Figures 5.12 to 5.16 show the variation of accuracy rate with words per dictionary, obtained by varying the number of dictionaries generated randomly from 1 to 5, named Dictionary1, Dictionary2, Dictionary3, Dictionary4 and Dictionary5.

Figure 5.12 Accuracy vs. words per dictionary for dictionary1

In both the baseline method and the Multiple Dictionary Bag of Words model, the clustering of words is done using the modified Fuzzy C-Means soft clustering algorithm with the R_1-norm. The results obtained are compared with those of the baseline method and the Multiple Dictionary Bag of Words model with FCM.

Figure 5.13 Accuracy vs. words per dictionary for dictionary2

Figure 5.14 Accuracy vs. words per dictionary for dictionary3

Figure 5.15 Accuracy vs. words per dictionary for dictionary4

Figure 5.16 Accuracy vs. words per dictionary for dictionary5

The earlier experimental results validate the finding that MDBoW with FCM performs better than the baseline method with clustering done using FCM. Therefore, in this case the results are projected for the baseline method using modified FCM.

Table 5.10 Macro Precision for different words per dictionary
(Words Per Dictionary | Base Line Method | MDBoWFCM: Dic 1 to Dic 5 | MDBoWMODFCM: Dic 1 to Dic 5)

Table 5.11 Accuracy rate for words per dictionary 160 for various numbers of dictionaries
(No. of Dictionaries | Accuracy Rate (MDBoWFCM) | Accuracy Rate (MDBoWMODFCM))

The results obtained show that the Multiple Dictionary Bag of Words model using the modified Fuzzy C-Means soft clustering algorithm with the R_1-norm performs best for words per dictionary of 160, as in the previous case, which is consistent for the given dataset.

Table 5.12 Micro Precision for different words per dictionary
(Words Per Dictionary | Base Line Method | MDBoWFCM: Dic 1 to Dic 5 | MDBoWMODFCM: Dic 1 to Dic 5)

Table 5.13 Macro F1 for different words per dictionary
(Words Per Dictionary | Base Line Method | MDBoWFCM: Dic 1 to Dic 5 | MDBoWMODFCM: Dic 1 to Dic 5)

Table 5.14 Micro F1 for different words per dictionary
(Words Per Dictionary | Base Line Method | MDBoWFCM: Dic 1 to Dic 5 | MDBoWMODFCM: Dic 1 to Dic 5)

The parameters Macro Precision, Micro Precision, Micro F1 and Macro F1 have better values for the Multiple Dictionary Bag of Words model with modified FCM. For words per dictionary of 160, all these parameters increase as the number of dictionaries increases. From the earlier analysis, the Multiple Dictionary Bag of Words model using the Separate dictionary shows better performance than the baseline method, because more words are taken from different independent dictionaries, whereas in the baseline method more words would be taken from the same dictionary; every image feature gets its visual word from every dictionary D_n. The results projected in Tables 5.10 to 5.14 show the variation of the performance measures with increasing words per dictionary. They show that the Multiple Dictionary Bag of Words model using the Separate dictionary with clustering by modified FCM with the R_1-norm performs better than both the baseline method and MDBoW using FCM. The reason is that the R_1-norm based objective function tries to reduce the effect of outliers in the image dataset. It can be seen from the results that the method gives the maximum accuracy rate for words per dictionary of 160 and three dictionaries, and that the accuracy rate increases as the number of dictionaries increases from 1 to 5. The results obtained validate that MDBoW performs better for datasets having a large number of classes and more images per topic.

Conclusion

In this chapter, the performance of the Multiple Dictionary Bag of Words model for image classification, with the codebook generated using modified FCM with the R_1-norm in the objective function, is investigated. This is done by varying the words per dictionary and the number of dictionaries

generated. It is compared with the baseline method and with MDBoW using FCM for dictionary generation. In the baseline method, more words would be taken from the same dictionary, whereas in this approach more words are taken from different independent dictionaries. It is seen that the method works better when the number of topics and the number of images per topic are larger.


More information

Entropy and Information Gain

Entropy and Information Gain Entropy and Information Gain The entropy (very common in Information Theory) characterizes the (im)purity of an arbitrary collection of examples Information Gain is the expected reduction in entropy caused

More information

Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering

Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering By, Swati Bhonsle Alissa Klinzmann Mentors Fred Park Department of Mathematics Ernie Esser Department of

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Robust Panoramic Image Stitching

Robust Panoramic Image Stitching Robust Panoramic Image Stitching CS231A Final Report Harrison Chau Department of Aeronautics and Astronautics Stanford University Stanford, CA, USA hwchau@stanford.edu Robert Karol Department of Aeronautics

More information

Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

More information

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER TECHNICAL SCIENCES Abbrev.: Techn. Sc., No 15(2), Y 2012 CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER Piotr Artiemjew, Przemysław Górecki, Krzysztof Sopyła

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

1. Bag of visual words model: recognizing object categories

1. Bag of visual words model: recognizing object categories 1. Bag of visual words model: recognizing object categories 1 1 1 Problem: Image Classification Given: positive training images containing an object class, and negative training images that don t Classify:

More information

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

The Delicate Art of Flower Classification

The Delicate Art of Flower Classification The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC pvicol@sfu.ca Note: The following is my contribution to a group project for a graduate machine learning

More information

Sentiment analysis using emoticons

Sentiment analysis using emoticons Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was

More information

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems

View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems Avinash Ravichandran, Rizwan Chaudhry and René Vidal Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218,

More information

Rapid Image Retrieval for Mobile Location Recognition

Rapid Image Retrieval for Mobile Location Recognition Rapid Image Retrieval for Mobile Location Recognition G. Schroth, A. Al-Nuaimi, R. Huitl, F. Schweiger, E. Steinbach Availability of GPS is limited to outdoor scenarios with few obstacles Low signal reception

More information

The use of computer vision technologies to augment human monitoring of secure computing facilities

The use of computer vision technologies to augment human monitoring of secure computing facilities The use of computer vision technologies to augment human monitoring of secure computing facilities Marius Potgieter School of Information and Communication Technology Nelson Mandela Metropolitan University

More information

Enhanced Customer Relationship Management Using Fuzzy Clustering

Enhanced Customer Relationship Management Using Fuzzy Clustering Enhanced Customer Relationship Management Using Fuzzy Clustering Gayathri. A, Mohanavalli. S Department of Information Technology,Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai,

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Fourier Descriptors For Shape Recognition. Applied to Tree Leaf Identification By Tyler Karrels

Fourier Descriptors For Shape Recognition. Applied to Tree Leaf Identification By Tyler Karrels Fourier Descriptors For Shape Recognition Applied to Tree Leaf Identification By Tyler Karrels Why investigate shape description? Hard drives keep getting bigger. Digital cameras allow us to capture, store,

More information

Cluster Algorithms. Adriano Cruz adriano@nce.ufrj.br. 28 de outubro de 2013

Cluster Algorithms. Adriano Cruz adriano@nce.ufrj.br. 28 de outubro de 2013 Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz adriano@nce.ufrj.br

More information

Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning. By: Ibrar Hussain Student ID: Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

More information

Content-Based Recommendation

Content-Based Recommendation Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

More information

Review of Computer Engineering Research WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM. Geeta R.B.* Shobha R.B.

Review of Computer Engineering Research WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM. Geeta R.B.* Shobha R.B. Review of Computer Engineering Research journal homepage: http://www.pakinsight.com/?ic=journal&journal=76 WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM Geeta R.B.* Department

More information

Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

More information

Distributed Kd-Trees for Retrieval from Very Large Image Collections

Distributed Kd-Trees for Retrieval from Very Large Image Collections ALY et al.: DISTRIBUTED KD-TREES FOR RETRIEVAL FROM LARGE IMAGE COLLECTIONS1 Distributed Kd-Trees for Retrieval from Very Large Image Collections Mohamed Aly 1 malaa@vision.caltech.edu Mario Munich 2 mario@evolution.com

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Prototype-based classification by fuzzification of cases

Prototype-based classification by fuzzification of cases Prototype-based classification by fuzzification of cases Parisa KordJamshidi Dep.Telecommunications and Information Processing Ghent university pkord@telin.ugent.be Bernard De Baets Dep. Applied Mathematics

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Drug Store Sales Prediction

Drug Store Sales Prediction Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,

More information

6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Face Recognition in Low-resolution Images by Using Local Zernike Moments

Face Recognition in Low-resolution Images by Using Local Zernike Moments Proceedings of the International Conference on Machine Vision and Machine Learning Prague, Czech Republic, August14-15, 014 Paper No. 15 Face Recognition in Low-resolution Images by Using Local Zernie

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

Object Recognition and Template Matching

Object Recognition and Template Matching Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of

More information

Job Classification Based on LinkedIn Summaries CS 224D

Job Classification Based on LinkedIn Summaries CS 224D 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015 CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

More information

Clustering and Data Mining in R

Clustering and Data Mining in R Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Contrast enhancement of soft tissues in Computed Tomography images

Contrast enhancement of soft tissues in Computed Tomography images Contrast enhancement of soft tissues in Computed Tomography images Roman Lerman, Daniela S. Raicu, Jacob D. Furst Intelligent Multimedia Processing Laboratory School of Computer Science, Telecommunications,

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data

Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data Chapter 7 Diagnosis and Prognosis of Breast Cancer using Histopathological Data In the previous chapter, a method for classification of mammograms using wavelet analysis and adaptive neuro-fuzzy inference

More information

Polygonal Approximation of Closed Curves across Multiple Views

Polygonal Approximation of Closed Curves across Multiple Views Polygonal Approximation of Closed Curves across Multiple Views M. Pawan Kumar Saurabh Goyal C. V. Jawahar P. J. Narayanan Centre for Visual Information Technology International Institute of Information

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Food brand image (Logos) recognition

Food brand image (Logos) recognition Food brand image (Logos) recognition Ritobrata Sur(rsur@stanford.edu), Shengkai Wang (sk.wang@stanford.edu) Mentor: Hui Chao (huichao@qti.qualcomm.com) Final Report, March 19, 2014. 1. Introduction Food

More information

Medial Axis Construction and Applications in 3D Wireless Sensor Networks

Medial Axis Construction and Applications in 3D Wireless Sensor Networks Medial Axis Construction and Applications in 3D Wireless Sensor Networks Su Xia, Ning Ding, Miao Jin, Hongyi Wu, and Yang Yang Presenter: Hongyi Wu University of Louisiana at Lafayette Outline Introduction

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM

SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM Mrutyunjaya Panda, 2 Manas Ranjan Patra Department of E&TC Engineering, GIET, Gunupur, India 2 Department

More information

Churn problem in retail banking Current methods in churn prediction models Fuzzy c-means clustering algorithm vs. classical k-means clustering

Churn problem in retail banking Current methods in churn prediction models Fuzzy c-means clustering algorithm vs. classical k-means clustering CHURN PREDICTION MODEL IN RETAIL BANKING USING FUZZY C- MEANS CLUSTERING Džulijana Popović Consumer Finance, Zagrebačka banka d.d. Bojana Dalbelo Bašić Faculty of Electrical Engineering and Computing University

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

More information

An Enhanced Clustering Algorithm to Analyze Spatial Data

An Enhanced Clustering Algorithm to Analyze Spatial Data International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-2, Issue-7, July 2014 An Enhanced Clustering Algorithm to Analyze Spatial Data Dr. Mahesh Kumar, Mr. Sachin Yadav

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

CS229 Project Final Report. Sign Language Gesture Recognition with Unsupervised Feature Learning

CS229 Project Final Report. Sign Language Gesture Recognition with Unsupervised Feature Learning CS229 Project Final Report Sign Language Gesture Recognition with Unsupervised Feature Learning Justin K. Chen, Debabrata Sengupta, Rukmani Ravi Sundaram 1. Introduction The problem we are investigating

More information

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009 Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Neighborhood and Price Prediction for San Francisco Airbnb Listings

Neighborhood and Price Prediction for San Francisco Airbnb Listings Neighborhood and Price Prediction for San Francisco Airbnb Listings Emily Tang Departments of Computer Science, Psychology Stanford University emjtang@stanford.edu Kunal Sangani Department of Economics

More information

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

More information

SYMMETRIC EIGENFACES MILI I. SHAH

SYMMETRIC EIGENFACES MILI I. SHAH SYMMETRIC EIGENFACES MILI I. SHAH Abstract. Over the years, mathematicians and computer scientists have produced an extensive body of work in the area of facial analysis. Several facial analysis algorithms

More information

Computer Vision - part II

Computer Vision - part II Computer Vision - part II Review of main parts of Section B of the course School of Computer Science & Statistics Trinity College Dublin Dublin 2 Ireland www.scss.tcd.ie Lecture Name Course Name 1 1 2

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Introduction to knn Classification and CNN Data Reduction

Introduction to knn Classification and CNN Data Reduction Introduction to knn Classification and CNN Data Reduction Oliver Sutton February, 2012 1 / 29 1 The Classification Problem Examples The Problem 2 The k Nearest Neighbours Algorithm The Nearest Neighbours

More information

K-nearest-neighbor: an introduction to machine learning

K-nearest-neighbor: an introduction to machine learning K-nearest-neighbor: an introduction to machine learning Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison slide 1 Outline Types of learning Classification:

More information

Classification using intersection kernel SVMs is efficient

Classification using intersection kernel SVMs is efficient Classification using intersection kernel SVMs is efficient Jitendra Malik UC Berkeley Joint work with Subhransu Maji and Alex Berg Fast intersection kernel SVMs and other generalizations of linear SVMs

More information

Fast Visual Vocabulary Construction for Image Retrieval using Skewed-Split k-d trees

Fast Visual Vocabulary Construction for Image Retrieval using Skewed-Split k-d trees Fast Visual Vocabulary Construction for Image Retrieval using Skewed-Split k-d trees Ilias Gialampoukidis, Stefanos Vrochidis, and Ioannis Kompatsiaris Information Technologies Institute, CERTH, Thessaloniki,

More information

Distance based clustering

Distance based clustering // Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means

More information

Structural Matching of 2D Electrophoresis Gels using Graph Models

Structural Matching of 2D Electrophoresis Gels using Graph Models Structural Matching of 2D Electrophoresis Gels using Graph Models Alexandre Noma 1, Alvaro Pardo 2, Roberto M. Cesar-Jr 1 1 IME-USP, Department of Computer Science, University of São Paulo, Brazil 2 DIE,

More information

Lecture 20: Clustering

Lecture 20: Clustering Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

More information

VISION BASED INDIAN SIGN LANGUAGE CHARACTER RECOGNITION

VISION BASED INDIAN SIGN LANGUAGE CHARACTER RECOGNITION VISION BASED INDIAN SIGN LANGUAGE CHARACTER RECOGNITION 1 ASHOK KUMAR SAHOO, 2 KIRAN KUMAR RAVULAKOLLU Department of Computer Science and Engineering, Sharda University, Greater Noida, India E-Mail: ashoksahoo2000@yahoo.com

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

A Chain Code Approach for Recognizing Basic Shapes

A Chain Code Approach for Recognizing Basic Shapes A Chain Code Approach for Recognizing Basic Shapes Dr. Azzam Talal Sleit (Previously, Azzam Ibrahim) azzam_sleit@yahoo.com Rahmeh Omar Jabay King Abdullah II for Information Technology College University

More information

On Multifont Character Classification in Telugu

On Multifont Character Classification in Telugu On Multifont Character Classification in Telugu Venkat Rasagna, K. J. Jinesh, and C. V. Jawahar International Institute of Information Technology, Hyderabad 500032, INDIA. Abstract. A major requirement

More information

Classifiers & Classification

Classifiers & Classification Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

More information

Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points

Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points Journal of Computer Science 6 (3): 363-368, 2010 ISSN 1549-3636 2010 Science Publications Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Colour Image Segmentation Technique for Screen Printing

Colour Image Segmentation Technique for Screen Printing 60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information