Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering

Similar documents
Grade 6 Mathematics Performance Level Descriptors

The Scientific Data Mining Process

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

APPLYING COMPUTER VISION TECHNIQUES TO TOPOGRAPHIC OBJECTS

Prentice Hall Mathematics: Course Correlated to: Arizona Academic Standards for Mathematics (Grades 6)

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, , , 4-9

OBJECT TRACKING USING LOG-POLAR TRANSFORMATION

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan

Part-Based Recognition

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Numeracy and mathematics Experiences and outcomes

Introduction to Pattern Recognition

SIGNATURE VERIFICATION

Classifying Manipulation Primitives from Visual Data

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

MATHS LEVEL DESCRIPTORS

Grade 6 Mathematics Assessment. Eligible Texas Essential Knowledge and Skills

Colour Image Segmentation Technique for Screen Printing

Social Media Mining. Data Mining Essentials

Music Genre Classification

Simultaneous Gamma Correction and Registration in the Frequency Domain

Descriptive Statistics and Measurement Scales

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER

Number Sense and Operations

Document Image Retrieval using Signatures as Queries

Knowledge Discovery from patents using KMX Text Analytics

Grade 5 Math Content 1

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION

Euler Vector: A Combinatorial Signature for Gray-Tone Images

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor

Image Segmentation and Registration

Big Ideas in Mathematics

Predict the Popularity of YouTube Videos Using Early View Data

Common Core Unit Summary Grades 6 to 8

LOCAL SURFACE PATCH BASED TIME ATTENDANCE SYSTEM USING FACE.

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions

Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability

Object Recognition and Template Matching

Object Tracking System Using Motion Detection

Module II: Multimedia Data Mining

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

CCSS Mathematics Implementation Guide Grade First Nine Weeks

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

1 BPS Math Year at a Glance (Adapted from A Story of Units Curriculum Maps in Mathematics P-5)

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Galaxy Morphological Classification

Mathematics. Mathematical Practices

ART Extension for Description, Indexing and Retrieval of 3D Objects

TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Prentice Hall Algebra Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Tennessee Mathematics Standards Implementation. Grade Six Mathematics. Standard 1 Mathematical Processes

Mean-Shift Tracking with Random Sampling

FOREWORD. Executive Secretary

Prentice Hall Mathematics Courses 1-3 Common Core Edition 2013

Efficient Attendance Management: A Face Recognition Approach

CAMI Education linked to CAPS: Mathematics

Canny Edge Detection

Support Materials for Core Content for Assessment. Mathematics

International Journal of Advanced Information in Arts, Science & Management Vol.2, No.2, December 2014

Environmental Remote Sensing GEOG 2021

Heat Kernel Signature

ISAT Mathematics Performance Definitions Grade 4

Grade 5 Common Core State Standard

A Simple Feature Extraction Technique of a Pattern By Hopfield Network

COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION

Pre-Algebra Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

Correlation and Convolution Class Notes for CMSC 426, Fall 2005 David Jacobs

Vision based Vehicle Tracking using a high angle camera

T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN

TWO-DIMENSIONAL TRANSFORMATION

Analecta Vol. 8, No. 2 ISSN

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

Chapter 111. Texas Essential Knowledge and Skills for Mathematics. Subchapter B. Middle School

Standards and progression point examples

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

EXPLORING SPATIAL PATTERNS IN YOUR DATA

LESSON 4 Missing Numbers in Multiplication Missing Numbers in Division LESSON 5 Order of Operations, Part 1 LESSON 6 Fractional Parts LESSON 7 Lines,

Face Recognition in Low-resolution Images by Using Local Zernike Moments

Content-Based Image Retrieval

Assessment Anchors and Eligible Content

Prentice Hall Connected Mathematics 2, 7th Grade Units 2009

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

Face detection is a process of localizing and extracting the face region from the

Chapter ML:XI (continued)

Multimodal Biometric Recognition Security System

Performance Level Descriptors Grade 6 Mathematics

Novel Automatic PCB Inspection Technique Based on Connectivity

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

How To Cluster

Biggar High School Mathematics Department. National 5 Learning Intentions & Success Criteria: Assessing My Progress

Building an Advanced Invariant Real-Time Human Tracking System

Transcription:

Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering By, Swati Bhonsle Alissa Klinzmann Mentors Fred Park Department of Mathematics Ernie Esser Department of Mathematics

Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering Abstract: Given images consisting of stem cell concentrations in tumors, we provide a method of distinguishing cancer cells by means of k-means clustering using a Fast Fourier Transform (FFT) descriptor, which is invariant to translation, rotation, scaling, and change of starting point. This descriptor is based on the FFT of the centroid distance function applied to a binary image of the cell data. We provide numerical results using k-means with this descriptor to distinguish different cells shapes. We then show how to improve this descriptor by incorporating the measurement of concentrations within the cells. We define this addition to the descriptor by dividing the cell data into specific regions and computing the histogram of stem cell concentration within these regions. This approach applies concepts from mathematics, computer science and biology to help quantify tumor cells.

Computing the Centroid To fully understand the descriptor which is based on the centroid distance function, it is essential to foremost understand how one computes the centroid. In this section, we note that the formulas were found from Yang, et al. [3]. The position of the centroid, the center of gravity, is fixed in relation to the shape. The shape in this particular context is a binary image. The centroid can be calculated by taking the average of all the points that are defined inside a shape. Under the assumption that our shape is simply connected, we can compute the centroid simply by using only the boundary points. The function defining the contour of our shape is given by the following discrete parametric equation: ( ) ( ( ) ( )) This equation is valid in the Cartesian coordinate system. Let N be the total number of points on the border of our figure. Here, n [0, N - 1], and Γ(N) = Γ(0) as the 0 and N index are the same point on our boundary. The x and y coordinates of the centroid, denoted by and respectively, are given by: ( )( ) ( )( ) Here, the area of the shape, A, is given by the following equation: ( )

These equations are valid because any polygon can be partitioned into triangles, each triangle having its own centroid and area. The centroid of a triangle is given by the following equation under the assumption that (0, 0) is a point on the triangle: ( ) The area of a triangle is given by: ( ) Generally the centroid of the entire polygon is represented by: Substituting the equations which compute the centroid and area of a triangle into this equation, we can confirm the validity of this equation. By taking the sum of the products of every respective triangles centroid and area then dividing by the total area of the entire polygon, the equations given to find the coordinates of the centroid from the boundary points of the shape are correct. Shape Signature Using the centroid distance function we can create a shape signature using the centroid and the boundary points. Shape signatures are one-dimensional functions which are derived from the contour of the shape. They provide information about the shapes features. Despite the fact that shape signatures can be used to describe a shape alone, they are often used as a preprocessing tool for other algorithms that extract features, like the Fourier descriptors (FD). This is because classification using shape signatures is computationally intensive due to the complex

normalization needed to obtain rotational invariance. The shape signature will only be rotationally invariant if the starting point on our original shape can be identified on our rotated shape. This may not always be obvious. In addition, because shape signatures are local representations of shape features extracted from the spatial domain, they are sensitive to noise. Fourier descriptors have the advantages of simple computation and also simple normalization making it preferable for online retrieval. We show later that the Fourier descriptors obtained from the centroid distance function are also robust to noise, yet another advantage. Centroid Distance Function The centroid distance function expresses the distances of the boundary points from the centroid ( ) of a shape. It is given by the following formula: ( ) ( ( ) ) ( ( ) ) Since this shape signature is only dependent on the location of the centroid and the points on the boundary, it is invariant to the translation of the shape and also the rotation of the shape if the starting point of the original shape can be identified on the rotated shape. If this is possible and we calculate our new shape signature starting at that point, then the shape signatures of the original and rotated shapes will be identical. This signature alone is not invariant to a change in starting point or scaling so we apply the Fourier transformation, normalize the coefficients, and then take the magnitude of these normalized coefficients to obtain a descriptor that is invariant to translation, rotation, scaling, and change of starting point. The Fourier Transform and Descriptors In this section we define the Fourier transform and descriptor obtained by taking the Fourier transform of the shape signature. The equations presented were found from Yang, et al. [3].

The discrete Fourier transform of the shape signature, ( ) is given by the following equation: ( ) ( ) The coefficients,, need to be further processed so that they are starting point and scaling invariant shape descriptors. Shape features are represented by some or all of the coefficients of the transform. They characterize the appearance of images. The general form of the Fourier coefficients of a contour centroid distance function, ( ), transformed through scaling and change of start point from the original function ( ) ( ) is given by ( ) ( ) Here, is the Fourier coefficient of the transformed centroid distance shape signature, and ( ) is the Fourier coefficient of the original shape. is the angle of rotation required to align the starting points, and is the scale factor. To normalize the coefficients returned by the Fourier transform we consider the following equation where is the normalized Fourier coefficient of the transformed shape and ( ) is the normalized Fourier coefficient of the original shape: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) After normalizing our coefficients by taking the values of the Fourier coefficients and dividing it by the first non-zero frequency component, the scale factor cancels showing that the descriptor is invariant to scaling. If the phase information is ignored by taking the magnitude of the normalized coefficients then the Fourier descriptor is also invariant to a change in the starting

point. ( ) because the absolute value of the expression ( ) is 1 using Euler s formula. Testing Implementation of the Fourier Transform Applied to the Centroid Distance Function To test our implementation of the Fourier descriptor in Matlab we created several functions to check for the properties expected, such as starting point and scaling invariance. Confirming Change of Starting Point Invariance To check for starting point invariance we defined a shape, in this example a rectangle for simplicity, and extracted its shape signature using the centroid distance function. The shape and its corresponding shape signature are shown below. Original Rectangle Figure 1. The shape signature of the original rectangle. Afterward, we coded a forward shift function which shifted the starting point on our original shape. Shifting the starting point by 1, 20, and 40 points respectively we acquired corresponding shifted shape signatures.

Shifted Shape Signatures Distance from Centroid Magnitude of Normalized Fourier Coefficients Points on Boundary Figure 2. The shape signatures of the original rectangle with the starting points shifted. Taking the Fourier transformation of each of the shifted signatures produced the same normalized Fourier coefficients for each point proving that our implementation is starting point invariant. Fourier Transform Frequency Figure 3. The Fourier transform of each of the shifted shape signatures.

Confirming Scaling Invariance To check for scale invariance we created two figures. Our first figure was a 10 by 24 rectangle, and our second figure was our original rectangle scaled by a factor of two, so a 20 by 48 rectangle. We again extracted the corresponding shape signatures shown in Figure 4. Original Rectangle Original Rectangle Scaled by 2 Figure 4. Comparing the shape signatures of a scaled rectangle to the original rectangle.

Taking the Fourier transform of the original rectangle and the scaled rectangle using all the points on the respective contours, gave us the following normalized Fourier coefficients: Figure 5. Comparing the magnitudes of the normalized Fourier coefficients of the original and scaled rectangle. We find that the graph of the Fourier transform applied to the scaled rectangle looks denser because there are twice as many points on the boundary of our bigger rectangle. To clearly demonstrate that our implementation is scale invariant we plotted just the lower frequencies common to both plots.

Figure 6. Plotting the lower frequencies common to both plots in Figure 5. This graph looks identical to the graph of the normalized Fourier coefficients extracted from the original rectangle confirming our implementation is invariant to changing the starting point of or scaling our shape. Fourier Descriptors are Robust to Noise In Zhang, et al. [1] that the Fourier descriptor is commonly used for shape retrieval because it is robust to noise. To illustrate this fact consider the following apples and their corresponding shape signatures (all images in this section are pulled from the article): Figure 7. Three apple shapes and their centroid distance signatures. The first apple is the original, the second apple is affected by noise, and the last apple has occlusion. When reconstructing these apples using the first 3 Fourier coefficient and the first 5 coefficients we obtain the following figures:

Figure 8. The Fourier reconstructed apple 1, 2, and 3 using (a) the first 3 FD; (b) the first 5 FD. The discrete Fourier transformation of a boundary signature generates a complete set of complex numbers; these Fourier descriptors represent the shape of the object in the frequency domain. Information about the general shape is contained in the lower frequency descriptors. Higher frequency descriptors contain information about more minute details of the shape. Details are already starting to emerge with just 5 coefficients, when comparing the image retrieved to that of the image retrieved with 3 coefficients. We find in this example that the reconstructed apples within Figure 8(a) and Figure 8(b) are very similar even though the original shapes and their corresponding shape signatures were differently affected by serious defections. Since low frequency coefficients are usually the most significant when it comes to retrieving the image, we find that shape descriptors using the Fourier coefficients are robust to variations of shape boundaries. Measuring Performance To differentiate between shapes, one can look specifically at a subset of the Fourier descriptors. Balancing both the accuracy and efficiency of image retrieval, Zhang, et al. [2] shows that 10 Fourier descriptors are enough for shape representation. When the number of Fourier descriptors is reduced to 10, the retrieval performance is not that much worse than if more Fourier descriptors are used. This doesn t require much computation and is yet another reason that Fourier descriptors are suitable for shape retrieval and indexing.

To test the retrieval performance of the Fourier descriptors using different shape signatures Zhang and Lu performed an experiment in which a Java online indexing and retrieval framework was implemented. A database of 1400 shapes was created, and each shape was used as a query for which the retrieval was evaluated by the precision and recall common performance measure. Precision is defined as the ratio of the number of retrieved relevant shapes to the total number of retrieved shapes. It measures the accuracy of the retrieval as it is the fraction of retrieved instances that are relevant. Recall is defined as the ratio of the number of retrieved relevant images to the total number of relevant shapes in the whole database. It measures the robustness (completeness) of the retrieval performance as it is the fraction of relevant instances that are retrieved. The precision of the retrieval at each level of the recall is obtained for each query. The result precision of retrieval using a type of FD is the average precision of all the query retrievals using the type of FD. The criteria for measuring the similarity between a query shape Q and a target shape T is the Euclidean distance between their FD representations. Their results are represented in the following graph: Figure 9. Average retrieval performance of different FD. The data shows that on average the retrieval using the Fourier descriptor from the centroid distance function performs significantly better in terms of robustness, computation complexity,

and retrieval performance of its Fourier descriptors, than the Fourier descriptors derived from other shape signatures. Cell Clustering Application: Methods Our research entailed the use of Matlab and pre-segmented tumor images containing stem cell concentrations. Centroid Distance Function, Shape Signatures, and FFT Applied to Cell Data Next, we implemented the FFT using the centroid distance function, and we applied our descriptor to both to a binary tumor image and also the same image rotated 30. This trial was to check for rotational invariance. Figure 10. The original stem cell image.

Figure 11. The original stem cell image with calculated boundary points and centroid. Figure 12. The original stem cell image rotated 30 with calculated boundary points and centroid. We extracted the shape signature of both the images, and obtained identical signatures that only differed by a periodic shift caused by the rotation. Figure 13. Shape signatures of both stem cell images. Blue = original, Red = 30 rotated image.

However, after applying FFT to both the shape signatures, we obtained two identical graphs showing that our implementation is invariant to rotation. Figure 14. FFT of both shape signatures. Wedges and Histograms Using the same original image (Figure 10), we converted the pixel locations into the polar coordinate system (r,θ), and then formed partitions in θ to create wedges within the cell. The wedge partitions were created from the centroid running clockwise from π to π in increments of π/12.

Figure 15. First wedge partition from π to 11π/12. Figure 16. Second wedge partition from 11π/12 to 10π/12. Within each of the wedges, we calculated the histograms of each of the RGB channels: red, green and blue.

Figure 17. Histograms within the first wedge. Figure18. Histograms within second wedge. By taking the histogram we were able to account for the stem cell concentrations in the tumor. Figure 17 and Figure 18 show the first and second wedge partitions within the cell and their corresponding histograms.

Polar Rectangles and Average Histogram Taking the application of creating wedges one step further, we applied partitions both in θ and in r to create polar rectangles within the cell. In the following image we created polar rectangles when θ = 4 and r = 2. Figure 19. Original stem cell image with polar rectangle partitions. Looking at the cell image, we noticed that the majority of the stem cell concentrations lie near the boundary of the cell. By partitioning r we are able to quantify the distance from the centroid where the stem cells lie. For an example we looked specifically at the polar rectangle when θ = 1 and r = 2. Figure 20. Polar rectangle where θ = 1 and r = 2. Again, we took the histogram of each channel within each of these polar rectangles to calculate the intensity values. To condense the three histograms into a single histogram, we plotted the

average of the intensities in each of the RGB channels. We also set the number of bins in the histogram equal to four to get a more general view of the histogram. Figure 21. Average histogram taken within the polar rectangle where θ = 1 and r = 2. With this foundation of polar rectangles, we can further extend the research to define a more informative descriptor that integrates within the polar rectangles to calculate the intensity values, instead of finding the histograms. Knowing that the stem cell concentrations are located near the boundary of the cell, we could also look at the average curvature of the cell boundary. This will provide insight on how stem cell concentrations affect the deformation of the cell boundary. Clustering We combined our descriptor to use k-means clustering to group similar cells together. For this we used an image containing six separate cell regions.

Figure 22. Original image used to displayed clustering. To begin we isolated the six different cell regions. Next, we applied our descriptor to each region. We utilized an up-sampling of the boundary points to resize all images to the same vector length because only when all six vectors were of the same length could we apply k-means to cluster the different cell regions. In our first example, we set k = 2 to cluster the cells into two groups. The first group is shown by the green boundary points and the second group shown by the yellow boundary points.

Figure 23. K-means clustering where k = 2. Next we ran k-means again, but this time setting k = 3 to cluster the cells into three groups. The three groups are shown by the green, white, and yellow colored boundaries.

Figure 24. K-means clustering where k = 3. The results were promising as similar cells were grouped together. The smaller, more round cells were clustered together and the more elongated cells were clustered into another group. Conclusion and Future Research: One thing to note is that when implementing the polar rectangles and histograms, invariance to start point is no longer addressed because we applied the polar rectangles and the histograms to the original image and not the normalized coefficients of the FFT. Future research entails applying FFT to theta, the angle at which we set the divisions for the wedges of our polar rectangles.

To progress the research with clustering, we hope to use both the shape descriptor along with the contextual information to more effectively classify and cluster cells based on their stem cell concentrations. More specifically, we hope to implement a method of classification and clustering on cancer cell images and be able to classify cancer cells from non-cancer cells, or distinguish between different of cancer cells stages. Acknowledgements The reported research was conducted while the authors were participating in the NSF funded icamp program (http://math.uci.edu/icamp) at the mathematics department of UC Irvine in the summer of 2011. We thank all the icamp mentors and graduate assistants, in particular our project mentors Dr. Fred Park, Dr. Ernie Esser and Dilan Gorur for their advice, guidance and expertise. Cell images were kindly provided by John Lowengrub. The NSF PRISM grant DMS- 0948247 is gratefully acknowledged. References [1] D. Zhang and G. Lu, A comparative study of curvature scale space and fourier descriptors for shape-based image retrieval, Visual Communication and Image Representation, vol. 14(1), 2003. [2] D. Zhang and G. Lu, A comparative study of fourier descriptors for shape representation and retrieval, in Proc. 5th Asian Conference on Computer Vision, 2002. [3] Yang Mingqiang, Kpalma Kidiyo and Ronsin Joseph, A Survey of Shape Feature Extraction Techniques, Pattern Recognition Techniques, Technology and Applications, Peng-Yeng Yin (Ed.) (2008) 43-90.