Word Level Script Identification for Scanned Document Images
Huanfeng Ma and David Doermann
Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA

ABSTRACT
In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers (Support Vector Machine (SVM), Gaussian Mixture Model (GMM) and k-nearest-neighbor (k-NN)) are used to identify different scripts at the word level (glyphs separated by white space). These three classifiers are applied to a variety of bilingual dictionaries and their performance is compared. Experimental results show the capability of the Gabor filter to capture script features and the effectiveness of these three classifiers for script identification at the word level.

Keywords: Script Identification, Support Vector Machine (SVM), Gaussian Mixture Model (GMM), k-Nearest-Neighbor (k-NN), Gabor Filter

1. INTRODUCTION

1.1. Background
In recent years, the demand for tools able to recognize, search and retrieve written and spoken sources of multilingual information has increased tremendously. With the rapid explosion of online repositories, researchers and developers of cross-lingual search and translation systems can easily obtain many of the resources they need from the Internet. However, significant resources still exist only in printed form, especially for sparse, low-density languages. Manipulation and conversion of these printed documents is essential for many researchers and organizations.
One of the most important tasks to address with printed documents is the automatic recognition of text, which usually consists of three steps: (1) zone segmentation and text region identification using document layout analysis; (2) text line, word and character segmentation; and (3) optical character recognition (OCR). In the last step, OCR systems are often designed to work on documents in a specific script. In order to parse bilingual or multilingual documents such as patents 1 or bilingual dictionaries, or to perform multilingual document retrieval, 2 the script must be identified before words are fed to an appropriate OCR system. Our motivation for script identification stems from attempts to acquire lexicons from bilingual dictionaries. In our previous work, 2, 3 a document image was first segmented into physical zones and then into entries based on extracted features. For bilingual dictionaries with one non-Roman script, script identification is essential both in entry segmentation and in parsing and tagging the entry itself.

Further author information: (Send correspondence to Huanfeng Ma) Huanfeng Ma: hfma@umiacs.umd.edu; David Doermann: doermann@umiacs.umd.edu
1.2. Previous Work
In earlier work on script identification, Hochberg et al. 4 described a technique for identifying 13 scripts, including highly connected ones. In their algorithm, a scale-normalized cluster template was created for each script, based on the frequent characters or word shapes of that script. Scripts were then classified by comparing a subset of the document's textual symbols with these templates. Spitz et al. 1, 5 initially divided scripts into Asian (Chinese, Japanese and Korean) or Roman based on the observation that upward concavities are distributed evenly along the vertical axis of Asian characters, but tend to appear at specific locations in Roman characters. Discrimination among the Asian scripts was then made on the basis of character density. Waked et al. 6 used features of the horizontal projection of a text line to classify scripts into three categories: Arabic, Roman and Ideographic. Specific features such as character complexity or curvature can be used to distinguish different scripts if the classifier designer has sufficient knowledge of the scripts; such classification is thus case-dependent. In recent years, texture analysis techniques have been introduced to classify different font styles and font faces. Zhu et al. 7 present a font recognition algorithm based on global texture analysis. Gabor filters were used to extract the global texture features, and the extracted features were used to recognize different font styles and faces. They also demonstrated the capability to identify Chinese and English documents with different fonts.

2. SCRIPT IDENTIFICATION AT THE WORD LEVEL
It should be noted that all of the script identification approaches mentioned above operate at the block or page level; that is, a block or page is assumed to contain a single script. Obviously, this is not the case for bilingual dictionaries, where text in different scripts may be interlaced.
For example, in the English-Chinese bilingual dictionary shown in Figure 1, there is no rule to identify which part should be Chinese and which part should be English unless the content is known. This means it is impossible to combine words which belong to the same script into a whole component; the identification must therefore be done at the word level.

Figure 1. English-Chinese dictionary.

In our work, we perform script identification using Gabor filter analysis of textures. The use of Gabor filters to extract texture features of images was motivated by two factors: (1) the Gabor representation has been shown to be optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency; 8 and (2) Gabor filters can be considered orientation- and scale-tunable edge and line detectors, and the statistics of these micro-features in a given region are often used to characterize the underlying texture information. In our previous work, 9 we proposed a general approach which computed the mean and standard deviation feature vectors of each class in the training phase. For each test sample, the classification is based on the distance between this sample and each class. Suppose the feature vector is in a d-dimensional space, and the computed mean and standard deviation feature vectors for class \lambda_i are \mu^{(i)} and \alpha^{(i)}, where i = 1...M and M is the number of classes. Then for each test sample x \in R^d, the distance between this sample and each class is computed using the following formula:

d(x, \lambda_i) = \sum_{k=1}^{d} \frac{|x_k - \mu_k^{(i)}|}{\alpha_k^{(i)}}, \quad i = 1...M
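As an illustration, the distance-based rule from our previous work can be sketched in Python. This is a minimal sketch using made-up 2-D feature vectors; the actual features are 32-dimensional Gabor statistics.

```python
# Sketch of the distance-based classifier: each class is summarized by
# per-dimension mean and standard deviation, and a test vector is assigned
# to the class with the smallest deviation-normalized L1 distance.

def class_stats(samples):
    """Per-dimension mean and standard deviation of a list of vectors."""
    d, n = len(samples[0]), len(samples)
    mu = [sum(s[k] for s in samples) / n for k in range(d)]
    sigma = [(sum((s[k] - mu[k]) ** 2 for s in samples) / n) ** 0.5
             for k in range(d)]
    return mu, sigma

def distance(x, mu, sigma, eps=1e-9):
    """d(x, lambda_i) = sum_k |x_k - mu_k| / sigma_k."""
    return sum(abs(xk - mk) / (sk + eps) for xk, mk, sk in zip(x, mu, sigma))

def classify(x, stats):
    """Return the index of the nearest class."""
    return min(range(len(stats)),
               key=lambda i: distance(x, stats[i][0], stats[i][1]))

# Hypothetical 2-D features for two "scripts"
class_a = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
class_b = [[5.0, 5.2], [4.8, 5.1], [5.1, 4.9]]
stats = [class_stats(class_a), class_stats(class_b)]
print(classify([1.05, 1.0], stats))  # assigned to class 0
print(classify([5.0, 5.0], stats))   # assigned to class 1
```

The three classifiers described below replace this single-prototype rule with models that use the full distribution of training samples.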
In this paper, the k-nearest-neighbor (k-NN), Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are applied to improve the classification performance. The performance of these three classifiers is compared in Section 4.

3. SYSTEM DESIGN
A diagram of the system is shown in Figure 2. The main operations in each part of the system are described in detail in the following subsections.

Figure 2. System architecture.

3.1. Preprocessing
The goal of document image preprocessing is to clean the document image and remove variations that may affect the final identification results. The main operations are: (1) image deskewing; (2) line removal; and (3) symbol removal.

Deskewing. During scanning, the document may be skewed. Word segmentation is based on the bounding box of the segmented word, which can be affected by skew. Deskewing is based on a horizontal projection profile. 10 Assuming the skew angle is less than 15 degrees, we first obtain the horizontal projection profile of all text lines. By iteratively rotating the image and computing the correlation of the profile, we obtain the deskewing angle. The image is then simply rotated by this angle.

Line removal. The lines we want to remove from bilingual dictionary pages usually appear as long horizontal lines at the top or bottom of a page, or as long vertical lines in the middle, so we are concerned primarily with these two cases. The line detection and removal algorithm therefore does not need to be complicated: a Hough transform was applied to detect the lines to be removed.

Symbol removal. In most bilingual dictionaries, there are special symbols which belong to neither of the scripts we want to identify. Simply assigning these symbols to one class can degrade the classification performance. Before classification, these symbols need to be detected and removed from the original image to generate a clean image.
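As an illustration of the projection-profile idea (not the exact implementation used in the system), skew can be estimated by rotating the foreground pixels over a range of candidate angles and selecting the angle whose row profile is most sharply peaked, for example the one with maximal variance:

```python
import math

def row_profile(points, height):
    """Histogram of foreground pixels per image row."""
    profile = [0] * height
    for _, y in points:
        if 0 <= y < height:
            profile[int(y)] += 1
    return profile

def profile_variance(profile):
    mean = sum(profile) / len(profile)
    return sum((v - mean) ** 2 for v in profile) / len(profile)

def estimate_skew(points, height, max_deg=15.0, step_deg=0.5):
    """Search candidate angles: a deskewed page concentrates ink into
    text rows, which maximizes the variance of the row profile."""
    best_angle, best_score = 0.0, -1.0
    steps = int(2 * max_deg / step_deg) + 1
    for i in range(steps):
        a = -max_deg + i * step_deg
        t = math.radians(a)
        rotated = [(x * math.cos(t) - y * math.sin(t),
                    x * math.sin(t) + y * math.cos(t)) for x, y in points]
        score = profile_variance(row_profile(rotated, height))
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# Synthetic page: two horizontal "text lines", skewed by +5 degrees
skew = math.radians(5)
points = [(x * math.cos(skew) - y0 * math.sin(skew),
           x * math.sin(skew) + y0 * math.cos(skew))
          for y0 in (40.5, 80.5) for x in range(200)]
print(estimate_skew(points, height=256))  # approximately -5
```

Variance is one common peakedness score for the profile; a correlation-based score between adjacent rows, as used in the system, follows the same search structure.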
In our work, a template matching approach was applied for symbol detection. First, we extract all the symbols and create one model template for each symbol. Then, for each saved template, we scan the image to detect and recognize the symbol based
on a generalized Hausdorff measure. The generalized Hausdorff distance measures the degree of mismatch between two point sets, and can thus also be employed to evaluate the resemblance of one point set to another. 11 Once a symbol is detected, the rectangular area that covers the symbol is simply set to the background color. Figure 3 shows an example of symbol detection and removal.

Figure 3. Symbol detection and removal, where the left image is the original image, the middle image shows the removed symbols, and the right image is the clean image.

3.2. Word Extraction and Processing
Script classification is applied at the word level, so before extracting texture features, all words need to be extracted from the document image. In our work, the Docstrum algorithm 12 was applied to perform word segmentation. Word images in different classes, and even different word images in the same class, may have different sizes (width and height). To make the features consistent, word image replication and scaling is applied to create a normalized image with a predefined size (64 x 64 pixels in our case). Features used in the following sections are extracted from images of this size. Figure 4 shows word image replication and scaling examples for two different scripts (Arabic, Roman).

Figure 4. Word image replication and scaling. (a,c) original images, (b,d) normalized images.

3.3. Feature Extraction
A pair of isotropic Gabor filters is applied to extract the texture features of each class.

3.3.1. Gabor filter design
The computational model for the 2D isotropic Gabor filters is:

h_e(x, y) = g(x, y) \cos[2\pi f (x\cos\theta + y\sin\theta)]
h_o(x, y) = g(x, y) \sin[2\pi f (x\cos\theta + y\sin\theta)]
where h_e and h_o are the even- and odd-symmetric Gabor filters, and g(x, y) is an isotropic Gaussian function of the form:

g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)

The spatial frequency responses of the Gabor functions are:

H_e(u, v) = \frac{H_1(u, v) + H_2(u, v)}{2}
H_o(u, v) = \frac{H_1(u, v) - H_2(u, v)}{2j}

where j = \sqrt{-1} and

H_1(u, v) = \exp\{-2\pi^2\sigma^2 [(u - f\cos\theta)^2 + (v - f\sin\theta)^2]\}
H_2(u, v) = \exp\{-2\pi^2\sigma^2 [(u + f\cos\theta)^2 + (v + f\sin\theta)^2]\}

f, \theta and \sigma are the spatial frequency, orientation and space constant of the Gabor envelope. In our case, the image size is normalized to 64 x 64, and four values of the spatial frequency are selected: 0.04, 0.08, 0.16, ... . The combination of these four frequencies with four selected values of \theta (0, 45, 90, 135 degrees) gives a total of 16 Gabor channels. The non-orthogonality of the Gabor wavelets implies there is redundant information in the filtered images. To reduce the redundancy, the filters are designed to ensure that the half-peak magnitude supports of the filter responses in the frequency spectrum touch each other, as shown in Figure 5. The space constant \sigma is therefore selected based on the formula \sigma = 1/(0.6 f).

Figure 5. Frequency response of Gabor filters. (left: desired response; right: real response.)

3.3.2. Feature representation
The Gabor wavelet transform of an image I(x, y) is defined as:

G_{mn}(x, y) = \iint I(s, t)\, g_{mn}^{*}(x - s, y - t)\, ds\, dt

where * denotes the complex conjugate. Based on the computed mean \mu_{mn} and standard deviation \sigma_{mn} of the magnitude of the transform coefficients, a feature vector (of dimension 32, representing the 16 channels) is constructed as:

x = [\mu_{00}, \sigma_{00}, \mu_{01}, \sigma_{01}, \dots, \mu_{33}, \sigma_{33}]

where \mu_{mn} and \sigma_{mn} are computed as:

\mu_{mn} = \iint |G_{mn}(x, y)|\, dx\, dy
\sigma_{mn} = \sqrt{\iint (|G_{mn}(x, y)| - \mu_{mn})^2\, dx\, dy}
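The per-channel mean/standard-deviation feature computation above can be sketched as follows. This is a simplified, unoptimized illustration on a small image with truncated filter kernels; the 64 x 64 normalization, the four actual frequencies and the full filter support of the real system are not reproduced here (only two assumed frequencies are used, giving 8 channels and 16 features instead of 16 and 32).

```python
import math

def gabor_pair(freq, theta_deg, size=9):
    """Even/odd isotropic Gabor kernels h_e, h_o with sigma = 1/(0.6 f),
    truncated to a size x size window."""
    sigma = 1.0 / (0.6 * freq)
    th = math.radians(theta_deg)
    half = size // 2
    even, odd = [], []
    for y in range(-half, half + 1):
        re, ro = [], []
        for x in range(-half, half + 1):
            g = math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
            phase = 2 * math.pi * freq * (x * math.cos(th) + y * math.sin(th))
            re.append(g * math.cos(phase))
            ro.append(g * math.sin(phase))
        even.append(re)
        odd.append(ro)
    return even, odd

def convolve(img, ker):
    """'Same'-size 2-D convolution with zero padding."""
    h, w, k = len(img), len(img[0]), len(ker) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y - dy, x - dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * ker[dy + k][dx + k]
            out[y][x] = s
    return out

def gabor_features(img, freqs=(0.1, 0.2), thetas=(0, 45, 90, 135)):
    """Mean and std of |G_mn| per channel, concatenated into one vector."""
    feats = []
    for f in freqs:
        for t in thetas:
            he, ho = gabor_pair(f, t)
            e, o = convolve(img, he), convolve(img, ho)
            mags = [math.hypot(ev, ov)
                    for er, orow in zip(e, o) for ev, ov in zip(er, orow)]
            mu = sum(mags) / len(mags)
            sd = (sum((m - mu) ** 2 for m in mags) / len(mags)) ** 0.5
            feats.extend([mu, sd])
    return feats

# Tiny synthetic "word" image: vertical stripes
img = [[1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(16)] for _ in range(16)]
feats = gabor_features(img)
print(len(feats))  # 2 freqs x 4 orientations x (mean, std) = 16 values
```

A production implementation would compute the transforms in the frequency domain; the sketch only makes the channel structure of the feature vector concrete.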
3.4. Classifier Design
Although the system can extract the texture features of different scripts, how they are best used depends on the characteristics of the specific scripts. It is important to assign appropriate weights to different features based on the training samples. As mentioned in Section 2, in previous work 9 we performed the classification based on the distance of a test sample to each class. The distribution of training samples in the feature space was only taken into account by normalizing the distance with the standard deviations of the training samples. To improve the classification performance, we employ three new classifiers.

3.4.1. k-NN classifier
The k-nearest-neighbor classifier is an extension of the Nearest Neighbor classifier first introduced by Cover and Hart. 13 As illustrated in Figure 6, a test sample x is classified by assigning it the label most frequently represented among the k nearest training samples: a decision is made by examining the labels of the k nearest neighbors and taking a vote.

Figure 6. The k-nearest-neighbor classifier. It starts at the test sample x and grows a spherical region until k training samples are enclosed. The test sample is labeled by a majority vote of these samples. In this k = 3 case, the test sample x would be labeled the class of the * points.

3.4.2. SVM classifier
SVMs were first introduced in the late seventies, but have recently been receiving increased attention. SVMs have been applied in many fields such as handwritten digit recognition, 14 object recognition, 15 speaker identification, 16 face detection in images 17 and text categorization. 18 The SVM classifier constructs a best separating hyperplane (the maximal margin plane) in a high-dimensional feature space defined by nonlinear transformations of the original feature variables.
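The k-NN voting rule described above can be sketched as a minimal Python example (with hypothetical 2-D points and class labels standing in for the 32-dimensional Gabor features):

```python
from collections import Counter

def knn_classify(x, train, k=3):
    """Label x by majority vote among the k nearest training samples.
    `train` is a list of (vector, label) pairs; Euclidean distance."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda s: dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature points for two scripts
train = [([0.0, 0.0], "roman"), ([0.2, 0.1], "roman"), ([0.1, 0.3], "roman"),
         ([3.0, 3.0], "arabic"), ([3.2, 2.9], "arabic"), ([2.9, 3.1], "arabic")]
print(knn_classify([0.1, 0.1], train, k=3))   # roman
print(knn_classify([3.0, 3.05], train, k=3))  # arabic
```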
Consider a binary classification task with a set of training samples {x_i, y_i}, i = 1, ..., N, x_i \in R^d, where the labels y_i \in \{-1, +1\} correspond to the two classes \lambda_1 and \lambda_2. The discriminant function is defined as:

g(x) = w^T \Phi(x) + b    (1)

with the decision rule:

w^T \Phi(x_i) + b > 0 for x_i \in \lambda_1 with y_i = +1    (2)
w^T \Phi(x_i) + b < 0 for x_i \in \lambda_2 with y_i = -1    (3)

and all training points are correctly classified if

y_i (w^T \Phi(x_i) + b) > 0 for all i    (4)

Figure 7(a) shows two linearly separable sets of data. Many possible hyperplanes can separate these two sets. The goal of the SVM is to determine the hyperplane for which the margin, the distance between two parallel hyperplanes (H1 and H2 in Figure 7, termed the canonical hyperplanes) on each side of the hyperplane
Figure 7. Separating hyperplanes for two sets of data. (a) Linear separating hyperplanes; (b) nonlinear separating hyperplanes. The separating hyperplane is H: w^T \Phi(x) + b = 0, and the two canonical hyperplanes are H_1: w^T \Phi(x) + b = +1 and H_2: w^T \Phi(x) + b = -1. The circled data points (which lie on the two canonical hyperplanes) are the support vectors.

H that separates the data, is the largest. The data points that lie on the two canonical hyperplanes are called support vectors (circled in Figure 7). The transformation defined by the mapping function \Phi(x) in Eq. 1 can be linear or nonlinear, which allows the separation of both linearly separable and only-nonlinearly-separable data. Figure 7(a) shows an example of separating hyperplanes for linearly separable data, while the two data sets shown in Figure 7(b) can only be separated nonlinearly. For nonlinear SVMs, the kernel function K(x_i, x_j), defined as K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j), can be polynomial, Gaussian or sigmoid. Burges 18 gives a detailed description of how to find the separating hyperplanes. We chose the SVM implementation SVM-light 19 with a polynomial kernel function in our work. The SVM was trained using randomly chosen training pages.

3.4.3. GMM classifier
The Gaussian Mixture Model (GMM) classifier models the probability density function of a feature vector x as a weighted combination of M multivariate Gaussian densities:

p(x \mid \Lambda) = \sum_{i=1}^{M} p_i\, g_i(x)

where the weight (mixing parameter) p_i is the prior probability that feature x was generated by component i, and satisfies \sum_{i=1}^{M} p_i = 1. Each component \lambda_i is represented by a Gaussian model \lambda_i = N(p_i, \mu_i, \Sigma_i) whose probability density is:

g_i(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)

where \mu_i and \Sigma_i are the mean vector and covariance matrix of Gaussian component i, respectively, and d is the dimension of the input feature vector.
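As an illustration (not the system's training code), the component density g_i(x) and the resulting class posteriors can be evaluated directly for small d; below is a 2-D sketch with an explicit 2 x 2 covariance inverse and hypothetical components:

```python
import math

def gaussian_density_2d(x, mu, cov):
    """Evaluate the multivariate normal density g_i(x) for d = 2,
    using the closed-form 2x2 inverse and determinant."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu)
    m = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # For d = 2, (2 pi)^(d/2) sqrt(|Sigma|) = 2 pi sqrt(det)
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

def posterior(x, components):
    """p(lambda_i | x) = p_i g_i(x) / sum_j p_j g_j(x)."""
    weighted = [p * gaussian_density_2d(x, mu, cov) for p, mu, cov in components]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two hypothetical script classes with equal priors and unit covariance
components = [(0.5, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
              (0.5, [4.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])]
probs = posterior([0.5, 0.2], components)
print(probs[0] > probs[1])  # True: the point is near the first component
```

For the real d = 32 features, the same computation is done with full 32 x 32 covariance matrices, typically via a Cholesky or LU factorization rather than an explicit inverse.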
So the Gaussian mixture is completely specified by the mean vectors, covariance matrices and mixture weights of all components, and can be represented by

\Lambda = \{\lambda_i = N(p_i, \mu_i, \Sigma_i)\}, \quad i = 1...M

The probability that an observed input vector x belongs to class \lambda_i = N(p_i, \mu_i, \Sigma_i) is given, in terms of density, by

p(\lambda_i \mid x) = \frac{p(x \mid \lambda_i)\, p(\lambda_i)}{p(x \mid \Lambda)} = \frac{p_i\, g_i(x)}{\sum_{j=1}^{M} p_j\, g_j(x)}    (5)

For script identification, M is the number of different scripts. So for bilingual documents and 16 channels of Gabor filter features, we have M = 2 and d = 32. Given N training samples {x_1, x_2, ..., x_N}, using
standard techniques, the initial Gaussian mixture model parameters (p_i, \mu_i, \Sigma_i) are estimated from the training samples as:

p_i = \frac{1}{N} \sum_{n=1}^{N} p(\lambda_i \mid x_n) = \frac{N_i}{N}    (6)

\mu_i = \frac{\sum_{n=1}^{N} p(\lambda_i \mid x_n)\, x_n}{\sum_{n=1}^{N} p(\lambda_i \mid x_n)} = \frac{1}{N_i} \sum_{k=1}^{N_i} x_k^{(i)}    (7)

\Sigma_i = \frac{\sum_{n=1}^{N} p(\lambda_i \mid x_n)(x_n - \mu_i)(x_n - \mu_i)^T}{\sum_{n=1}^{N} p(\lambda_i \mid x_n)} = \frac{1}{N_i} \sum_{k=1}^{N_i} (x_k^{(i)} - \mu_i)(x_k^{(i)} - \mu_i)^T    (8)

In Eqs. 6, 7 and 8, 1 \le i \le M and N_i is the number of samples which belong to class \lambda_i. Considering the fact that the distributions of script components on different pages differ, the estimated models are refined iteratively via maximum-likelihood detection. At each iteration, the decision for each observation x (test sample) is:

p(\lambda_1 \mid x) \;\underset{\lambda_2}{\overset{\lambda_1}{\gtrless}}\; p(\lambda_2 \mid x)    (9)

Substituting Eq. 5 into the above equation and taking the log-likelihood of both sides, we obtain the following maximum-likelihood decision rule:

(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) - (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) \;\underset{\lambda_2}{\overset{\lambda_1}{\gtrless}}\; \ln|\Sigma_1| - \ln|\Sigma_2| + 2(\ln p_2 - \ln p_1)    (10)

The procedure to obtain the classifier that identifies the two scripts is:
(1) Estimate the parameters (p_i, \mu_i, \Sigma_i) of the Gaussian mixture model using Eqs. 6, 7 and 8;
(2) For each feature vector x, perform the classification based on Eq. 10;
(3) Re-estimate the parameters (p_i, \mu_i, \Sigma_i) based on the newly classified feature vectors;
(4) Return to step (2) and repeat the classification until the iteration stop condition is satisfied.

4. EXPERIMENTS
The proposed approaches were applied to 20 randomly chosen pages from each of four bilingual dictionaries: Arabic-English, Korean-English, Hindi-English and Chinese-English. Based on these pages, we performed the following two experiments.

4.1. Experiment 1: leave-one-out
This experiment tests how the individual classifier affects performance for limited data. For each of the four dictionaries, we partition the 20 pages into 19 training pages and 1 test page.
The process is repeated a total of 20 times and the accuracy across all partitions is shown in Table 1.

4.2. Experiment 2: use-one-training
In this experiment, a single page of the 20 is selected as the training set. The trained system is applied to all of the other pages and the average accuracy is recorded. Compared with the first experiment, these results show how a smaller (and more realistic) training set affects the performance. The results of this experiment are shown in Table 2.
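The leave-one-out protocol above can be sketched as follows. This is a minimal illustration with a 1-nearest-neighbor stand-in classifier and hypothetical feature vectors, not the actual word-level pipeline, where the held-out unit is a full page of words:

```python
def nearest_label(x, train):
    """1-NN stand-in classifier: label of the closest training vector."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda s: dist(x, s[0]))[1]

def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, train on the rest, and average."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_label(x, train) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical word-feature vectors for two scripts
samples = [([0.0, 0.1], "roman"), ([0.1, 0.0], "roman"), ([0.2, 0.2], "roman"),
           ([1.0, 1.1], "arabic"), ([1.1, 1.0], "arabic"), ([0.9, 0.9], "arabic")]
print(leave_one_out_accuracy(samples))  # 1.0
```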
Table 1. Leave-One-Out experimental results (k = 3 for k-NN; STD: standard deviation). For each script (Arabic, Chinese, Korean, Hindi) and each classifier (k-NN, SVM, GMM), the average, STD, median, minimum and maximum accuracy (%) are reported.

Table 2. Use-One-Training experimental results (k = 3 for k-NN; STD: standard deviation). The layout is the same as in Table 1.

4.3. Experimental Result Analysis
The results in Tables 1 and 2 show the capability of Gabor filters to capture the features of different scripts and the effectiveness of these three classifiers for identifying scripts at the word level. Comparison of Tables 1 and 2 shows that a large number of training samples (19 pages) produces better performance than a small number of training samples, although the difference is often minimal. To show the robustness of these classifiers, we can also visualize the average accuracy and standard
deviation of each classifier for each dictionary.

Figure 8. The means and standard deviations of the three classifiers on the four dictionaries.

From Figure 8(a), we can see that for a large number of training samples, all three classifiers are robust, with a maximal standard deviation of 5.43%; the k-NN classifier obtains the best average accuracy while the SVM has the minimal deviation. Figure 8(b) shows that a relatively small number of training samples can still provide reasonable results, although the training set must be selected carefully. In the last column of Table 2, there is a very low accuracy of 15.34%, highlighted in italic. The reason for such a low accuracy is that the page used for training contained only a few words, which made the GMM classifier fail. Figure 8(b) also shows that for small training sets, the performance of the three classifiers is almost the same.

In the above analysis, we always set the k value of the k-NN classifier to 3. However, the choice of k may also affect the performance of this classifier. By choosing different k values (k = 1, 3, 5), we obtained the results of the leave-one-out experiment for all four dictionaries, shown in Figure 9. The results in this figure show that, for this script identification task, the 3-NN classifier often obtains the best result, while the trends for different k values are consistent.

Figure 9. Experimental result comparison of the k-NN classifier with k = 1, 3, 5. The results are sorted based on the values of the 3-NN classifier.
Figure 10. Word segmentation for different scripts and image quality. (a) Over-segmentation of Arabic words; (b) over-segmentation of Chinese words caused by low image quality; (c) perfect word segmentation of Chinese words.

4.4. Factors in the Preprocessing Phase that Affect the Performance
We aim to provide a general script identification approach. The identification is a sequential process with three main phases: document image preprocessing, word segmentation and script identification. By manually examining the results, we found that the following factors in the preprocessing phase could affect the identification results:

Word segmentation and font face: Since script identification is performed at the word level, word segmentation results heavily affect the script identification performance. By browsing the results of the Arabic-English dictionary, we noticed that many of the incorrectly identified Arabic words are over-segmented, which is caused by the nature of the Arabic language and the layout of this dictionary: some of the bounding boxes of text lines overlap (Figure 10(a)). Many other incorrect identifications occurred when an italic Roman script word was identified as Arabic. This may be caused by the fact that the two have similar texture features.

Word segmentation and image quality: By checking the Chinese/Roman identification results with low performance, we found that the incorrect identification was also caused by over-segmentation of Chinese words (Figure 10(b)). In addition to over-segmentation, another factor that contributed to incorrect identification is low image quality. Comparing the second and third images in Figure 10, we can see that the image in Figure 10(b) has lower quality and poorer word segmentation (this page has the lowest accuracy), while the image in Figure 10(c) has higher quality and perfect word segmentation (this page has the highest identification accuracy).
Single-character words: In all four dictionaries, another challenge is single-character Roman words. Although the identification is performed at the word level, through training we can still obtain a global texture representation of the Roman script. However, after word image replication and scaling, single Roman characters may not have a similar texture, leading to incorrect identification.

5. CONCLUSION
In this paper, we have compared the performance of three classifiers applied to script identification at the word level. All three classifiers are based on Gabor filter features. Experiments were carried out on Arabic-, Chinese-, Hindi- and Korean-English bilingual dictionaries, and the results show the effectiveness of the classifiers. Compared to our previous work, 9 all three classifiers (k-NN, SVM and GMM) can significantly improve the classification performance. Since our classification is at the word level, one primary factor that may affect the accuracy is word segmentation, which may be affected by scanning noise, text line spacing, word spacing, size and so on. We strongly believe that the results could be significantly improved by addressing the word segmentation problems.

6. ACKNOWLEDGMENT
The support of this research under DARPA cooperative agreement N , National Science Foundation grant EIA and DOD contract MDA90402C0406 is gratefully acknowledged.
REFERENCES
1. A. L. Spitz, Determination of the script and language content of document images, IEEE Trans. Pattern Analysis and Machine Intelligence 19(3).
2. D. Doermann, H. Ma, B. Karagol-Ayan, and D. W. Oard, Lexicon acquisition from bilingual dictionaries, in SPIE Conference on Document Recognition and Retrieval, San Jose, CA.
3. H. Ma and D. Doermann, Bootstrapping structured page segmentation, in SPIE Conference on Document Recognition and Retrieval, Santa Clara, CA.
4. J. Hochberg, P. Kelly, T. Thomas, and L. Kerns, Automatic script identification from document images using cluster-based templates, IEEE Trans. Pattern Analysis and Machine Intelligence 19(2).
5. P. Sibun and A. L. Spitz, Language determination: Natural language processing from scanned document images, in Proc. 4th Conference on Applied Natural Language Processing, Stuttgart.
6. B. Waked, S. Bergler, and C. Y. Suen, Skew detection, page segmentation, and script classification of printed document images, in IEEE International Conference on Systems, Man, and Cybernetics (SMC 98), San Diego, CA.
7. Y. Zhu, T. Tan, and Y. Wang, Font recognition based on global texture analysis, IEEE Trans. Pattern Analysis and Machine Intelligence 23(10).
8. J. G. Daugman, Complete discrete 2D Gabor transforms by neural networks for image analysis and compression, IEEE Trans. Acoustics, Speech and Signal Processing 36.
9. H. Ma and D. Doermann, Gabor filter based multi-class classifier for scanned document images, in 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland.
10. D. J. Ittner and H. S. Baird, Language-free layout analysis, in IAPR 2nd Int'l Conf. on Document Analysis and Recognition, Tsukuba Science City, Japan.
11. D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, Comparing images using the Hausdorff distance, IEEE Trans. Pattern Analysis and Machine Intelligence 15(9).
12. L. O'Gorman, The document spectrum for page layout analysis, IEEE Trans. Pattern Analysis and Machine Intelligence 15(11).
13. T. M. Cover and P. E. Hart, Nearest neighbor pattern classification, IEEE Trans. Information Theory IT-13(1).
14. C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20.
15. V. Blanz, B. Scholkopf, H. Bulthoff, C. Burges, V. Vapnik, and T. Vetter, Comparison of view-based object recognition algorithms using realistic 3D models, in International Conference on Artificial Neural Networks, Berlin.
16. M. Schmidt, Identifying speaker with support vector networks, in Interface 96 Proceedings, Sydney.
17. E. Osuna, R. Freund, and F. Girosi, Training support vector machines: an application to face detection, in 1997 Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico.
18. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2(2).
19. T. Joachims, Making large-scale SVM learning practical, in Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999.
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationVideo OCR for Sport Video Annotation and Retrieval
Video OCR for Sport Video Annotation and Retrieval Datong Chen, Kim Shearer and Hervé Bourlard, Fellow, IEEE Dalle Molle Institute for Perceptual Artificial Intelligence Rue du Simplon 4 CH-190 Martigny
More informationVision based Vehicle Tracking using a high angle camera
Vision based Vehicle Tracking using a high angle camera Raúl Ignacio Ramos García Dule Shu gramos@clemson.edu dshu@clemson.edu Abstract A vehicle tracking and grouping algorithm is presented in this work
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear
More informationEM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More informationSOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS. Nitin Khanna and Edward J. Delp
SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS Nitin Khanna and Edward J. Delp Video and Image Processing Laboratory School of Electrical and Computer Engineering Purdue University West Lafayette,
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationFace detection is a process of localizing and extracting the face region from the
Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationData Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
More informationOperations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationFeature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
More informationjorge s. marques image processing
image processing images images: what are they? what is shown in this image? What is this? what is an image images describe the evolution of physical variables (intensity, color, reflectance, condutivity)
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationRecognition Method for Handwritten Digits Based on Improved Chain Code Histogram Feature
3rd International Conference on Multimedia Technology ICMT 2013) Recognition Method for Handwritten Digits Based on Improved Chain Code Histogram Feature Qian You, Xichang Wang, Huaying Zhang, Zhen Sun
More informationStatic Environment Recognition Using Omni-camera from a Moving Vehicle
Static Environment Recognition Using Omni-camera from a Moving Vehicle Teruko Yata, Chuck Thorpe Frank Dellaert The Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 USA College of Computing
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationDocument Image Retrieval using Signatures as Queries
Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationCategorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationPalmprint Recognition. By Sree Rama Murthy kora Praveen Verma Yashwant Kashyap
Palmprint Recognition By Sree Rama Murthy kora Praveen Verma Yashwant Kashyap Palm print Palm Patterns are utilized in many applications: 1. To correlate palm patterns with medical disorders, e.g. genetic
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationMulti-modal Human-Computer Interaction. Attila Fazekas. Attila.Fazekas@inf.unideb.hu
Multi-modal Human-Computer Interaction Attila Fazekas Attila.Fazekas@inf.unideb.hu Szeged, 04 July 2006 Debrecen Big Church Multi-modal Human-Computer Interaction - 2 University of Debrecen Main Building
More informationTracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object
More informationResearch on Chinese financial invoice recognition technology
Pattern Recognition Letters 24 (2003) 489 497 www.elsevier.com/locate/patrec Research on Chinese financial invoice recognition technology Delie Ming a,b, *, Jian Liu b, Jinwen Tian b a State Key Laboratory
More informationLow-resolution Character Recognition by Video-based Super-resolution
2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro
More informationAn Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network
Proceedings of the 8th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING & DATA BASES (AIKED '9) ISSN: 179-519 435 ISBN: 978-96-474-51-2 An Energy-Based Vehicle Tracking System using Principal
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationCharacter Image Patterns as Big Data
22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationTemplate-based Eye and Mouth Detection for 3D Video Conferencing
Template-based Eye and Mouth Detection for 3D Video Conferencing Jürgen Rurainsky and Peter Eisert Fraunhofer Institute for Telecommunications - Heinrich-Hertz-Institute, Image Processing Department, Einsteinufer
More informationGalaxy Morphological Classification
Galaxy Morphological Classification Jordan Duprey and James Kolano Abstract To solve the issue of galaxy morphological classification according to a classification scheme modelled off of the Hubble Sequence,
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationFactoring Patterns in the Gaussian Plane
Factoring Patterns in the Gaussian Plane Steve Phelps Introduction This paper describes discoveries made at the Park City Mathematics Institute, 00, as well as some proofs. Before the summer I understood
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
More informationHow To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationEarly defect identification of semiconductor processes using machine learning
STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationHow To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode
More informationExtend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
More informationAuthor Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
More informationSignature Segmentation from Machine Printed Documents using Conditional Random Field
2011 International Conference on Document Analysis and Recognition Signature Segmentation from Machine Printed Documents using Conditional Random Field Ranju Mandal Computer Vision and Pattern Recognition
More informationDenial of Service Attack Detection Using Multivariate Correlation Information and Support Vector Machine Classification
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Denial of Service Attack Detection Using Multivariate Correlation Information and
More informationA fast multi-class SVM learning method for huge databases
www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
More informationVisualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland jaakko.peltonen@tkk.fi
More informationVisual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
More informationComparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances
Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter University of Sunderland, School of Computing and
More informationTracking and Recognition in Sports Videos
Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer
More informationTowards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationOnline Farsi Handwritten Character Recognition Using Hidden Markov Model
Online Farsi Handwritten Character Recognition Using Hidden Markov Model Vahid Ghods*, Mohammad Karim Sohrabi Department of Electrical and Computer Engineering, Semnan Branch, Islamic Azad University,
More informationVisualization of Large Font Databases
Visualization of Large Font Databases Martin Solli and Reiner Lenz Linköping University, Sweden ITN, Campus Norrköping, Linköping University, 60174 Norrköping, Sweden Martin.Solli@itn.liu.se, Reiner.Lenz@itn.liu.se
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?
More informationAutomatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior
Automatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior Kenji Yamashiro, Daisuke Deguchi, Tomokazu Takahashi,2, Ichiro Ide, Hiroshi Murase, Kazunori Higuchi 3,
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationSPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
More information