Visualization of Large Font Databases



Martin Solli and Reiner Lenz
Linköping University, Sweden
ITN, Campus Norrköping, Linköping University, 60174 Norrköping, Sweden
Martin.Solli@itn.liu.se, Reiner.Lenz@itn.liu.se

Abstract: We describe a novel system for interaction with large font databases. The system is an efficient tool for browsing large font databases and as such can be used by people in the Graphic Arts industry. The proposed approach is based on shape descriptors developed for the visual characterization of character images rendered from different fonts. Here the descriptors are used in a visualization of a large font database containing more than 2700 fonts. By applying geometry-preserving linear and non-linear manifold learning methods, in combination with a refinement process, character images of different fonts are organized on a two-dimensional grid, where fonts are positioned by visual similarity.

Keywords: font selection, visualization, dimensionality reduction, image analysis

1. Introduction

Selecting an appropriate font for a print job, or some other Graphic Arts application, can be a difficult and time-consuming process. With the number of available fonts increasing, the task will become even more demanding. Consequently, we need new ways of finding and selecting fonts in large font databases. One solution is to visualize the entire font collection in a transparent way, where the user can easily locate fonts with a desired appearance. Such a visual overview of all fonts in the database can help people employed in the Graphic Arts industry, both those who manage and select fonts in their daily work and occasional users. The visualization is a challenging task, especially if the database is large. We describe how to use the findings from the font recognition study presented in Solli and Lenz (2007) in the design of a visualization tool for large font databases.
The basic idea of the method is to order all images of a given character in the different fonts in a transparent way, giving the user an opportunity to interact effectively with the database. As a starting point we use the high-dimensional vectors (41 dimensions) obtained from Solli and Lenz (2007). Linear and non-linear mappings are then used to project the 41-D space of all images of a given letter to a two-dimensional manifold. Using a two-dimensional representation is natural since we want to use a conventional monitor as the display device. Finally, we describe a refinement process that maps these raw two-dimensional representations to a regular two-dimensional grid to avoid overlapping images and provide a better overview for the user. The performance is illustrated with a large database consisting of 2755 fonts for the English alphabet. Font selection in general seems to have been forgotten by the research community. With this work we intend to initiate a discussion on how to simplify font selection. The proposed method should be interpreted as a pilot study and inspiration for future research. As a complementary tool to the earlier developed font recognition system, the visualization system will be implemented on our publicly available website: http://media-vibrance.itn.liu.se/fyfont

The paper is organized as follows. In Section 2 we give a brief overview of previous attempts at font recognition and image database visualization. In Section 3 we describe the database used, the shape descriptors, and the method for calculating similarity between different fonts. Section 4 contains a description of the dimensionality reduction methods used to obtain a two-dimensional representation of the characters, and Section 5 describes how coordinates are mapped to a two-dimensional grid. The results obtained from the experiments are illustrated in Section 6. We discuss the results and give some conclusions in Section 7.

2. Related work

Research in font recognition has mainly used two strategies: a local approach based on features for individual characters or words, and a global approach using features for blocks of text. An example of the local approach, describing font clustering and cluster identification in document images, can be found in Öztürk et al. (2001). Four different methods (bitmaps, DCT coefficients, eigencharacters, and Fourier descriptors) are evaluated on 65 fonts. All tested methods resulted in adequate clustering performance, but the eigenfeatures provided the most compact representation. Another contribution to font recognition is Morris (1992), considering classification of typefaces using spectral signatures. A classifier capable of recognizing 100 typefaces is described by Baird and Nagy (1994), resulting in significant improvements in OCR systems. In Cooperman (1997) font attributes such as "serifness" and "boldness" were estimated in an OCR system. In a global approach by Zhu et al. (2001), text blocks are treated as images containing specific textures, and Gabor filters are used for texture recognition. The authors conclude that their method is able to identify more global font attributes, such as weight and slope, but is less appropriate for distinguishing finer typographical attributes. Similar approaches with Gabor filters can be found in Ha et al. (2005) and Yang et al. (2002). Avilés-Cruz et al. (2005) also describe an approach based on global texture analysis, with features extracted using third- and fourth-order moments. To our knowledge, only one commercial search engine is available for font recognition or selection: WhatTheFont, operated by MyFonts.com. An early description of the method can be found in Sexton et al. (2000). Fonts are identified by comparing features obtained from a hierarchical abstraction of binary character images at different resolutions. We are not aware of newer, publicly available descriptions of the improvements that are probably included in the commercial system. A competitor is TypeDNA (http://www.typedna.com). They are developing tools for recognising and finding fonts, and also for visualising a collection of fonts. However, their products have not yet reached a public release. As mentioned in the previous section, the font selection task in general has received little attention from the research community.
Here we mention a few examples of visualization of image database content, not particularly focused on font databases. Nguyen and Worring (2004) present a method for similarity-based visualization based on three criteria: 1) the displayed images should give a general overview of the whole dataset, 2) the original structure of the data should be preserved, and 3) image overlap should be reduced as much as possible. They use SNE (Stochastic Neighbour Embedding) and the ISOMAP method for preserving neighbour identities. Moghaddam et al. (2001) visualize retrieved images on a 2-D screen not only in order of decreasing similarity, but also according to their mutual similarities. Images are converted to feature vectors based on color moments, wavelet-based textures, and water-filling edge features. They use PCA to reduce the dimensionality of the feature space to two dimensions (using the two eigenvectors corresponding to the two largest eigenvalues), and then a constrained nonlinear optimization approach to minimize the overlap between images. For non-conventional browsing in image databases we mention Torres et al. (2003), where images are placed on concentric rings or spirals. The motivation for using such shapes is that they make it easier to keep the user's focus on both the query image and the most similar retrieved images. The query image is placed at the centre of the spiral or rings, and retrieved images are placed on surrounding rings depending on similarity (images with high similarity close to the centre). In Heesch and Rüger (2004) three different interfaces are presented and evaluated. The first is an interface where images are displayed in the form of a spiral, with image distances from the centre proportional to the distance to the query image in feature space. The second interface uses basically the same approach, but images in the periphery are displayed at a smaller size.
By placing images from the centre outwards, the method avoids overlapping images. In the final approach a pre-selection task is used where the user is shown a set of nodes (images) that are representative of different collections of images. Clicking on one of the nodes retrieves the set of nearest neighbours from the database, which are then displayed such that their distances to the centre are proportional to their dissimilarity to the selected node. Within a query the user can easily move between interfaces for a better understanding of the search result.

3. Font recognition

This section contains a brief summary of the font recognition method presented in Solli and Lenz (2007). The findings are used as input to the dimensionality reduction procedure described in the next section. We use a database containing 2755 different fonts, where every character in a font is represented by an image. A few examples of the character 'a' can be seen in Figure 1. We choose to represent each image by a coordinate vector based on principal component analysis, an approach frequently used in other applications. This method was popularized by Turk and Pentland (1991), who applied it to face images and called it the eigenface approach. In our application we call it the eigenfont representation and use it as follows. All characters in the database are in a given orientation and are given as gray value images. They are, however, of different sizes, and in a first step all images of a given character are therefore resized to a fixed image size.

Figure 1. Examples of character 'a'.

Images of characters from different fonts are in general quite similar; therefore the images can be described in a lower-dimensional subspace. The principal component analysis (or Karhunen-Loève expansion) reduces the number of dimensions, leaving only the features that are most important for font classification. Eigenvectors and eigenvalues are computed from the covariance matrix of each character in the original database. The eigenvectors corresponding to the K highest eigenvalues describe a K-dimensional subspace onto which the original images are projected. The coordinates in this low-dimensional space are stored as the new descriptors. The first five eigenimages for character 'a' can be seen in Figure 2.

Figure 2. First five eigenimages for character 'a' (normalized gray values).

The shape of a character can obviously be described by its contour. We use this observation and filter the character images with different edge filters before calculating eigenimages. We applied several filters and found that a combination of one horizontal and two diagonal filters gave the best recognition results. Vertical filtering did not improve the performance, probably because many characters contain almost the same vertical lines. Summarizing the results of these experiments, we decided to represent characters by vectors in a 41-dimensional feature space, where the distance between two points describes the similarity between the corresponding fonts. Table 1 provides an overview of the final combination of parameters and settings included in the recognition method. Details can be found in Solli and Lenz (2007).

Table 1. An overview of the final combination of different parameters and settings included in the recognition method.
Image scaling: Square images, size 24x24 pixels, for "square characters" like a, e, and o; rectangular images, for instance 24x20 pixels, for "rectangular characters" like l, i, and t. Characters are aligned and scaled to fill the whole image, using bilinear interpolation.
Edge filtering: Three Sobel filters, one horizontal and two diagonal.
Number of eigenimages: 40.
Extra feature: Ratio between character height and width before scaling.

4. Dimensionality reduction

After the feature extraction process described in the previous section, every character in the font database is represented by a 41-dimensional feature vector, which will be mapped to a two-dimensional surface. To be meaningful, this mapping must preserve the intrinsic geometrical structure (visual similarity) of the original representation space. In the following we illustrate two examples of such mappings, one linear and one non-linear. The linear mapping is simply a selection of the second and third components in the PCA coordinate space (the first coordinate carries very little visual information related to font appearance). In many applications linear mappings are too restrictive, since they cannot take into account that the actual data vectors may be located on a lower-dimensional manifold embedded in a higher-dimensional space. In the simplest example of such a situation, the data vectors are all located on a circle in the plane. Traditional PCA will always indicate that the data is two-dimensional and will need two principal components to represent it. These two dimensions are, however, only a consequence of the coordinate system used to describe the data. Selecting a polar coordinate system with its origin at the centre of the circle reveals that the radial value is constant for all data vectors and that the data can be represented by the angle alone, without information loss. In this simple example it is clear how to choose the coordinate system in which the lower dimensionality of the data becomes obvious. In the general case it is, however, very difficult to estimate the intrinsic dimensionality of the data and to construct non-linear mappings that achieve this reduction. This problem is known as manifold learning in the machine learning community (see http://www.math.umn.edu/~wittman/mani/ for an overview of different methods). In our experiments we mainly worked with the ISOMAP algorithm described in Tenenbaum et al. (2000). ISOMAP (isometric feature mapping) estimates the true distance between two data points by first linearly fitting neighbouring points. These local approximations are then patched together, and the shortest path connecting any two points is computed. In the final step this table of approximate geodesic distances between point pairs is used to embed the data points in a low-dimensional subspace such that the intrinsic geometry is preserved as much as possible. Many powerful tools for both linear and non-linear dimensionality reduction have been presented in the past. Well-known examples are Self-Organizing Maps (SOM), also known as Kohonen maps (see http://www.cis.hut.fi/projects/somtoolbox/), Locally Linear Embedding (LLE) by Saul and Roweis (2004), and Laplacian Eigenmaps by Belkin and Niyogi (2002). Preliminary experiments with Self-Organizing Maps gave no improvements compared to the PCA or ISOMAP approach. However, the performance of the SOM algorithm seems to improve if rather few of the 41 dimensions in the original dataset are used as input. Possibly the explanation is related to the conclusion of De Backer et al. (1998) that SOM performs better for a very low number of dimensions. The method might be more suitable in a classification task, where the obtained code vectors (also known as model vectors or prototype vectors) provide the classes.
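As a concrete illustration of the eigenfont descriptors and the linear 2-D mapping discussed above, the following NumPy sketch computes eigenimages from a stack of flattened character images and then keeps PCA components 2 and 3. The random matrix is a stand-in for the real edge-filtered character images, and all sizes except K = 40 are arbitrary; a non-linear alternative such as ISOMAP is available, for instance, as `sklearn.manifold.Isomap`.

```python
import numpy as np

# Toy stand-in for the real data: N character images flattened to vectors
# (the real system uses 24x24 edge-filtered images of 2755 fonts).
rng = np.random.default_rng(0)
N, D = 300, 24 * 24
images = rng.normal(size=(N, D))

# Eigenfont/eigenface-style PCA: the principal axes of the centred images
# span the low-dimensional subspace; the SVD yields them directly.
mean = images.mean(axis=0)
centred = images - mean
_, _, Vt = np.linalg.svd(centred, full_matrices=False)

K = 40                               # number of eigenimages kept in the paper
coords = centred @ Vt[:K].T          # K-dimensional eigenfont descriptors

# Linear 2-D visualization: components 2 and 3, since the paper notes the
# first component carries little information about font appearance.
pca_2d = coords[:, 1:3]
print(coords.shape, pca_2d.shape)    # (300, 40) (300, 2)
```

The rows of `Vt[:K]`, reshaped back to 24x24, would be the eigenimages shown in Figure 2.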
In general, the use of different dimensionality reduction methods should be investigated further in future research.

5. Grid representation

The dimensionality reduction techniques (both PCA and ISOMAP) result in a visualization in which large parts of the space are relatively empty while others are crowded. For many purposes a more ordered presentation provides a better overview of the font database. Our solution is to distribute the images evenly on a square grid, while at the same time trying to maintain their mutual relations. The following algorithm was used to compute such an ordered display. Let M be the matrix containing the font indices: k = M(r,c) means that font number k is presented at position (r,c) in the display. We use an almost square display, leading to a matrix of size 52x53 for the 2755 fonts. The matrix P contains the coordinates of the characters as computed by PCA or ISOMAP: the vector (P(i,1), P(i,2)) contains the coordinates of character image i in the two-dimensional space obtained by the PCA or ISOMAP transformation. The coordinates in P are first scaled to fit the size of M. Each element in M is then assigned the image index i of the nearest neighbour in P. Starting with the top row, for every position (r,c) we do the following:

I. Find the index i giving the minimum Euclidean distance between M(r,c) and P(i), where i = 1,...,2755.
II. Move the image at coordinates P(i) to M(r,c).
III. Mark the index P(i) so it cannot be relocated.
IV. If we have not reached the end of the row, increase the column index by one and go to step I; if we have reached the end of the row, reset the column index to 1 and go to the next row.

6. Results

In Figure 3 and Figure 4 we show all the font images for character 'a' (for 2755 fonts) at coordinates obtained from the PCA and ISOMAP representations, respectively.
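The row-by-row grid assignment of Section 5 can be sketched as follows. The function name and the handling of spare cells are assumptions made for illustration; the real system uses a 52x53 grid for 2755 fonts, so one cell remains empty.

```python
import numpy as np

def to_grid(P, rows, cols):
    """Assign each 2-D point in P to one cell of a rows x cols grid.

    Scans the grid top row first (as in Section 5) and gives each cell
    the nearest still-free point; -1 marks a cell left empty when there
    are more cells than points.
    """
    P = np.asarray(P, dtype=float)
    # Scale the raw PCA/ISOMAP coordinates to the extent of the grid.
    span = np.ptp(P, axis=0)
    scaled = (P - P.min(axis=0)) / np.where(span > 0, span, 1.0)
    scaled *= [rows - 1, cols - 1]

    free = np.ones(len(P), dtype=bool)
    grid = -np.ones((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            if not free.any():              # more cells than points
                return grid
            d = np.linalg.norm(scaled - [r, c], axis=1)
            d[~free] = np.inf               # placed images are locked
            i = int(np.argmin(d))
            grid[r, c] = i
            free[i] = False
    return grid

rng = np.random.default_rng(1)
grid = to_grid(rng.random((16, 2)), 4, 4)
print(sorted(grid.ravel().tolist()))        # each of the 16 images used once
```

Because each chosen index is locked (step III), the result is a permutation of the image indices: no two cells show the same font, and no image overlaps another.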
It can be seen that the ISOMAP reduction seems to separate different types of fonts into clusters better, and fewer images overlap. In Figure 5 and Figure 6 the PCA and ISOMAP results for character 'a' are converted to grid representations, as described in Section 5. We also tested whether the ordered representations could be improved by further optimization. For this purpose we selected two points in an ordered representation as described above. If swapping these two points improved the presentation, then the accumulated distances between the two points and their neighbours would decrease after the swap compared to before it. We created a large number of random pairs for swapping and systematically tested swapping points located near each other. In none of the cases could we observe a reduction of the distances after swapping. This indicates a locally optimal solution of the selected ordering method. A globally optimal solution, which can be searched for in future research, would require, for instance, a more exhaustive swapping method where accumulated distances are calculated over the entire grid.

Figure 3. A visualization of all characters 'a' in the database based on the second and third PCA coefficients.
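The swap test described above can be sketched as follows. The helper names and the 4-connected neighbourhood are assumptions made for illustration, since the exact neighbourhood definition is not specified in the text; distances are measured in the feature space of the grid's images.

```python
import numpy as np

def local_cost(grid, feats, r, c):
    """Sum of feature-space distances from cell (r, c) to its 4-neighbours."""
    rows, cols = grid.shape
    total = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            total += np.linalg.norm(feats[grid[r, c]] - feats[grid[rr, cc]])
    return total

def swap_improves(grid, feats, a, b):
    """True if swapping cells a and b lowers their accumulated neighbour cost."""
    before = local_cost(grid, feats, *a) + local_cost(grid, feats, *b)
    grid[a], grid[b] = grid[b], grid[a]          # trial swap
    after = local_cost(grid, feats, *a) + local_cost(grid, feats, *b)
    grid[a], grid[b] = grid[b], grid[a]          # undo the trial swap
    return bool(after < before)

# Tiny example: 1-D "features", a deliberately disordered 2x2 grid.
grid = np.array([[0, 3], [2, 1]])
feats = np.array([[0.0], [1.0], [2.0], [3.0]])
print(swap_improves(grid, feats, (0, 1), (1, 1)))  # True: the swap orders the grid
```

In the paper's experiments this predicate returned False for every tested pair, which is what indicates that the ordering is locally optimal.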

Figure 4. A visualization of all characters 'a' in the database based on ISOMAP manifold learning.

Figure 5. A grid representation of all characters 'a' in the database based on the PCA approach.

Figure 6. A grid representation of all characters 'a' in the database based on the ISOMAP representation.

7. Discussion and conclusions

A novel system for interaction with large font databases has been described. Shape descriptors obtained from related research have been applied in the visualization of a large font database containing 2755 fonts. It was shown that geometry-preserving linear and non-linear manifold learning methods can be used to map the structure of the high-dimensional feature space to a two-dimensional manifold. To avoid overlapping font images and provide a better overview for the user, a refinement process is utilized, in which the raw two-dimensional representation is mapped to a regular 2-D grid. A visual evaluation of the image grid obtained from the ISOMAP reduction, shown in Figure 6, reveals that different font styles are clearly separated. The upper left corner mainly contains bold characters, while the upper right corner is dominated by the regular style. Similar regions can be found for italic, "small caps", etc. We believe the proposed method can be implemented in an interface used by, for instance, people in the Graphic Arts industry as an efficient tool for browsing large font databases. The user can relatively quickly find a group of interesting fonts and study them in more detail. In the final application, the idea is that the user can zoom in on parts of the space to study a few fonts more closely, and then obtain names, and possibly more characters, for specific fonts. An example is illustrated in Figure 7.

Figure 7. An illustration of the intended usage of the visualization method. The user can zoom in on a selected area of the grid, and by clicking on a font image, the user will obtain the name of the font together with a few more characters.

A topic for further research is to extend the method to use more than one character in the visualization. Moreover, the proposed method should be evaluated in user tests to clarify whether people find the system useful, and how they interact with it. Another topic to be dealt with in future work is the method for dimensionality reduction. There are many tools for both linear and non-linear dimensionality reduction, and our preliminary experiments with Self-Organizing Maps revealed no improvements compared to the PCA or ISOMAP approach. Nevertheless, the use of different dimensionality reduction tools should be investigated further. One could also use other methods as a pre-classifier, and then, for instance, apply ISOMAP to individual classes to obtain a final visualization grid.

8. Literature

Avilés-Cruz, C., Rangel-Kuoppa, R., Reyes-Ayala, M., Andrade-Gonzalez, A., and Escarela-Perez, R., (2005), High-order statistical texture analysis - font recognition applied, Pattern Recognition Letters, 26(2), 135-145

Baird, H.S., and Nagy, G., (1994), Self-correcting 100-font classifier, in Proceedings of SPIE, 2181, 106-115

Belkin, M., and Niyogi, P., (2002), Laplacian eigenmaps and spectral techniques for embedding and clustering, in Advances in Neural Information Processing Systems 14, 585-591

Cooperman, R., (1997), Producing good font attribute determination using error-prone information, in SPIE, 3027, 50-57

De Backer, S., Naud, A., and Scheunders, P., (1998), Non-linear dimensionality reduction techniques for unsupervised feature extraction, Pattern Recognition Letters, 19(8), 711-720

Ha, M.H., Tian, X.D., and Zhang, Z.R., (2005), Optical font recognition based on Gabor filter, in Proc. 4th Int. Conf. on Machine Learning and Cybernetics, 8, 4864-4869

Heesch, D., and Rüger, S., (2004), Three interfaces for content-based access to image collections, Lect. Notes Comput. Sci., 3115 LNCS, 2067

Nguyen, G., and Worring, M., (2004), Optimizing similarity based visualization in content based image retrieval, in 2004 IEEE International Conference on Multimedia and Expo, ICME04, 2, 759-762

Moghaddam, B., Tian, Q., and Huang, T.S., (2001), Spatial visualization for content-based image retrieval, in 2001 IEEE International Conference on Multimedia and Expo, ICME01, 22-25

Morris, R.A., (1992), Classification of digital typefaces using spectral signatures, Pattern Recognition, 25(8), 869-876

Saul, L., and Roweis, S., (2004), Think globally, fit locally: Unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res., 4, 119-155

Sexton, A., Todman, A., and Woodward, K., (2000), Font recognition using shape-based quad-tree and kd-tree decomposition, in Proceedings of the Fifth Joint Conference on Information Sciences, JCIS 2000, 2, 212-215

Solli, M., and Lenz, R., (2007), FyFont: Find-your-font in large font databases, 15th Scandinavian Conference on Image Analysis, SCIA 2007, Lect. Notes Comput. Sci., 4522 LNCS, 432-441

Tenenbaum, J., De Silva, V., and Langford, J., (2000), A global geometric framework for nonlinear dimensionality reduction, Science, 290, 2319-2323

Torres, R., Silva, C., Medeiros, C., and Rocha, H., (2003), Visual structures for image browsing, in Proceedings of the 12th ACM International Conference on Information and Knowledge Management, CIKM 2003, 49-55

Turk, M., and Pentland, A., (1991), Eigenfaces for recognition, Journal of Cognitive Neuroscience, 3(1), 71-86

Yang, F., Tian, X.D., and Guo, B.L., (2002), An improved font recognition method based on texture analysis, in Proc. 1st Int. Conf. on Machine Learning and Cybernetics, 4, 1726-1729

Zhu, Y., Tan, T.N., and Wang, Y.H., (2001), Font recognition based on global texture analysis, IEEE T-PAMI, 23(10), 1192-1200

Öztürk, S., Sankur, B., and Abak, A.T., (2001), Font clustering and cluster identification in document images, Journal of Electronic Imaging, 10(2), 418-430