Results of the MUSCLE CIS Coin Competition 2006

Michael Nölle (1) and Michael Rubik (2)
(1) Video and Safety Technology, (2) High Performance Image Processing
ARC Seibersdorf research GmbH, A-2444 Seibersdorf
e-mail: michael.noelle@arcs.ac.at

Allan Hanbury
PRIP, Institute of Computer-Aided Automation
Vienna University of Technology
Favoritenstraße 9/1832, A-1040 Vienna, Austria
e-mail: hanbury@prip.tuwien.ac.at

Abstract

We present the history and structure of the MUSCLE Coin Images Seibersdorf (CIS) benchmark, as well as the results of the CIS coin recognition competition 2006.

1 Introduction

It is important to evaluate computer vision algorithms objectively on data for which the ground truth is known. Various evaluation campaigns exist, especially in the domains of object recognition (PASCAL Visual Object Classes Challenge [2]) and image retrieval (ImageCLEF [7] and ImagEVAL [3]). In this paper we present the results of the Coin Images Seibersdorf (CIS) Coin Recognition Competition 2006. This competition used a large database of coin images (60 000 training and test images, 20 000 competition images) with a large number of classes (over 600 coin classes with over 2 000 face classes). The two objectives of the campaign were to:

- use a dataset of images from a narrow image domain [8], allowing a completely objective creation of the ground truth;
- create a classification task with a large number of classes.

This work was supported by the European Union Network of Excellence MUSCLE (FP6-507752).
The evaluation campaigns listed above use sets of images belonging to a wide domain [8], i.e. photographs of arbitrary scenes. For such a wide domain, it is usually difficult to create a completely objective ground truth. In image retrieval, for example, different people may consider different images to be relevant for a specific query image. In object recognition, some objects may be missed in an image. Through the use in this benchmark of a narrow image domain (a set of images where each image contains only one well-lit coin face), it is possible to create ground truth with minimal uncertainty. Object recognition campaigns generally use a low number of classes (e.g. 10 categories in the PASCAL Visual Object Classes Challenge 2006). Such problems can be solved by using a collection of two-class classifiers, such as support vector machines, but this approach is less tractable with the 692 classes in the CIS coin benchmark.

The paper is organized as follows. Sections 2 and 3 describe the history and structure of the benchmark and competition. The competition results are presented in Section 4. Section 5 concludes.

2 History

The changeover from 12 European currencies to the Euro in 2002 created a unique situation: great volumes of money had to be physically returned to the national banks of the member states. In Austria alone, the charitable donations amounted to several hundred tons of cash. Unfortunately, the coins could only be collected as a potpourri of currencies. The sheer volume of material called for an automatic coin recognition and sorting system, which was designed and built at ARC Seibersdorf research GmbH. The system, called Dagobert [6, 4], successfully sorted the coins between the middle of 2003 and the end of 2004, thereby returning their face value to the charity organizations. In order to build Dagobert and to perform the recognition task, a training data set had to be collected.
The training data, together with several tens of thousands of images of test coins, were published in 2005 and constitute the MUSCLE Coin Images Seibersdorf (CIS) Benchmark [5], which is available on the MUSCLE Benchmark site [1].

3 Structure of the Benchmark and Competition

This section presents the structure of the benchmark and the structure and organisation of the competition based on it.

3.1 Benchmark

The benchmark consists of 30 000 classified and validated test coins, corresponding to 60 000 images, as the front and back of each coin is imaged. These are divided into 692 coin classes with 2 270 different coin face classes. There are more coin face classes than coin classes because some coins changed their appearance over time, when new coin series were issued or when coins with a design marking a special occasion were minted. These changes could be a different image or different text printed on the coin. Small differences, such as minor changes in design, are encoded as coin face sub-classes in the data. Additional information, consisting of the thickness and diameter of each coin, is provided in a text file and also encoded into the first row of each image. These measurements were made by light sensors and are of limited accuracy.

3.2 Competition

The benchmark was used as training and test data and was available for download from six months before the program submission deadline. Executable programs were submitted and run on unseen competition data. The competition data consisted of 10 000 coins (20 000 images). 362 of the 692 coin types in the training data appeared in the competition set. For every known coin face, there are up to 30 example images in the training data. A coin face in the competition data is taken as known if it corresponds to one of the coin faces present in the benchmark set. 242 of the coins in the competition set are not in the benchmark set; these were to be classified as unknown. The thickness and diameter measurements were also available for the competition data. The coin face sub-classes were not to be taken into account for the competition.

The submitted coin classification programs were required to meet the following criteria:

- At least 70% of the coins in the competition dataset must be correctly classified. This requirement is not too difficult to meet: the coin which appears most often (10 Groschen) makes up 30% of the set, the 10 most frequently occurring coins make up more than 50% of the set, and the 40 most frequently occurring coins make up more than 70% of the set.
- A program must not take more than 8 hours to process a run of 5 000 coins.
The final score on the competition dataset was calculated as follows:

- 1 point for every correctly classified coin, cc (also for unknown coins which are classified as unknown);
- 0 points for every known coin which is not classified, cu (classified as unknown);
- −100 points for every wrongly classified coin, cw (a known coin which is wrongly classified, or an unknown coin which is classified into one of the known classes). This is due to the specification of the problem: wrongly classified, and hence wrongly sorted, coins result in an excessive manual workload to correct the mis-classifications. It is therefore recommended, in case of doubt, to classify a coin as unknown;
- 25 points for every coin type in the training set which is correctly classified at least once, co. This ensures that an algorithm that can only recognise a small number of coin types does not score well.

This gives the formula for the final score:

    S = cc · 1 + cu · 0 + co · 25 − cw · 100.    (1)
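The asymmetric penalty strongly rewards cautious classifiers. As a small illustration (our own sketch, not part of the competition specification), the per-coin expected score under Eq. 1 implies that committing to a class only pays off when the estimated probability of being correct exceeds 100/101, roughly 0.99:

```python
# Expected score per coin under the competition rules (Eq. 1):
# committing to a class yields +1 with probability p and -100 otherwise,
# while answering "unknown" yields 0 points for a known coin.
def expected_score_if_guessing(p: float) -> float:
    """Expected points for committing to a class with confidence p."""
    return p * 1 + (1 - p) * (-100)

def should_guess(p: float) -> bool:
    """Guess only when the expected score beats abstaining (0 points)."""
    return expected_score_if_guessing(p) > 0  # equivalent to p > 100/101

print(should_guess(0.95))   # False: 0.95*1 - 0.05*100 = -4.05
print(should_guess(0.995))  # True:  0.995*1 - 0.005*100 = 0.495
```

This back-of-the-envelope threshold ignores the 25-point per-class bonus, but it explains the recommendation above to classify doubtful coins as unknown.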
              cc      cu     co     cw     S (Eq. 1)
CIS-2006    10 000           362           19 050
Maastricht   6 731   3 048   278    221    -8 419
Freiburg     9 724     276   339      0    18 199

Table 1: Results of the MUSCLE CIS coin competition Benchmark 2006. cc: correctly classified coins; cu: known coins classified as unknown; co: number of coin classes classified at least once; cw: coins incorrectly classified. The top row gives the maximum number of points available in the competition.

4 Results

A summary of the competition entries, as well as how well they performed, is presented here. More details on the algorithms are available in the further papers in these proceedings.

4.1 Participants

Seven institutes registered during the initial registration period, which ran until two months before the program submission deadline. Three institutes submitted programs, of which two functioned correctly on the test data. These functioning programs were submitted by: Marco Reisert, Olaf Ronneberger and Hans Burkhardt of the Albert-Ludwigs University of Freiburg, Germany, and L. J. P. van der Maaten and P. J. Boon of Maastricht University, the Netherlands.

4.2 Evaluation

The CIS competition dataset 2006 contains 10 000 coins from 362 classes. 242 coins are unknown with respect to the training data set. Table 1 gives an overview of the results of the competition. Both programs met most of the criteria of the benchmark, i.e. a running program which performed the task within the time limit given. The Maastricht program failed by a small margin to classify at least 70% of the coins correctly. The recognition results from Freiburg are outstanding and the group clearly wins the competition. In particular, there are no wrong classifications, which is very important in real-life applications. The competition data 2006, as well as the ground truth and the detailed results, are published on the MUSCLE Benchmark site [1].
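The totals in Table 1 follow directly from Eq. 1. A minimal script reproducing them (counts taken from the table; note that the 100-point penalty for the 221 wrong classifications drives the Maastricht total below zero):

```python
# Recompute the final scores in Table 1 from the per-entry counts
# using Eq. 1: S = cc*1 + cu*0 + co*25 - cw*100.
def final_score(cc: int, cu: int, co: int, cw: int) -> int:
    return cc * 1 + cu * 0 + co * 25 - cw * 100

# Maximum achievable score: all 10 000 coins correct, all 362 classes hit.
print(final_score(10_000, 0, 362, 0))       # 19050
print(final_score(9_724, 276, 339, 0))      # 18199  (Freiburg)
print(final_score(6_731, 3_048, 278, 221))  # -8419  (Maastricht)
```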
5 Conclusion and future activities

Now that we have a set of features for which it is known that good classification is possible, we would like to open these to the machine learning community as a test bed on which to develop algorithms that can handle a large number of classes. At the moment the images are classified by coin type. It would also be possible, for example, to relabel the coin images based on the designs printed on the coins. This would create classes
of faces, crowns, crosses, etc. A second coin recognition competition is planned for 2007, which will concentrate on classification in the presence of occlusion. The call for participation, as well as the detailed specification of the MUSCLE CIS Coin Competition 2007, will be made available through the MUSCLE Benchmark site [1].

References

[1] MUSCLE benchmark site. http://muscle.prip.tuwien.ac.at/index.php.
[2] M. Everingham, A. Zisserman, C. Williams, L. Van Gool, M. Allan, C. Bishop, O. Chapelle, N. Dalal, T. Deselaers, G. Dorko, S. Duffner, J. Eichhorn, J. Farquhar, M. Fritz, C. Garcia, T. Griffiths, F. Jurie, D. Keysers, M. Koskela, J. Laaksonen, D. Larlus, B. Leibe, H. Meng, H. Ney, B. Schiele, C. Schmid, E. Seemann, J. Shawe-Taylor, A. Storkey, S. Szedmak, B. Triggs, I. Ulusoy, V. Viitaniemi, and J. Zhang. The 2005 PASCAL visual object classes challenge. In Selected Proceedings of the First PASCAL Challenges Workshop. Springer-Verlag, 2006 (in press).
[3] Christian Fluhr, Pierre-Alain Moëllic, and Patrick Hède. ImagEVAL: Usage-oriented multimedia information retrieval evaluation. In Proceedings of the Second MUSCLE/ImageCLEF Workshop on Image and Video Retrieval Evaluation, pages 3–8, Alicante, Spain, September 2006.
[4] Reinhold Huber, Herbert Ramoser, Konrad Mayer, Harald Penz, and Michael Rubik. Classification of coins using an eigenspace approach. Pattern Recognition Letters, 26(1):61–75, 2005.
[5] Michael Nölle and Allan Hanbury. MUSCLE Coin Images Seibersdorf (CIS) Benchmark Competition 2006. IAPR Newsletter, 28(2):18–19, April 2006.
[6] Michael Nölle, Harald Penz, Michael Rubik, Konrad Mayer, Igor Holländer, and Reinhard Granec. Dagobert: a new coin recognition and sorting system. In Proceedings of the 7th International Conference on Digital Image Computing: Techniques and Applications (DICTA '03).
[7] C. Peters, P. Clough, J. Gonzalo, G. J. F. Jones, M. Kluck, and B. Magnini, editors. Multilingual Information Access for Text, Speech and Images, volume 3491 of LNCS. Springer, 2004.
[8] Arnold W. M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, December 2000.