Prediction and Validation of Indexing Performance for Biometrics


Suresh Kumar R.*†, Bir Bhanu†, Subir Ghosh‡, Ninad Thakoor†
* Email: suresh.kumar@email.ucr.edu
† Center for Research in Intelligent Systems, UC Riverside
‡ Dept. of Statistics, University of California, Riverside

Abstract

The performance of a recognition system is usually determined experimentally. Therefore, one cannot predict the performance of a recognition system a priori for a new dataset. In this paper, a statistical model to predict the value of k in the rank-k identification rate for a given biometric system is presented. Thus, one needs to search only the topmost k match scores to locate the true match object. A geometric probability distribution is used to model the number of non-match scores present in the set of similarity scores. The model is tested in simulation and by using a public dataset. The model is also indirectly validated against previously published results. The actual results obtained using a publicly available database are very close to the predicted results, which validates the proposed model.

Keywords: Object identification, Performance prediction, Rank-k identification rate, geometric distribution model.

1. Introduction

Two important operational tasks of a biometric system are authentication and identification. Several indicators have been proposed for measuring the performance of these tasks, including False Accept Rate, False Reject Rate, Receiver Operating Characteristic, Rank-k identification/recognition rate ($R_k$) and Cumulative Match Characteristic (CMC) Curve. Of these, the last two measures, namely the rank-k identification rate and the CMC curve, have been proposed specifically for biometric identification systems. The rank-k identification rate gives the number of times the correct object is present among the top k most likely candidates. The CMC curve plots the rank-k identification rate against k [6]. These results are usually evaluated empirically for a given dataset. A fundamental problem with empirical evaluation is that the results for a new dataset cannot be predicted [4].

What is fundamentally missing here is a theoretical approach that would predict the identification performance for any biometric system. This paper focuses on developing such an approach for predicting the performance of a biometric system. The results of our approach are validated on a publicly available dataset.

Given a collection of objects (gallery [7]), this paper predicts the value of k for a specific $R_k$ so that the matching object can be found among the top k matches provided by the recognition system. It should be noted that k is an index position in the sorted list of similarity scores provided by the system. For a given gallery and the corresponding matching algorithm, this is achieved by collecting a set of similarity scores for various probe objects. The similarity scores are then processed and modeled using a geometric distribution. Once the parameter of the geometric distribution has been estimated, the bounds on the values of k are predicted. In addition, this paper examines the change in the value of k as the gallery size is increased. It has been found that the value of k, expressed as a fraction of the size of the gallery, remains more or less constant (with a slight decrease) when the gallery size is increased to a very large value. The important terms used in this paper are defined in Table 1.

2. Related Work and Contributions

2.1. Related Work

To the best of our knowledge, the problem of predicting bounds on the match score indices in a biometric identification system has not been addressed earlier and, therefore, a direct comparison with published research is not possible. The following paragraphs describe the related work on indexing as well as on prediction.
While comparing our work with existing indexing-related work, it should be noted that although the end results of an indexing algorithm and our work are the same, the approaches are quite different. Our work is independent of a specific biometric: it uses match scores produced by a matching algorithm on a given dataset and predicts the value of k for a given $R_k$, whereas a typical indexing algorithm is tailor-made for a specific biometric and does not predict the results on a different dataset. Bhanu et al. [1] describe indexing using minutiae triplets, with performance reported for the NIST-4 fingerprint database [12]. Cappelli et al. [3] describe a new hash-based indexing method to speed up fingerprint identification in large databases, with results reported on several databases including NIST-4. Daugman [5] presents binomial models that use only the non-match scores to estimate the probability that a false match never occurs [8]. Grother and Phillips [7] present the prediction of recognition performance of large biometric galleries using a binomial model under the assumption that the match score distribution and the non-match score distribution are independent. This approach has been followed by others [11] for recognition, with the additional assumption that the match and non-match score distributions remain the same when the gallery size is increased. In [2], Boshra et al. present a theory for predicting object recognition performance, which is verified on synthetic aperture radar data.

978-1-4577-1359-0/11/$26.00 ©2011 IEEE

Table 1: Definitions
Term | Definition
Gallery | A collection of objects.
Probe | A query object whose matching object is present among the gallery objects.
Similarity score | A real number representing the similarity between two objects. A similarity score may be either a match score or a non-match score. Depending on the way the similarity score is computed, the smaller the score the more similar the objects, or vice versa. In this paper a low value of score is preferred.
Match score | The similarity score obtained when two matching (similar) objects are compared.
Non-match score | The similarity score obtained when two dissimilar objects are compared.
Indexing | A method of re-organizing a set of data such that object retrieval from a gallery becomes easier.
Trial | Any experiment generating n similarity scores.
Index, Rank | In a sorted set of similarity scores, the position of the given similarity score.

2.2. Contributions

The contributions of this paper are as follows:
1. We developed a foundational approach for predicting identification performance, applicable to a variety of biometrics. The approach is rooted in statistical performance characterization. Such a theoretical approach does not exist in the biometrics or object recognition field in general.
2. Given the match scores of a biometric, using the geometric distribution, we can predict the indexing performance for a given rank-k identification rate.
3. The approach presented is independent of a specific biometric and can adapt to any matching algorithm.
4. We carry out both direct and indirect validation of our results on two publicly available databases.

While prediction using a known ("nice") distribution (the geometric distribution in this paper) is not new, this is the first paper to theoretically model the performance for rank-k identification, and it fills a void in the biometrics, computer vision and pattern recognition fields.

3. Technical Approach

An overview of the technical approach is shown in Figure 1. The input to our system is a collection of similarity scores produced by different probe objects, which are noisy versions of the objects present in the gallery. Here we assume a closed-set recognition system [7]. The similarity scores are processed by our system and a geometric distribution model is built. The parameter p of the geometric distribution model, when estimated from the given data, enables us to predict the value of k for any given recognition rate $R_k$. The details of the approach are given below.

[Figure 1: Overview of the technical approach. Blocks: similarity scores; geometric distribution parameter estimation; index bounds prediction.]

Let N be the size of the gallery. When a single probe is presented to the system, it is matched against all N objects of the gallery and, thus, N similarity scores are produced. This constitutes one trial. Let S represent the set of similarity scores. S contains one match score and N - 1 non-match scores. Let S be sorted in ascending order of the score values. Here we assume that the lower the similarity score, the better the match. In a different recognition system the opposite may be true, and our approach can then be applied by normalizing the similarity scores.

Let X be a random variable representing the count of the non-match scores present in S before the first match score. If the recognition system is ideal, the value of X would be zero. We want to model X in order to predict its value for a given $R_k$. This is the same as predicting the value of k for a specific $R_k$.
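The definition of X can be made concrete with a short sketch. The function name, the tie-handling rule and the example scores below are illustrative assumptions and are not taken from the paper; the sketch assumes, as the text does, that lower scores mean better matches and that the position of the true match within the trial is known (closed-set test).

```python
from typing import Sequence


def nonmatch_count_before_match(scores: Sequence[float], match_index: int) -> int:
    """Return X: how many non-match scores precede the match score in sorted order.

    scores      -- the N similarity scores of one trial (lower = more similar)
    match_index -- position in `scores` of the true match
    """
    match_score = scores[match_index]
    # Assumption (not from the paper): a non-match score that ties with the
    # match score is counted as ranked ahead of it, the pessimistic choice.
    return sum(
        1
        for i, s in enumerate(scores)
        if i != match_index and s <= match_score
    )


# Hypothetical trial with gallery size N = 5; the true match scored 0.5.
trial = [0.9, 0.2, 0.5, 0.1, 0.7]
x = nonmatch_count_before_match(trial, match_index=2)
print(x, x + 1)  # X and the rank k = X + 1 of the true match: prints "2 3"
```

For an ideal matcher the match score is the smallest in the trial, so the function returns 0 and the true match has rank 1.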
In order to build a statistical model, we examine the properties of X, which are:
1. X represents a count and hence $X \in \mathbb{Z}$, the set of integers.
2. $X \in [0, N-1]$.
3. Many a time X may have the value zero. This is because, often, for a well-designed recognition system, the correct match score is the first one in the set of sorted similarity scores.

Considering the above properties, one can see that X can be modeled using the geometric distribution. It models the number of trials needed before the first success in repeated Bernoulli trials [10]. In our case, each element in the set S can be assumed to come from a Bernoulli trial (a given similarity score may be a match score or a non-match score): if the score is a match score, we denote it as a success, and if the score is a non-match score, we denote it as a failure. Thus, the geometric distribution can model the number of non-match scores present before the first match score. Besides, our requirement of high probability for very small values of X is also satisfied by the geometric distribution. The random variable X is a waiting-time random variable (we wait until the first match score is obtained), and the geometric distribution is a natural candidate model for a waiting-time random variable. The suitability of the geometric distribution can also be verified visually by comparing the histograms of X obtained from experimental trials with the histogram of a geometrically distributed random variable, as shown in Figure 2. Thus one may note that the candidacy of the geometric model is neither based on prior knowledge nor empirically obtained from the data. It is a first-order formulation with promising results.

[Figure 2: Histograms of the number of non-match scores present before the match score (random variable X) obtained from (a) simulation with gallery size = 1, noise variance = 1, (b) NIST-4 fingerprint data [12], (c) NIST Biometric score set, right index finger [9], (d) geometric distribution with p = .5, N = 1. Note the similarity of the histograms of the data in (a), (b) and (c) to that of a geometric random variable in (d). These figures provide visual justification for the selection of the geometric distribution to model the random variable. The datasets in (a), (b), (c) and (d) are not related to each other.]
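Once X is treated as geometric, fitting and prediction reduce to a few lines. The sketch below is a minimal illustration, not the authors' implementation: it assumes the pmf $P(X = x) = p(1-p)^x$ for $x = 0, 1, \ldots$, uses the standard maximum-likelihood estimate $\hat{p} = 1/(1 + \bar{x})$ for that pmf, and takes k as the smallest integer with $1 - (1-p)^k \ge R_k$, that is $k = \lceil \ln(1 - R_k) / \ln(1 - p) \rceil$. The function names and the sample counts are hypothetical.

```python
import math
from typing import Sequence


def estimate_p(xs: Sequence[int]) -> float:
    """Maximum-likelihood estimate of p for P(X = x) = p (1 - p)^x, x = 0, 1, ..."""
    if not xs:
        raise ValueError("need at least one trial")
    return 1.0 / (1.0 + sum(xs) / len(xs))


def predict_k(p: float, rate: float = 0.95) -> int:
    """Smallest k such that P(X <= k - 1) = 1 - (1 - p)^k is at least `rate`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie in (0, 1)")
    if p == 1.0:
        return 1  # ideal matcher: the first score is always the match
    return max(1, math.ceil(math.log(1.0 - rate) / math.log(1.0 - p)))


# Hypothetical counts of non-match scores ahead of the match in ten trials.
xs = [0, 1, 0, 3, 0, 2, 0, 0, 1, 3]
p_hat = estimate_p(xs)   # mean is 1.0, so p_hat = 0.5
k = predict_k(p_hat)     # 1 - 0.5**5 = 0.96875 >= 0.95, while 1 - 0.5**4 < 0.95
print(p_hat, k)          # prints "0.5 5"
```

The closed form avoids scanning the cumulative distribution, which matters little here but keeps the prediction cheap when it is repeated for many gallery sizes or folds.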
The reason for modeling the values of X as coming from a single geometric distribution, even though the objects that produce the similarity scores are instances of different gallery objects, is that the similarity score values are independent of the gallery/probe object names, and a score never bears any information about the objects from which it is calculated. It only represents the closeness of the objects. For example, with an ideal matching algorithm, the 'distance' between an object and its noisy version would be zero, irrespective of the object.

3.1. Estimation of geometric distribution parameter

Let X be a geometrically distributed random variable. Then

$$P(X = x) = p(1-p)^{x} \tag{1}$$

where $x = 0, 1, \ldots$ Let $x_1, x_2, \ldots, x_T$ be the realizations of T independent and identically distributed geometric trials described by the model in eq. (1). The value of p can be estimated using the maximum likelihood approach, and it can be shown that the estimate is unbiased.

3.1.1 Estimating the value of k for $R_k$ = 95%

The CDF of the geometric distribution is given by

$$F(x) = 1 - (1-p)^{x+1} \tag{2}$$

We need to find the range of values of X that occur in 95% of the experimental trials. As X is modeled as a geometric random variable, the lower bound of x is zero. The upper bound is given by the equation

$$F(x) = 1 - (1-p)^{x+1} = 0.95 \tag{3}$$

If there are x non-match scores before the first match score, then

$$k = x + 1 \tag{4}$$

with k as defined in $R_k$. From eqs. (3) and (4), the predicted bounds for $R_k$ = 95% can be obtained.

4. Experimental Results

The following sections describe the experimental results obtained from simulation and by using the NIST-4 [12] fingerprint dataset. The unknown parameter p in eq. (1) is estimated using half the available data. The model is tested using the remaining data.

4.1. Simulation

4.1.1 Technical details of simulation

The simulation was carried out on galleries with a variable number of objects, from 100 to 50,000. Each object was represented by 1 point features.
The feature points were obtained from a uniformly distributed random variable in the interval [0, 1]. The probe objects were obtained by shifting the feature point locations of the gallery objects by adding Gaussian noise of zero mean and different variances. This helps to evaluate the performance at different noise levels. The idea is not to identify all the noise levels that could be present in a biometric system, because some of them are known and others are unknown, but to represent them in terms of Gaussian noise variance. The similarity score between two objects was obtained by finding the sum of the Euclidean distances between the closest feature points of the two objects.

4.1.2 Results on a gallery of size 5000

Our aim is to predict the upper bound of k (the lower bound of k is always one) for $R_k$ = 95%, so that by searching the top k matches we can find the actual match object in 95% of the cases. 5000 probe objects were generated by adding Gaussian noise N(0, 1) to each of the 5000 gallery objects. Out of these, 2500 probe objects were used to build the model. The model predicted the value of k to be 201 (approximately 4% of the gallery size). The prediction was validated by testing with the remaining 2500 probe objects. It was found that the top 201 matches contained the actual match object in 92% of the cases. This result is shown in Figure 3. It should be noted that the probe objects used for training and testing the model were derived from different gallery objects, which gives more credibility to the results.

[Figure 3: Simulation results on a gallery of size 5000. Panel title: Predicted Bounds and Actual Ranks of the Match Score (Simulation), noise variance = 1, confidence = 95, $R_k$ = 92.8%; horizontal axis: experiment number. The model predicted that the top 201 match scores would contain the exact match 95% of the time. The actual success was 92%.]

[Figure 4: Variance plot of the predicted value of k expressed as a fraction of the gallery size. Panel title: Predicted value of k as % of gallery size. The variance changes from $10^{-35}$ to $10^{-6}$. The gallery sizes are 100, 500, 1000, 2000, 5000, 10,000 and 50,000. The k value was predicted for $R_k$ = 95%. The validation of the prediction is shown in Fig. 5.]

4.1.3 Ten-fold validation of results

To understand how stable the above results are, we performed a 10-fold cross-validation on the same gallery of size 5000, each time trying to predict k for $R_k$ = 95%.
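A cross-validation of this kind can be sketched as follows. How the folds were formed is not stated in the paper, so the sketch assumes the conventional scheme (fit on nine folds, measure the achieved rate on the held-out fold); the function names, the synthetic data and the value p = 0.02 are hypothetical and serve only to make the example runnable.

```python
import math
import random
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def predict_k(xs: Sequence[int], rate: float = 0.95) -> int:
    """Fit p by maximum likelihood on the counts xs and return the predicted k."""
    p = 1.0 / (1.0 + mean(xs))            # MLE for P(X = x) = p (1 - p)^x
    if p >= 1.0:                          # every training count was zero
        return 1
    return max(1, math.ceil(math.log(1.0 - rate) / math.log(1.0 - p)))


def achieved_rate(xs: Sequence[int], k: int) -> float:
    """Fraction of trials whose true match is within the top k (X <= k - 1)."""
    return sum(1 for x in xs if x < k) / len(xs)


def cross_validate(
    xs: Sequence[int], folds: int = 10, rate: float = 0.95, seed: int = 0
) -> Tuple[List[int], List[float]]:
    """Return the predicted k and the achieved rank-k rate for every fold."""
    order = list(xs)
    random.Random(seed).shuffle(order)
    ks, rates = [], []
    for f in range(folds):
        test = order[f::folds]                                   # held-out fold
        train = [x for i, x in enumerate(order) if i % folds != f]
        k = predict_k(train, rate)
        ks.append(k)
        rates.append(achieved_rate(test, k))
    return ks, rates


def sample_geometric(p: float, size: int, seed: int = 1) -> List[int]:
    """Draw synthetic counts on {0, 1, ...} by inverse-transform sampling."""
    rng = random.Random(seed)
    log_q = math.log(1.0 - p)
    return [int(math.log(1.0 - rng.random()) / log_q) for _ in range(size)]


counts = sample_geometric(p=0.02, size=5000)   # stand-in for measured X values
ks, rates = cross_validate(counts)
print("mean k:", mean(ks), "variance of k:", pvariance(ks))
print("mean achieved rate:", mean(rates), "variance:", pvariance(rates))
```

On data that really are geometric the achieved rate sits close to the target; on measured scores any gap between the two reflects how far X departs from the geometric model.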
The mean and variance of the predicted value of k for $R_k$ = 95% and of the actual value of $R_k$ were calculated. The mean value of k was found to be 194 with almost zero variance. The average success of finding the exact matching object on searching the top 194 objects was 91.11% with a variance of 0.58. The low variance of the value of k and of the accuracy of prediction shows the stability of our prediction model.

4.1.4 Results on different sizes of gallery

In order to understand how our prediction model performs for different gallery sizes, the procedures described in Sections 4.1.2 and 4.1.3 were repeated for gallery sizes 100, 500, 1000, 2000, 10,000 and 50,000. The same number of trials was used for building the model and for validating it. The numbers of trials used for training and testing were 50, 250, 500, 1000, 2500, 5000 and 25,000 respectively (the fifth value corresponds to the gallery of size 5000 from Section 4.1.2). The mean value of k predicted for each of the gallery sizes is shown in Figure 4. The vertical bar at each data point represents the variance obtained from the 10-fold cross-validation, which varied from $10^{-35}$ to $10^{-6}$. It can be observed from the figure that we need to search approximately the top 4% of the similarity scores to find the match, irrespective of the gallery size. Note the very low variance of the predicted k values. The measured value of $R_k$, with the variance obtained in the 10-fold cross-validation experiments, is shown in Figure 5.

4.2. Effect of noise on the prediction

In order to study the effect of noise on our prediction model, we carried out our prediction on a gallery of size 5000. We chose the noise variance to be 1, 3, 5, 7, and 1 and predicted the value of k for $R_k$ = 95%. For each noise variance value, a 10-fold cross-validation was also carried out. The variation of the predicted value of k with noise is shown in Figure 6. Even with a large value of noise, the variance of the predicted value of k is extremely small, and therefore the vertical bars representing the variance of the k values appear as points.
As expected, the mean value of k increases as noise increases. For low noise variance, k = 1, implying that the first score itself is the match score. For these values of k, 100% success is obtained. Thus, for low noise variance, our prediction over-achieves the results. The actual value of $R_k$ obtained under different noise variances is shown in Figure 7. Note the increasing bar size as the noise variance is increased. Even with a noise variance of 1, the achieved value of $R_k$ is around 91.5%, while the prediction was made for $R_k$ = 95%. Thus we see that the performance of our model does not deteriorate significantly in the presence of high noise.

[Figure 5: Validation of the predicted bounds for $R_k$ = 95%. $R_k$ values obtained for gallery sizes 100, 500, 1000, 2000, 5000, 10,000 and 50,000 in simulation. The variance values are .7, .1, ., .86, .58, .2, .5 respectively. These results were obtained with a 10-fold validation. By performing a larger-fold validation in relation to the gallery size, a smoother variance plot can be obtained. The predicted bounds are shown in Fig. 4.]

[Figure 6: Effect of noise on the predicted value of k, for $R_k$ = 95%. Panel title: Variation of k with noise; horizontal axis: noise variance. The variance of k for different noise values in a 10-fold experiment was so small that the vertical bars appear like points.]

[Figure 7: Effect of noise on the achieved value of $R_k$. Panel title: Variation of R with noise; horizontal axis: noise variance. k was predicted for $R_k$ = 95%. The variance of k was in the range 0 to $7 \times 10^{-5}$.]

4.3. Results on NIST-4 Fingerprint Dataset

The NIST-4 fingerprint database [12] consists of 2000 pairs of fingerprints. Using a fingerprint matching algorithm [1], 4 million similarity scores were generated, resulting in 2000 sets of trial data (one probe object produces one set of trial data containing 2000 similarity scores). This dataset was randomly divided into two equal halves; one half was used for building the model and the other half for testing it. Our system predicted that we need to examine the top 215 match scores to get the true match object 95% of the time.
We found that we actually get 93.8% success by examining the top 215 scores. By conducting a 10-fold cross-validation experiment, we found that our model predicted the mean value of k to be 194 (9.7% of the gallery) with almost zero variance for $R_k$ = 95%. By searching the top 194 match scores, the actual match object was found in 92.44% of the trials, with variance 0.16.

In order to evaluate the performance of our prediction model on different gallery sizes of real data, we built smaller galleries of size 100, 200, 500 and 1000 from the original NIST-4 gallery set of 2000, in each case carrying out 1 trials. A 10-fold cross-validation experiment was done for each of these sizes. The predicted k value for $R_k$ = 95%, with its variance, is shown in Figure 8. The $R_k$ values achieved for the various gallery sizes are shown in Fig. 9.

[Figure 8: Variance plots of the predicted values of k for NIST-4 fingerprint data, for $R_k$ = 95%. Panel title: Predicted value of k as % of gallery size. The variance was in the range $2.5 \times 10^{-5}$ to $10^{-4}$. Gallery sizes 100, 200, 500, 1000, 2000. The validation of the predicted bounds is shown in Fig. 9.]

[Figure 9: Validation of the predicted bounds for gallery sizes 100, 200, 500, 1000, 2000 on NIST-4 fingerprint data. Panel title: Validation of the predicted bounds. The bounds were predicted for $R_k$ = 95%. The predicted bounds are shown in Fig. 8.]

4.4. Indirect validation of our model

As mentioned earlier in Section 2.1, Bhanu et al. [1] and Cappelli et al. [3] have reported results on indexing performance on the NIST-4 dataset. Bhanu et al. report 83% success on examining the top 10% of the gallery, while Cappelli et al. report 96% success. The success achieved using our prediction model is 92.44%. While Cappelli et al. report a higher success, they have also indicated that a 5-pixel border was removed from the images before extracting the minutiae. It is also not clear whether the entire dataset was used for their experiment. We have reported results on the entire database without any preprocessing, and our results are in the same ballpark. Besides, note that ours is a generic approach which is independent of a specific biometric and requires only the similarity scores produced by any matching algorithm, whereas the algorithms used by others are specifically meant for fingerprint indexing. Thus the results reported by Bhanu et al. [1] and Cappelli et al. [3] provide an indirect validation of our model.

5. Conclusions

We developed a model for predicting the value of k for a given rank-k identification rate. Our method is independent of the specific biometric used and can adapt to any matching algorithm. It requires only the match scores produced by different probe objects. The model was tested using simulation, and we found that even when the noise is increased, the model performs gracefully. The model was also tested on a public dataset of size 2000 available from NIST. We predicted the value of k for a 95% rank-k identification rate. It was found experimentally that the predicted k value provided a success rate of approximately 95%. The slight difference between the prediction and the actual performance is due to the deviation of the data from the pure geometric distribution, which is a first-order model. By varying the gallery size from a small value to a very large value, we found that the fraction of the gallery that should be examined when a new probe object is presented remains constant or decreases slightly. The value of k decreasing as the gallery size is increased is a very good indication. It implies that the fraction of the gallery that needs to be examined to locate a matching object would not increase proportionately if the gallery size is increased.
However, in simulation we found that the k values remain more or less constant. The performance of our model was also validated indirectly with the results reported in the literature. Applicable to any biometric and adaptable to any matching algorithm, our method provides a fundamental approach for predicting indexing performance.

6. Acknowledgements

This work was supported by the NSF grant 91527. The contents and information do not reflect positions or policies of the US government.

References

[1] B. Bhanu and X. Tan. Fingerprint indexing based on novel features of minutiae triplets. IEEE Trans. Pattern Anal. Mach. Intell., 25:616-622, 2003.
[2] M. Boshra and B. Bhanu. Predicting performance of object recognition. IEEE Trans. Pattern Anal. Mach. Intell., 22:956-969, 2000.
[3] R. Cappelli, M. Ferrara, and D. Maltoni. Fingerprint indexing based on minutia cylinder-code. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):1051-1057, May 2011.
[4] G. Chollet, B. Dorizzi, and D. Petrovska-Delacretaz. Guide to Biometric Reference Systems and Performance Evaluation, chapter Introduction-About the Need of an Evaluation Framework in Biometrics. Springer Verlag, Feb. 2009.
[5] J. Daugman. The importance of being random: statistical principles of iris recognition. Pattern Recognition, 36(2):279-291, 2003.
[6] D. Gorodnichy. Evolution and evaluation of biometric systems. In Symposium on Computational Intelligence for Security and Defense Applications, pages 1-8, July 2009.
[7] P. Grother and P. J. Phillips. Models of large population recognition performance. Computer Vision and Pattern Recognition, Computer Society Conference on, 2:68-75, 2004.
[8] A. Y. Johnson, J. Sun, and A. F. Bobick. Using similarity scores from a small gallery to estimate recognition performance for larger galleries. International Workshop on Analysis and Modeling of Faces and Gestures, 0:100, 2003.
[9] NIST. NIST biometric score set. http://www.itl.nist.gov/iad/894.03/biometricscores/.
[10] A. Papoulis and S. U. Pillai. Probability, Random Variables and Stochastic Processes. McGraw Hill, 2002.
[11] R. Wang and B. Bhanu. Predicting fingerprint biometrics performance from a small gallery. Pattern Recogn. Lett., 28:40-48, Jan. 2007.
[12] C. Watson and C. Wilson. NIST special database 4: fingerprint database. National Institute of Standards and Technology, Mar. 1992.