PLUG-IN BANDWIDTH SELECTOR FOR THE KERNEL RELATIVE DENSITY ESTIMATOR




ELISA MARÍA MOLANES-LÓPEZ AND RICARDO CAO

Departamento de Matemáticas, Facultade de Informática, Universidade da Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain

Abstract

This paper is focused on two kernel relative density estimators in a two-sample problem. An asymptotic expression for the mean integrated squared error of these estimators is found and, based on it, two solve-the-equation plug-in bandwidth selectors are proposed. In order to examine their practical performance, a simulation study and a practical application to a medical dataset are carried out.

Key words and phrases: kernel-type estimates, smoothing parameter, solve-the-equation rules, survival analysis, two-sample problem.

1 Introduction

The study of differences among groups or changes over time is a goal in fields such as medical research and social science research. The traditional method for this purpose is the usual parametric location and scale analysis. However, this is a very restrictive tool, since much of the information available in the data is inaccessible to it. To make better use of this information it is convenient to focus on distributional analysis, i.e., on the general two-sample problem of comparing the cumulative distribution functions (cdf), $F_0$ and $F_1$, of two random variables, $X_0$ and $X_1$. Useful tools for this purpose are the relative distribution function, $R(t)$, and the relative density function, $r(t)$, of $X_1$ with respect to (wrt) $X_0$:

$$R(t) = P(F_0(X_1) \le t) = F_1(F_0^{-1}(t)), \quad 0 < t < 1,$$

where $F_0^{-1}(t) = \inf\{x : F_0(x) \ge t\}$ denotes the quantile function of $F_0$, and

$$r(t) = R'(t) = \frac{f_1(F_0^{-1}(t))}{f_0(F_0^{-1}(t))}, \quad 0 < t < 1,$$

where $f_0$ and $f_1$ are the densities pertaining to $F_0$ and $F_1$, respectively. These two curves, as well as estimators for them, have been studied by Gastwirth (1968), Ćwik and Mielniczuk (1993), Hsieh (1995), Hsieh and Turnbull (1996), Cao et al. (2000, 2001) and Handcock and Janssen (2002).

These functions, $R$ and $r$, are closely related to other statistical methods. The ROC curve, used in the evaluation of the performance of medical tests for separating two groups, is related to $R$ through the relationship $ROC(t) = 1 - R(1 - t)$ (see, for instance, Holmgren (1996) and Li et al. (1996) for details), and the density ratio $f_1(x)/f_0(x)$, $x \in \mathbb{R}$, used by Silverman (1978), is linked to $r$ through $f_1(x)/f_0(x) = r(F_0(x))$.

Throughout the paper, we will focus on two kernel-type estimators of $r$, similar to the one already proposed by Ćwik and Mielniczuk (1993). In the following section we give some notation and obtain an asymptotic representation for the MISE of the relative density estimators. This differs from Ćwik and Mielniczuk (1993), where an asymptotic expression for the MISE of only the dominant part of the estimator was found. Section 3 is concerned with automatic global bandwidth selection. Two solve-the-equation plug-in bandwidth selectors based on the ideas by Sheather and Jones (1991) are proposed. A simulation study is shown in Section 4, where the performance of the data-driven selectors proposed in this paper is compared with the selector proposed in Ćwik and Mielniczuk (1993). A medical application is presented in Section 5. Finally, the proofs of the results presented in Sections 2 and 3 are included in Section 6.

2 Kernel relative density estimators

Consider the two-sample problem with completely observed data:

$$\{X_{01}, \ldots, X_{0n}\}, \quad \{X_{11}, \ldots, X_{1m}\},$$

where the $X_{0i}$'s are independent and identically distributed as $X_0$, and the $X_{1j}$'s are independent and identically distributed as $X_1$. These two sequences are independent of each other. Throughout this paper all the asymptotic results are obtained as both sample sizes $m$ and $n$ tend to infinity in such a way that, for some constant $0 < \lambda < \infty$, $\lim_{m \to \infty} m/n = \lambda$.

We assume the following conditions on the underlying distributions, the kernels $K$ and $M$ and the bandwidths $h$ and $h_0$ to be used in the estimators (see (1), (2) and (3) below):

(F1) $F_0$ and $F_1$ have continuous density functions, $f_0$ and $f_1$, respectively.

(F2) $f_0$ is a three times differentiable density function with $f_0^{(3)}$ bounded.

(R1) $r$ is a differentiable density with compact support contained in $[0, 1]$, with $r'$ bounded.

(K1) $K$ is a symmetric four times differentiable density function with compact support $[-1, 1]$ and $K^{(4)}$ bounded.

(K2) $M$ is a symmetric density and a continuous function except at a finite set of points.

(B1) $h \to 0$ and $m h^3 \to \infty$.

(B2) $h_0 \to 0$ and $n h_0^4 \to 0$.

Since $\int \frac{1}{h} K\left(\frac{t - z}{h}\right) dR(z)$ is close to $r(t)$ and, for smooth distributions, it is satisfied that

$$\int \frac{1}{h} K\left(\frac{t - z}{h}\right) dR(z) = \int \frac{1}{h} K\left(\frac{t - F_0(z)}{h}\right) dF_1(z),$$

a natural way to define a kernel-type estimator of $r(t)$ is to replace the unknown functions $F_0$ and $F_1$ by some appropriate estimators. We consider two proposals:

$$\hat r_h(t) = \int K_h\left(t - F_{0n}(z)\right) dF_{1m}(z) = \frac{1}{m} \sum_{j=1}^{m} K_h\left(t - F_{0n}(X_{1j})\right) \tag{1}$$

and

$$\hat r_{h,h_0}(t) = \int K_h\left(t - \tilde F_{0n}(z)\right) dF_{1m}(z) = \frac{1}{m} \sum_{j=1}^{m} K_h\left(t - \tilde F_{0n}(X_{1j})\right), \tag{2}$$

where $K_h(t) = h^{-1} K(t/h)$, $K$ is a kernel function, $h$ is the bandwidth used to estimate $r$, $F_{0n}$ and $F_{1m}$ are the empirical distribution functions based on the $X_{0i}$'s and the $X_{1j}$'s, respectively, and $\tilde F_{0n}$ is a kernel-type estimate of $F_0$ given by

$$\tilde F_{0n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{M}\left(\frac{x - X_{0i}}{h_0}\right), \tag{3}$$

where $\mathbb{M}$ denotes the cdf of the kernel $M$ and $h_0$ is the bandwidth used to estimate $F_0$.
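The two estimators in (1) and (2) differ only in which estimate of $F_0$ is used to build the relative data. A minimal sketch in plain Python follows. It assumes the Epanechnikov kernel for $K$ and the uniform density on $[-1, 1]$ for $M$; all function names are illustrative, and no boundary correction is applied.

```python
import bisect


def epanechnikov(u):
    """Epanechnikov kernel K, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def uniform_cdf(u):
    """Cdf of the uniform kernel M on [-1, 1]."""
    return min(1.0, max(0.0, 0.5 * (u + 1.0)))


def empirical_cdf(sample0):
    """Empirical cdf F_{0n}: proportion of sample values <= x."""
    ordered = sorted(sample0)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def smoothed_cdf(sample0, h0):
    """Kernel-type cdf estimate of equation (3) with bandwidth h0."""
    n = len(sample0)
    return lambda x: sum(uniform_cdf((x - xi) / h0) for xi in sample0) / n


def relative_density(t, sample1, cdf0, h):
    """Equations (1)/(2): average of K_h(t - cdf0(X_1j)) over the second sample."""
    m = len(sample1)
    return sum(epanechnikov((t - cdf0(x)) / h) for x in sample1) / (m * h)


if __name__ == "__main__":
    x0 = [1.0, 2.0, 3.0, 4.0]
    x1 = [2.5]
    print(relative_density(0.5, x1, empirical_cdf(x0), 0.25))      # estimator (1)
    print(relative_density(0.5, x1, smoothed_cdf(x0, 1.0), 0.25))  # estimator (2)
```

Passing `empirical_cdf(x0)` gives estimator (1), while passing `smoothed_cdf(x0, h0)` gives estimator (2). In this toy data set both cdf estimates map 2.5 to 0.5, so both calls return $K(0)/h = 3.0$.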

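Using a Taylor expansion, $\hat r_h(t)$ can be written as follows:

$$\hat r_h(t) = \int K_h\left(t - F_0(z)\right) dF_{1m}(z) + \int K_h^{(1)}\left(t - F_0(z)\right) \left(F_0(z) - F_{0n}(z)\right) dF_{1m}(z) + \int \left(F_0(z) - F_{0n}(z)\right)^2 \int_0^1 (1 - s) K_h^{(2)}\left(t - F_0(z) - s\left(F_{0n}(z) - F_0(z)\right)\right) ds \, dF_{1m}(z),$$

where $K_h^{(i)}$ denotes the $i$-th derivative of $K_h$. Let us define $\tilde U_n = F_{0n} \circ F_0^{-1}$ and $\tilde R_m = F_{1m} \circ F_0^{-1}$. Then, $\hat r_h(t)$ can be rewritten in a useful way for the study of its mean integrated squared error (MISE):

$$\hat r_h(t) = \tilde r_h(t) + A_1 + A_2 + B, \tag{4}$$

where

$$\tilde r_h(t) = \int K_h\left(t - F_0(z)\right) dF_{1m}(z) = \frac{1}{m} \sum_{j=1}^{m} K_h\left(t - F_0(X_{1j})\right),$$

$$A_1 = \int \left(v - \tilde U_n(v)\right) K_h^{(1)}(t - v) \, d\left(\tilde R_m - R\right)(v),$$

$$A_2 = \frac{1}{n} \sum_{i=1}^{n} \int \left(F_0(w) - 1_{\{X_{0i} \le w\}}\right) K_h^{(1)}\left(t - F_0(w)\right) dF_1(w),$$

$$B = \int \left(F_0(z) - F_{0n}(z)\right)^2 \int_0^1 (1 - s) K_h^{(2)}\left(t - F_0(z) - s\left(F_{0n}(z) - F_0(z)\right)\right) ds \, dF_{1m}(z).$$

Proceeding in a similar way, we can rewrite $\hat r_{h,h_0}$ as follows:

$$\hat r_{h,h_0}(t) = \tilde r_h(t) + A_1 + A_2 + \hat A + \hat B, \tag{5}$$

where

$$\hat A = \int \left(F_{0n}(w) - \tilde F_{0n}(w)\right) K_h^{(1)}\left(t - F_0(w)\right) dF_{1m}(w),$$

$$\hat B = \int \left(F_0(z) - \tilde F_{0n}(z)\right)^2 \int_0^1 (1 - s) K_h^{(2)}\left(t - F_0(z) - s\left(\tilde F_{0n}(z) - F_0(z)\right)\right) ds \, dF_{1m}(z).$$

Our main result is an asymptotic representation for the MISE of $\hat r_h(t)$ and $\hat r_{h,h_0}(t)$. To simplify the exposition of the results, from here on we will denote $C(g) = \int g^2(x) \, dx$ for any square integrable function $g$.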

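Theorem 1 (AMISE). Assume conditions (F1), (R1), (K1) and (B1). Then

$$MISE(\hat r_h) = AMISE(h) + o\left(\frac{1}{mh} + h^4\right) + o\left(\frac{1}{nh}\right),$$

with

$$AMISE(h) = \frac{1}{mh} C(K) + \frac{1}{4} h^4 d_K^2 C(r^{(2)}) + \frac{1}{nh} C(r) C(K),$$

where $d_K = \int x^2 K(x) \, dx$. If conditions (F2), (K2) and (B2) are assumed as well, then the same result holds for $MISE(\hat r_{h,h_0})$.

Remark 1. From Theorem 1 it follows that the optimal bandwidth, minimizing the asymptotic mean integrated squared error of any of the estimators considered for $r$, is given by

$$h_{AMISE} = \left(\frac{C(K)\left(\lambda C(r) + 1\right)}{d_K^2 C(r^{(2)}) \, m}\right)^{1/5}. \tag{6}$$

Remark 2. Note that the $AMISE(h)$ derived from Theorem 1 does not depend on the bandwidth $h_0$. A higher-order analysis would be needed to address simultaneously the bandwidth selection problem for $h$ and $h_0$.

3 Bandwidth selectors

3.1 Estimation of density functionals

It is very simple to show that, under sufficiently smooth conditions on $r$ ($r \in C^{(2l)}(\mathbb{R})$), the functionals

$$C(r^{(l)}) = \int \left(r^{(l)}(x)\right)^2 dx \tag{7}$$

appearing in (6) are related to other general functionals of $r$, denoted by $\Psi_{2l}(r)$:

$$C(r^{(l)}) = (-1)^l \int r^{(2l)}(x) r(x) \, dx = (-1)^l \Psi_{2l}(r), \tag{8}$$

where

$$\Psi_l(r) = \int r^{(l)}(x) r(x) \, dx = E\left[r^{(l)}\left(F_0(X_1)\right)\right].$$

The equation above suggests a natural kernel-type estimator for $\Psi_l(r)$ as follows:

$$\hat\Psi_l(g) = \frac{1}{m} \sum_{j=1}^{m} \left[\frac{1}{m} \sum_{k=1}^{m} L_g^{(l)}\left(F_{0n}(X_{1j}) - F_{0n}(X_{1k})\right)\right], \tag{9}$$

where $L$ is a kernel function, $L_g^{(l)}(x) = g^{-(l+1)} L^{(l)}(x/g)$, and $g$ is a smoothing parameter called the pilot bandwidth. As in the previous section, this is not the only possibility and we could consider another estimator of $\Psi_l(r)$,

$$\tilde\Psi_l(g) = \frac{1}{m} \sum_{j=1}^{m} \left[\frac{1}{m} \sum_{k=1}^{m} L_g^{(l)}\left(\tilde F_{0n}(X_{1j}) - \tilde F_{0n}(X_{1k})\right)\right], \tag{10}$$

where $F_{0n}$ in (9) is replaced by $\tilde F_{0n}$. Since the difference between both estimators decreases as $h_0$ tends to zero, the same theoretical results are expected for both estimators. Therefore, we will only show theoretical results for $\hat\Psi_l(g)$.

We will obtain the asymptotic mean squared error of $\hat\Psi_l(g)$ under the following assumptions:

(R3) The relative density $r \in C^{(l+6)}(\mathbb{R})$.

(K3) The kernel $L$ is a symmetric kernel of order 2, $L \in C^{(l+7)}(\mathbb{R})$, and satisfies $(-1)^{l/2+2} L^{(l)}(0) \, d_L > 0$ and $L^{(l)}(1) = L^{(l+1)}(1) = 0$, with $d_L = \int x^2 L(x) \, dx$.

(B3) $g = g_m$ is a positive-valued sequence of bandwidths satisfying $\lim_{m \to \infty} g = 0$ and $\lim_{m \to \infty} m g^{\max\{\alpha, \beta\}} = \infty$, where $\alpha = \frac{2}{5}(l + 7)$ and $\beta = \frac{1}{2}(l + 1) + 2$.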
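To make the plug-in idea concrete, the following sketch evaluates the double sum in (9) on a vector of relative data and feeds the resulting functionals into the AMISE-optimal bandwidth (6), using $C(r) = \Psi_0$ and $C(r^{(2)}) = \Psi_4$ from (8). It is an illustration under stated assumptions rather than the authors' implementation: a standard Gaussian kernel is assumed for $L$ purely for convenience (it does not have the compact support required by (K3)), the default constants correspond to the Epanechnikov kernel ($C(K) = 3/5$, $d_K = 1/5$), and the function names are hypothetical.

```python
import math

# Polynomial factors p_l with phi^{(l)}(x) = p_l(x) * phi(x) for even l,
# where phi is the standard Gaussian density.
_GAUSS_POLY = {
    0: lambda x: 1.0,
    2: lambda x: x * x - 1.0,
    4: lambda x: x ** 4 - 6.0 * x * x + 3.0,
    6: lambda x: x ** 6 - 15.0 * x ** 4 + 45.0 * x * x - 15.0,
}


def gauss_kernel_deriv(x, l):
    """l-th derivative (l in {0, 2, 4, 6}) of the standard Gaussian density."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return _GAUSS_POLY[l](x) * phi


def psi_hat(rel, l, g):
    """Equation (9): double sum of L_g^{(l)} over all pairs, diagonal included.

    `rel` holds the relative data, e.g. F_{0n}(X_{1j}) for j = 1..m.
    """
    m = len(rel)
    total = sum(gauss_kernel_deriv((a - b) / g, l) for a in rel for b in rel)
    return total / (m * m * g ** (l + 1))


def h_amise(psi0, psi4, m, lam, c_k=0.6, d_k=0.2):
    """Equation (6) with C(r) = Psi_0 and C(r'') = Psi_4."""
    return (c_k * (lam * psi0 + 1.0) / (d_k ** 2 * psi4 * m)) ** 0.2
```

The double sum costs $O(m^2)$ kernel evaluations, which is acceptable for a sketch but is one reason binned approximations are attractive in practice.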

Condition (R3) implies a smooth behaviour of $r$ at the boundary of its support, contained in $[0, 1]$. If this smoothness fails, the quantity $C(r^{(l)})$ could still be estimated through its definition, using a kernel estimate of $r^{(l)}$ (see Hall and Marron (1987) for the one-sample problem setting). Condition (K3) can only hold for even $l$. Observe that in condition (B3), for even $l$, $\max\{\alpha, \beta\} = \alpha$ for $l = 0, 2$ and $\max\{\alpha, \beta\} = \beta$ for $l = 4, 6, \ldots$

Theorem 2. Assume conditions (F1), (R3), (K3) and (B3). Then it follows that

$$MSE\left(\hat\Psi_l(g)\right) = \left[\frac{1}{m g^{l+1}} L^{(l)}(0) \left(1 + \lambda \Psi_0\right) + \frac{1}{2} d_L \Psi_{l+2} g^2 + O(g^4) + o\left(\frac{1}{n g^{l+1}}\right)\right]^2 + \frac{2}{m^2 g^{2l+1}} \Psi_0 C(L^{(l)}) + o\left(\frac{1}{m^2 g^{2l+1}}\right) + O\left(\frac{1}{n}\right). \tag{11}$$

Remark 3. The first term in the right-hand side of (11) corresponds to the squared bias term of the MSE. Note that, using (K3) and (8), the main bias term can be made to vanish by choosing $g$ as

$$g_l = \left(\frac{2 L^{(l)}(0) \left(\lambda \Psi_0 + 1\right)}{-d_L \Psi_{l+2} \, m}\right)^{\frac{1}{l+3}} = \left(\frac{2 L^{(l)}(0) \, d_K^2 \Psi_4}{-d_L \Psi_{l+2} \, C(K)}\right)^{\frac{1}{l+3}} h_{AMISE}^{\frac{5}{l+3}}.$$

3.2 STE rules based on Sheather & Jones ideas

As in the context of ordinary density estimation, the practical implementation of the kernel-type estimators proposed here (see (1) and (2)) requires the choice of the smoothing parameter $h$. Our two proposals, $h_{SJ_1}$ and $h_{SJ_2}$, as well as the selector $b_{3c}$ recommended by Ćwik and Mielniczuk (1993), are modifications of Sheather & Jones (1991). Since the Sheather & Jones selector is the solution of an equation in the bandwidth, it is also known as a solve-the-equation (STE) rule. Motivated by the formula (6) for the AMISE-optimal bandwidth and the relation (8), solve-the-equation rules require that $h$ is chosen to satisfy the relationship

$$h = \left(\frac{C(K) \left(\lambda \tilde\Psi_0(\gamma_1(h)) + 1\right)}{d_K^2 \, \tilde\Psi_4(\gamma_2(h)) \, m}\right)^{1/5},$$

where the pilot bandwidths for the estimation of $\Psi_0$ and $\Psi_4$ are functions of $h$, namely $\gamma_1(h)$ and $\gamma_2(h)$, respectively. Motivated by Remark 3, we suggest taking

$$\gamma_1(h) = \left(\frac{2 L(0) \, d_K^2 \, \tilde\Psi_4(g_4)}{-d_L \, \tilde\Psi_2(g_2) \, C(K)}\right)^{1/3} h^{5/3}, \qquad \gamma_2(h) = \left(\frac{2 L^{(4)}(0) \, d_K^2 \, \tilde\Psi_4(g_4)}{-d_L \, \tilde\Psi_6(g_6) \, C(K)}\right)^{1/7} h^{5/7},$$

where $\tilde\Psi_j(\cdot)$, $j = 0, 2, 4, 6$, are kernel estimates (see (10)). Note that this way of proceeding leads to a never-ending process in which a bandwidth selection problem must be solved at every stage. To make this iterative process feasible in practice, one possibility is to propose a stopping stage in which the unknown quantities are estimated using a parametric scale for $r$. This strategy is known in the literature as the stage selection problem (see Wand and Jones (1995)). While the selector $b_{3c}$ in Ćwik and Mielniczuk (1993) used a Gaussian scale, for the implementation of $h_{SJ_2}$ we will use a mixture of betas, based on the Weierstrass approximation theorem and the Bernstein polynomials associated with any continuous function on $[0, 1]$ (see Kakizawa (2004) and references therein for the motivation of this method). Later on we will show the formula for computing this reference scale, together with the selector $b_{3c}$ we used in Section 4.

In the following we denote the Epanechnikov kernel by $K$ and the uniform density on $[-1, 1]$ by $M$, and we define $L$ as follows:

$$L(x) = \frac{\Gamma(18)}{2 \Gamma(9) \Gamma(9)} \left(\frac{x + 1}{2}\right)^8 \left(1 - \frac{x + 1}{2}\right)^8 1_{\{|x| \le 1\}}.$$
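The kernel $L$ above is the density of $2B - 1$ with $B$ a Beta(9, 9) random variable, so it is symmetric on $[-1, 1]$ and integrates to one. The sketch below, with illustrative helper names and a simple midpoint rule chosen only for the check, evaluates $L$ and verifies these properties numerically.

```python
import math


def kernel_L(x):
    """Kernel L: a Beta(9, 9) density rescaled from [0, 1] to [-1, 1]."""
    if abs(x) > 1.0:
        return 0.0
    y = 0.5 * (x + 1.0)
    const = math.gamma(18) / (2.0 * math.gamma(9) ** 2)
    return const * y ** 8 * (1.0 - y) ** 8


def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule, used here only as a numerical sanity check."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


if __name__ == "__main__":
    total_mass = midpoint_integral(kernel_L, -1.0, 1.0)
    d_L = midpoint_integral(lambda x: x * x * kernel_L(x), -1.0, 1.0)
    print(total_mass, d_L)
```

Since the variance of a Beta(9, 9) variable is $1/76$, the second moment $d_L$ of this kernel equals $4/76 = 1/19$, which the numerical integral reproduces.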

Next, we detail the steps required in the implementation of $h_{SJ_2}$.

Step 1. Obtain $\hat\Psi_j^{PR}$ ($j = 0, 4, 6, 8$), parametric estimates for $\Psi_j$ ($j = 0, 4, 6, 8$), by replacing $r$ in $C(r^{(j/2)})$ (see (7)) with a mixture of betas, $\tilde b(x)$, as will be explained later on (see (12)).

Step 2. Compute kernel estimates for $\Psi_j$ ($j = 2, 4, 6$) by using $\tilde\Psi_j(g_j^{PR})$ ($j = 2, 4, 6$), with

$$g_j^{PR} = \left(\frac{2 L^{(j)}(0) \left(\lambda \hat\Psi_0^{PR} + 1\right)}{-d_L \hat\Psi_{j+2}^{PR} \, m}\right)^{\frac{1}{j+3}}, \quad j = 2, 4, 6.$$

Step 3. Estimate $\Psi_0$ and $\Psi_4$, using (10), by means of $\tilde\Psi_0(\hat\gamma_1)$ and $\tilde\Psi_4(\hat\gamma_2)$, where

$$\hat\gamma_1 = \left(\frac{2 L(0) \, d_K^2 \, \tilde\Psi_4(g_4^{PR})}{-d_L \, \tilde\Psi_2(g_2^{PR}) \, C(K)}\right)^{1/3} h^{5/3} \quad \text{and} \quad \hat\gamma_2 = \left(\frac{2 L^{(4)}(0) \, d_K^2 \, \tilde\Psi_4(g_4^{PR})}{-d_L \, \tilde\Psi_6(g_6^{PR}) \, C(K)}\right)^{1/7} h^{5/7}.$$

Step 4. Select the bandwidth $h_{SJ_2}$ as the one that solves the following equation in $h$:

$$h = \left(\frac{C(K) \left(\lambda \tilde\Psi_0(\hat\gamma_1) + 1\right)}{d_K^2 \, \tilde\Psi_4(\hat\gamma_2) \, m}\right)^{1/5}.$$

Solving the equation above requires a numerical algorithm. In the simulation study we will use the false-position method. The main reason is that the false-position algorithm does not require the computation of derivatives, which considerably simplifies the implementation of the proposed bandwidth selectors. At the same time, this algorithm has some advantages over others because it tries to combine the speed of methods such as the secant method with the security afforded by the bisection method.
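The root-finding part of Step 4 can be sketched as follows. This is a generic false-position (regula falsi) routine applied to the residual $h - \mathrm{rhs}(h)$ of a fixed-point equation. The right-hand side `toy_rhs` is a hypothetical stand-in, not the Step 4 expression; in an actual implementation it would evaluate the Step 4 formula with $\tilde\Psi_0(\hat\gamma_1)$ and $\tilde\Psi_4(\hat\gamma_2)$ recomputed for each $h$.

```python
def false_position(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    mid = lo
    for _ in range(max_iter):
        # Intersection of the chord through (lo, f_lo) and (hi, f_hi) with zero.
        mid = hi - f_hi * (hi - lo) / (f_hi - f_lo)
        f_mid = f(mid)
        if abs(f_mid) < tol:
            break
        # Keep the sub-interval that still brackets the root, as in bisection.
        if f_lo * f_mid < 0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return mid


def solve_ste(rhs, lo, hi):
    """Solve h = rhs(h) by applying false position to the residual h - rhs(h)."""
    return false_position(lambda h: h - rhs(h), lo, hi)


if __name__ == "__main__":
    # Hypothetical right-hand side used only to exercise the solver.
    toy_rhs = lambda h: 0.3 + 0.1 * h
    print(solve_ste(toy_rhs, 0.01, 1.0))
```

Only function values are needed, and the bracketing interval never loses the root, which matches the motivation given above for preferring this method over derivative-based alternatives.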

Unlike the Gaussian parametric reference used to obtain $b_{3c}$, the selector $h_{SJ_2}$ uses in Step 1 a mixture of betas as follows:

$$\tilde b(x) = \sum_{j=1}^{N} \left(\tilde R_{n,m}\left(\frac{j}{N}\right) - \tilde R_{n,m}\left(\frac{j - 1}{N}\right)\right) \beta(x, j, N - j + 1), \tag{12}$$

where

$$\tilde R_{n,m}(x) = \frac{1}{m} \sum_{j=1}^{m} \mathbb{M}\left(\frac{x - \tilde F_{0n}(X_{1j})}{g_{\tilde R}}\right), \tag{13}$$

$$g_{\tilde R} = \left(\frac{2 \int x M(x) \mathbb{M}(x) \, dx}{n \, d_M^2 \, C(r^{(1)})}\right)^{1/3}, \tag{14}$$

$\beta(x, a, b)$ stands for the beta density

$$\beta(x, a, b) = \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)} x^{a-1} (1 - x)^{b-1}, \quad x \in [0, 1],$$

and $N$ is the number of betas in the mixture.

Since we are trying to estimate a density with support in $[0, 1]$, it seems more suitable to consider a parametric reference with this support. A mixture of betas is an appropriate option because it is flexible enough to model a large variety of relative densities, also when higher-order derivatives (up to order 4) are required. Note that, for the sake of simplicity, we are using above the AMISE-optimal bandwidth $g_{\tilde R}$ for estimating a distribution function in the setting of a one-sample problem (see Polansky and Baker (2000) for more details on the kernel-type estimate of a distribution function). The use of this bandwidth requires the previous estimation of the unknown functional $C(r^{(1)})$. We will consider a quick and dirty method, the rule of thumb, that uses a parametric reference for $r$ to estimate this unknown quantity. More specifically, our reference scale will be a beta with parameters $(p, q)$ estimated from the smoothed relative sample $\{\tilde F_{0n}(X_{1j})\}_{j=1}^{m}$, using the method of moments.

Following the same ideas as for (13) and (14), the bandwidth selector used for the kernel-type estimator $\tilde F_{0n}$ introduced in (3) is based on the AMISE-optimal bandwidth in the one-sample problem:

$$h_0 = \left(\frac{2 \int x M(x) \mathbb{M}(x) \, dx}{n \, d_M^2 \, C(f_0^{(1)})}\right)^{1/3}.$$

As was already mentioned above, in most cases this methodology will be applied to survival analysis, so it is natural to assume that our samples come from distributions with support on the positive real line. Therefore, a gamma reference distribution, Gamma$(\alpha, \beta)$, has been considered, where the parameters $(\alpha, \beta)$ are estimated from the smoothed relative sample $\{\tilde F_{0n}(X_{1j})\}_{j=1}^{m}$, using the method of moments.

For the implementation of $h_{SJ_1}$, we proceed analogously to that of $h_{SJ_2}$ above. The only difference now is that, throughout the previous discussion, $\tilde\Psi_j(\cdot)$ and $\tilde F_{0n}(\cdot)$ are replaced by, respectively, $\hat\Psi_j(\cdot)$ and $F_{0n}(\cdot)$.

As a variant of the selector that Ćwik and Mielniczuk (1993) proposed, $b_{3c}$ is obtained as the solution to the following equation:

$$b_{3c} = \left(\frac{C(K) \left(1 + \lambda \hat\Psi_0(a)\right)}{d_K^2 \, \hat\Psi_4(\alpha_2(b_{3c})) \, m}\right)^{1/5},$$

where $a = 1.78 \, \hat\sigma m^{-1/3}$, $\hat\sigma = \min\{s_m, \widehat{IQR}/1.349\}$, $s_m$ and $\widehat{IQR}$ denote, respectively, the empirical standard deviation and the sample interquartile range of the relative data $\{F_{0n}(X_{1j})\}_{j=1}^{m}$,

$$\alpha_2(b_{3c}) = 0.7694 \left(\frac{\hat\Psi_4(g_4^{GS})}{-\hat\Psi_6(g_6^{GS})}\right)^{1/7} b_{3c}^{5/7},$$

where GS stands for standard Gaussian scale,

$$g_4^{GS} = 1.2407 \, \hat\sigma m^{-1/7}, \quad g_6^{GS} = 1.2304 \, \hat\sigma m^{-1/9},$$

and the estimates $\hat\Psi_j$ with $j = 0, 4, 6$ were obtained using (9), with $L$ replaced by the standard Gaussian kernel and with data-driven bandwidth selectors derived from reducing the two-sample problem to a one-sample problem.

It is interesting to note that all the kernel-type estimators presented previously ($\hat r_h(t)$, $\hat r_{h,h_0}(t)$, $\tilde R_{n,m}(x)$ and $\tilde F_{0n}(x)$) were not corrected to take into account, respectively, the fact that $r$ and $R$ have support on $[0, 1]$ instead of on the whole real line, and the fact that $f_0$ is supported only on the positive real line. Therefore, in order to correct the boundary effect in practical applications, we will use the well-known reflection method to modify $\hat r_h(t)$, $\hat r_{h,h_0}(t)$, $\tilde R_{n,m}(x)$ and $\tilde F_{0n}(x)$ where needed.

4 Simulations

We compare, through a simulation study, the performance of the bandwidth selectors $h_{SJ_1}$ and $h_{SJ_2}$, proposed in Section 3, with the standard competitor $b_{3c}$ recommended by Ćwik and Mielniczuk (1993). Although we are aware that the smoothing parameter $N$ introduced in (12) should be selected in some optimal way based on the data, this issue goes beyond the scope of this article. Consequently, from here on, we will consider $N = 4$ components in the beta mixture reference scale model given by (12).

We will consider the first sample coming from the random variate $X_0 = W^{-1}(U)$ and the second sample coming from the random variate $X_1 = W^{-1}(S)$, where $U$ denotes a uniform distribution in the compact interval $[0, 1]$, $W$ is the distribution function of a Weibull distribution with parameters $(2, 3)$ and $S$ is a random variate from one of the following five different populations (see Figure 1):

(a) A beta distribution with parameters 4 and 7 ($\beta(4, 7)$).

(b) A mixture consisting of $V_1$ with probability $\frac{4}{5}$ and $V_2$ with probability $\frac{1}{5}$, where $V_1 = \beta(4, 37)$ and $V_2 = \beta(4, 2)$.

(c) A mixture consisting of $V_1$ with probability $\frac{1}{3}$ and $V_2$ with probability $\frac{2}{3}$, where $V_1 = \beta(34, 5)$ and $V_2 = \beta(5, 3)$.

[Put Figure 1 about here]

Choosing different values for the pair of sample sizes $m$ and $n$, and under each of the models presented above, we start by drawing 5 pairs of random samples and, according to every method, we select the bandwidths $\hat h$. Then, in order to check their performance, we approximate by Monte Carlo the mean integrated squared error, EM, between the true relative density and the kernel-type estimate for $r$, given by (1) when $\hat h = b_{3c}, h_{SJ_1}$, or by (2) when $\hat h = h_{SJ_2}$. Computing the kernel-type estimates with a direct algorithm can be very time consuming. Therefore, we will use linear binned approximations that, thanks to their discrete convolution structure, can be computed quickly using the fast Fourier transform (FFT) (see Wand and Jones (1995) for more details). For all the models, the values of this criterion for the three bandwidth selectors, $h_{SJ_1}$, $h_{SJ_2}$ and $b_{3c}$, can be found in Table 1.

[Put Table 1 about here]

A careful look at the table shows that the new selector $h_{SJ_2}$ behaves much better than the selector $b_{3c}$, especially when the sample sizes are equal or when $m$ is larger than $n$. The improvement is even larger for unimodal relative densities (models (a) and (b)). On the other hand, the other proposal, $h_{SJ_1}$, presents only a moderate improvement over $b_{3c}$ for unimodal relative densities (models (a) and (b)) and performs only slightly better, or even worse, than $b_{3c}$ for bimodal relative densities (model (c)). The ratio $m/n$ has an important effect on the behaviour of any of the three selectors considered. For instance, an asymmetric behaviour of the selectors in terms of the sample sizes is clearly seen.

Other proposals for selecting $h$ have been investigated. For instance, versions of $h_{SJ_2}$ were considered in which either the unknown functionals $\Psi_l$ are estimated from the viewpoint of a one-sample problem, or the STE rule is modified in such a way that only the function $\hat\gamma_2$ is considered in the equation to be solved (see Step 4 in Section 3). After a simulation study similar to the one detailed here, but now carried out for these versions of $h_{SJ_2}$, a practical performance similar to that of $h_{SJ_2}$ was observed. However, a worse behaviour was observed when, in the implementation of these versions of $h_{SJ_2}$, the smooth estimate of $F_0$ is replaced by the empirical distribution function $F_{0n}$. Therefore, although $h_{SJ_2}$ requires the selection of two bandwidth parameters, a clearly better practical behaviour is observed when considering the smoothed relative data instead of the non-smoothed ones.

5 A medical application

In this section we apply the plug-in STE selector $h_{SJ_2}$ detailed above to estimate the relative density for a real data set concerned with prostate cancer (PC). The data consist of 599 patients suffering from PC (+) and 835 patients PC-free (−). For each patient the illness status has been determined through a prostate biopsy carried out for the first time in Hospital Juan Canalejo (Galicia, Spain) between January of 2002 and September of 2005.

In the literature, there is an increasing interest in finding a good diagnostic test that helps in the early detection of PC and avoids the need to undergo a prostate biopsy. There are several studies in which, through ROC curves, the performance of different diagnostic tests was investigated, based on analytic measurements such as the total prostate specific antigen (tPSA), the free PSA (fPSA) or the complexed PSA (cPSA). As mentioned in Section 1, there is a close relation between the concepts of ROC curve and relative density. Relative density estimates can provide more detailed information about the performance of a diagnostic test, which can be useful not only in comparing different tests but also in designing an improved one. This issue goes beyond the scope of this article and therefore it will not be investigated here.

In this section we compare, from a distributional point of view, the above-mentioned measurements (tPSA, fPSA and cPSA) between the two groups in the data set (PC+ and PC−). To this end we start by computing the appropriate bandwidths using the data-driven bandwidth selector $h_{SJ_2}$, and then the corresponding relative density estimates are computed using (2). These estimates are shown in Figure 2.

[Put Figure 2 about here]

It is clear from Figure 2 that the relative density estimate is above one in a central interval accounting for a probability of about 4% of the PC− distribution for the variables tPSA and cPSA. This is also the case in the right tail of the PC− group for tPSA and cPSA, but not for fPSA. In the case of fPSA the central interval with relative density above one is slightly shifted to the left of the PC− distribution: between percentiles 2% and 6%.

6 Proofs

The proof of Theorem 1 will be a direct consequence of some previous lemmas in which each of the terms that result from expanding the expression for the MISE is studied.

Some of tem will produce dominant parts in te final expression for te MISE wile oters will yield negligible terms Lemma Assume te ypotesis above Ten i E[ r t rt 2 ]dt = CK r tdt + m 4 4 d 2 K Cr2 + o + 4 m ii E[A2 2]dt = n CrCK + o n iii E[A2 ]dt = o n iv E[B2 ]dt = o n v E[2A r rt]dt = vi E[2A 2 r rt]dt = vii E[2B r rt]dt = o m + 4 viii E[2A A 2 ]dt = o m + 4 ix E[2A B]dt = o m + 4 x E[2A 2B]dt = o n + 4 = O n Lemma 2 Assume te ypotesis above Ten i E[Â2 ]dt = o n ii E[ ˆB 2 ]dt = o n Proof of Lemma Te proof of i is not included ere because it is a classical result in te setting of ordinary density estimation in a one-sample problem see Wand and Jones 995 for details We next prove ii Standard algebra gives E[A 2 2] = n 2 4 n i= n j= K t F w E [ F w {Xi w }F w 2 {Xj w 2 } ] K t F w 2 dfw dfw 2 7

Due to te independence between X i and X j for i j, and using te fact tat Cov {F X i u }, {F X i u 2 } = u u 2 g u u 2, were g t = t, te previous expression can be rewritten as follows t E[A 2 2] = 2 n 4 = n 4 t u u 2 g u 2 K u u 2 [ t g u 2 d u K u u 2 Now, using integration by parts, it follows tat t K u2 ru du ] 2 ru ru 2 du du 2 5 E[A 2 2] = n 4 + n 4 lim g u 2 Gu 2 2 + u 2 n 4 Gu 2 2 g u 2 du 2 lim g u 2 Gu 2 2 u 2 + were Gu 2 = t u K u ru du u 2 Since G is a bounded function and g =, te second term in te rigt and side of 5 vanises to zero On te oter and, due to te boundedness of K and r, it follows tat Gu 2 K r u 2 2 5 is zero as well Terefore, 2, wic let us conclude tat te first term in 6 E[A 2 2] = n 4 Gu 2 2 g u 2 du 2 Now, using integration by parts, it follows tat t u Gu 2 = u ru K + u 2 t u K [ ru + u r u ]du u 2 t u2 = u 2 ru 2 K + t u K [ u r u ru ]du, u 2 8

and plugging tis last expression in 6, it is concluded tat E[A 2 2] = n 2I 2 + 2I 22 + I 23, were I 22 = u 2 K I 2 = t r 2 u 2 K 2 u2 du 2 t u 2 ru u2 2K t u [ u r u ru ]du du 2 I 23 = K u 2 2 t u u 2 K t u [ u r u ru ] u 2 [ u r u ru ]du du du 2 Terefore, 7 E[A 2 2]dt = n 2I 2dt + 2 n 2I 22dt + n 2I 23dt Next, we will study eac summand in 7 separately Te first term can be andled by using canges of variable and a Taylor expansion: n 2I 2 dt = n r 2 u 2 u 2 u 2 K 2 sds du 2 Let us define K 2 x = x K2 sds and rewrite te previous term as follows n 2I 2 dt = n r 2 u2 u 2 K 2 K 2 u 2 du 2 Now, by splitting te integration interval into tree subintervals: [,], [, ] and [, ], using canges of variable and te fact tat CK x K 2 x = x, 9

it is easy to sow tat n 2I 2 dt = CK Cr + O n n Below, we will study te second term in te rigt and side of 7 By using canges of variable, Caucy-Scwarz inequality and conditions r <, r < and CK <, straigtforward calculations lead to n 2I 22 dt = O n + = O n 2 n 2 Similar arguments give n 2I 23 dt = O n 2 Terefore, it as been sown tat E[A 2 2]dt = n CrCK + O Finally te proof of ii concludes using condition B We now prove iii Direct calculations lead to + O n n 2 8 E[A 2 ] = 4E [I ] were 9 [ I =E t v Ũnv v 2 Ũnv 2 K v t K v2 d R m Rv d R m Rv 2 /X,,X n ] To tackle wit 8 we first study te conditional expectation 9 It is easy to see tat I = V ar[v/x,,x n ] were V = m m j= t X j ŨnX j K Xj 2

Tus { I = [ ] 2 t v v m ŨnvK drv [ and E[A 2 ] = m 4 E Taking into account tat { [ ] }[ 2 t v E v Ũnv K [ v Ũnv v 2 Ũnv 2 [ ] E sup Ũnv v 2 = v ] K t v t v v ŨnvK ] 2 drv m 4 t K v2 P sup Ũnv v 2 > c dc, v we can use te Dvoretzky-Kiefer-Wolfowitz inequality, to conclude tat [ ] 2 E sup Ũnv v 2 v 2e 2nc dc = 2 n ye y2 dy = O ] 2 } drv drv drv 2 n Consequently, using 2 and te conditions r < and K < we obtain tat E[A 2 ] = O mn 4 Te proof of iii is concluded using condition B Te results appearing in items iv-x can be proved by first conditioning to some appropriate random variables and ten andling te conditional moments using standard arguments For tis reason teir proofs are not included ere ten Proof of Lemma 2 We start proving i Let us define D n w = F n w F n w, E[Â2 ] = E[E[Â2 /X,,X m ]] [ ] = E E[D n w D n w 2 ]K t F w K t F w 2 df m v df m v 2 Based on te results set for D n w in Hjort and Walker 2, te conditions F2 and K2 and since E[D n w D n w 2 ] = CovD n w,d n w 2 + E[D n w ]E[D n w 2 ], 4 it follows tat E[D n w D n w 2 ] = O + O 4 n 2

Terefore, for any t [, ], we can bound E[Â2 ], using suitable constants C 2 and C 3 as follows 4 2 E[Â2 t ] = C 2 K F z fzdz 4 m 4 m t +C 3 K F z t K F z 2 fz 4 fz 2 dz dz 2 m Besides, te condition R allows us to conclude tat 2 K t F z fzdz = O and K t F z K t F z 2 fz fz 2 dz dz 2 = O 2 for all t [, ] Terefore, E [Â2 and B2, implies i ] dt = O 4 n 3 +O 4 2, wic, taking into account conditions B We next prove ii Te proof is parallel to tat of item iv in Lemma Te only difference now is tat instead of requiring E[sup F n x F x p ] = On p 2, were p is an integer larger tan, it is required tat [ 2 E sup F n x F x p ] = On p 2 To conclude te proof, below we sow tat 2 is satisfied Define H n =sup F n x F x, ten, as it is stated in Amad 22, it follows tat H n E n +W n were E n = sup F n x F x and W n = sup E F n x F x = O 2 Using te binomial formula it is easy to obtain tat, for any integer p, H p n p j= Cp j W n p j E j n, were te constants C p j s wit j {,,,p,p} are te binomial coefficients Terefore, since E[E j n] = On j 2 and W p j n = O 2p j, condition B2 leads to W p j n E [E j n] = On p 2 As a straigtforward consequence, 2 olds and te proof of ii is concluded Proof of teorem 2 Below, we will briefly detail te steps followed to study te asymptotic beaviour of te mean squared error of ˆΨ l g defined in 9 First of all, let 22

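us observe that
\[
\hat\Psi_l(g) = \frac{1}{m} L_g^{(l)}(0) + \frac{1}{m^2} \sum_{j=1}^{m} \sum_{k=1,\,k\neq j}^{m} L_g^{(l)}\big(F_{0n}(X_{1j}) - F_{0n}(X_{1k})\big),
\]
which implies:
\[
E\big[\hat\Psi_l(g)\big] = \frac{1}{m g^{l+1}} L^{(l)}(0) + \Big(1 - \frac{1}{m}\Big) E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big].
\]
Starting from the equation
\[
\begin{aligned}
E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big]
&= E\big[E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big) \mid X_{01},\ldots,X_{0n}\big]\big] \\
&= E\Big[\iint L_g^{(l)}\big(F_{0n}(x_1) - F_{0n}(x_2)\big) f_1(x_1) f_1(x_2)\,dx_1\,dx_2\Big] \\
&= \iint E\big[L_g^{(l)}\big(F_{0n}(x_1) - F_{0n}(x_2)\big)\big] f_1(x_1) f_1(x_2)\,dx_1\,dx_2
\end{aligned}
\]
and using a Taylor expansion, we have
\[
E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big] = \sum_{i=0}^{7} I_i, \tag{22}
\]
where
\[
\begin{aligned}
I_0 &= \iint \frac{1}{g^{l+1}} L^{(l)}\Big(\frac{F_0(x_1) - F_0(x_2)}{g}\Big) f_1(x_1) f_1(x_2)\,dx_1\,dx_2, \\
I_i &= \iint \frac{1}{i!\,g^{l+i+1}} L^{(l+i)}\Big(\frac{F_0(x_1) - F_0(x_2)}{g}\Big) E\big[\big(F_{0n}(x_1) - F_0(x_1) - F_{0n}(x_2) + F_0(x_2)\big)^i\big] f_1(x_1) f_1(x_2)\,dx_1\,dx_2, \quad i = 1,\ldots,6, \\
I_7 &= \iint \frac{1}{7!\,g^{l+7+1}} E\big[L^{(l+7)}(\xi_n)\big(F_{0n}(x_1) - F_0(x_1) - F_{0n}(x_2) + F_0(x_2)\big)^7\big] f_1(x_1) f_1(x_2)\,dx_1\,dx_2,
\end{aligned}
\]
and $\xi_n$ is a value between $\frac{F_0(x_1) - F_0(x_2)}{g}$ and $\frac{F_{0n}(x_1) - F_{0n}(x_2)}{g}$.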
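As a rough illustration of the estimator $\hat\Psi_l(g)$ written above, the following Python sketch evaluates the double sum directly. It is only a sketch under stated assumptions: the kernel $L$ is taken to be the standard Gaussian density (the conditions on $L$ are not visible in this section), only $l = 0$ and $l = 2$ are implemented, and the function names `kernel_derivative` and `psi_hat` are hypothetical.

```python
import math
from bisect import bisect_right


def kernel_derivative(x, l):
    """l-th derivative of the standard Gaussian density (assumed kernel L)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    if l == 0:
        return phi
    if l == 2:
        return (x * x - 1.0) * phi
    raise ValueError("only l = 0 and l = 2 are implemented in this sketch")


def psi_hat(x0, x1, g, l=0):
    """Sketch of Psi_hat_l(g) for a reference sample x0 and a comparison sample x1.

    The full double sum over (j, k) is used; its diagonal j = k reproduces
    the term L^{(l)}(0) / (m g^{l+1}) of the displayed formula.
    """
    xs0 = sorted(x0)
    n, m = len(xs0), len(x1)
    # Empirical distribution function of x0 evaluated at each point of x1.
    u = [bisect_right(xs0, x) / n for x in x1]
    total = 0.0
    for uj in u:
        for uk in u:
            total += kernel_derivative((uj - uk) / g, l)
    return total / (m * m * g ** (l + 1))
```

When both samples come from the same distribution the relative density is identically 1 on $[0,1]$, so `psi_hat(x0, x1, g, l=0)` should be close to 1 for moderate sample sizes, up to the boundary effect of the kernel near 0 and 1.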

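Now, consider the first term, $I_0$, in (22). It is easy to see that
\[
\begin{aligned}
I_0 &= \int_0^1\!\!\int_0^1 L_g^{(l)}(z_1 - z_2)\, r(z_1)\, r(z_2)\,dz_1\,dz_2 \\
&= \int_0^1\!\!\int_0^1 L_g(z_1 - z_2)\, r(z_1)\, r^{(l)}(z_2)\,dz_1\,dz_2 \\
&= \int_0^1\!\!\int_{-z_2/g}^{(1-z_2)/g} L(x)\, r(z_2 + gx)\, r^{(l)}(z_2)\,dx\,dz_2,
\end{aligned}
\]
hence, using a Taylor expansion, we have
\[
I_0 = \Psi_l + \tfrac{1}{2}\, d_L \Psi_{l+2}\, g^2 + O(g^4).
\]
Assume $x_1 > x_2$ and define $Z = \sum_{i=1}^{n} 1_{\{x_2 < X_{0i} \le x_1\}}$. Then the random variable $Z$ has a $\mathrm{Bi}(n,p)$ distribution with $p = F_0(x_1) - F_0(x_2)$ and mean $\mu = np$. It is easy to show that, for $i = 1,\ldots,6$,
\[
I_i = 2\int_{-\infty}^{\infty}\!\int_{x_2}^{\infty} \frac{1}{i!\,g^{l+i+1}} L^{(l+i)}\Big(\frac{F_0(x_1) - F_0(x_2)}{g}\Big) f_1(x_1) f_1(x_2)\, n^{-i} \mu_i(Z)\,dx_1\,dx_2,
\]
where
\[
\begin{aligned}
\mu_r(Z) &= E\big[(Z - E[Z])^r\big] = \sum_{j=0}^{r} (-1)^j \binom{r}{j} m_{r-j}\, \mu^j, \\
m_k &= E\big[Z^k\big] = \sum_{j=0}^{k} S(k,j)\, \frac{n!\,p^j}{(n-j)!}, \\
S(m,n) &= \frac{1}{n!} \sum_{j=0}^{n} (-1)^j \binom{n}{j} (n-j)^m.
\end{aligned}
\]
Noting that $\mu_1(Z) = 0$ and $\mu_2(Z) = n\,(F_0(x_1) - F_0(x_2))(1 - F_0(x_1) + F_0(x_2))$, we have $I_1 = 0$ and
\[
\begin{aligned}
I_2 &= \int_{-\infty}^{\infty}\!\int_{x_2}^{\infty} \frac{1}{n g^{l+2+1}} L^{(l+2)}\Big(\frac{F_0(x_1) - F_0(x_2)}{g}\Big) f_1(x_1) f_1(x_2)\,\big(F_0(x_1) - F_0(x_2)\big)\big(1 - F_0(x_1) + F_0(x_2)\big)\,dx_1\,dx_2 \\
&= \int_0^1\!\!\int_v^1 \frac{1}{n g^{l+2+1}} L^{(l+2)}\Big(\frac{u - v}{g}\Big) (u - v)(1 - u + v)\, r(u)\, r(v)\,du\,dv \\
&= \frac{1}{n g^{l+1}} \int_0^1\!\!\int_0^{(1-v)/g} L^{(l+2)}(x)\, x\,(1 - gx)\, r(v + gx)\, r(v)\,dx\,dv.
\end{aligned}
\tag{23}
\]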
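The binomial moment formulas above can be checked numerically. The short Python sketch below implements $S(k,j)$, $m_k$ and $\mu_r(Z)$ exactly as displayed, using exact rational arithmetic, and confirms $\mu_1(Z) = 0$ and $\mu_2(Z) = np(1-p)$ for one choice of $n$ and $p$. The function names are hypothetical and the values $n = 10$, $p = 3/10$ are chosen only for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(k, j):
    """Stirling number of the second kind S(k, j) via the explicit alternating sum."""
    total = sum((-1) ** i * comb(j, i) * (j - i) ** k for i in range(j + 1))
    return total // factorial(j)


def raw_moment(k, n, p):
    """m_k = E[Z^k] for Z ~ Bi(n, p), as a sum over falling factorials of n."""
    result = Fraction(0)
    for j in range(k + 1):
        falling = 1
        for i in range(j):
            falling *= n - i  # equals n!/(n-j)!, and is zero when j > n
        result += stirling2(k, j) * falling * p ** j
    return result


def central_moment(r, n, p):
    """mu_r(Z) = E[(Z - np)^r], expanded in terms of the raw moments m_k."""
    mu = n * p
    return sum(
        (-1) ** j * comb(r, j) * raw_moment(r - j, n, p) * mu ** j
        for j in range(r + 1)
    )


if __name__ == "__main__":
    p = Fraction(3, 10)
    print(central_moment(1, 10, p))  # first central moment
    print(central_moment(2, 10, p))  # compare with n p (1 - p)
```

With $p = F_0(x_1) - F_0(x_2)$, the second central moment returned here is the quantity $n\,(F_0(x_1) - F_0(x_2))(1 - F_0(x_1) + F_0(x_2))$ used to obtain (23).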

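Using a Taylor expansion of $(1 - gx)\, r(v + gx)$ and noting that $\int_0^\infty x L^{(l+2)}(x)\,dx = L^{(l)}(0)$ (see condition (K3)), we have from (23)
\[
I_2 = \frac{1}{n g^{l+1}} \Psi_0 L^{(l)}(0) + O\Big(\frac{1}{n g^{l}}\Big).
\]
Similar arguments can be used to show that $I_i = O\big(\frac{1}{n^2 g^{l+2}}\big)$ for $i = 3, 4$ and $I_i = O\big(\frac{1}{n^3 g^{l+3}}\big)$ for $i = 5, 6$. Coming back to the last term in (22) and using the Dvoretzky-Kiefer-Wolfowitz inequality and condition (K3), it is easy to show that $I_7 = O\big(\frac{1}{n^{7/2} g^{l+8}}\big)$. Therefore,
\[
E\big[\hat\Psi_l(g)\big] = \Psi_l + \frac{1}{2}\, d_L \Psi_{l+2}\, g^2 + \frac{1}{m g^{l+1}} L^{(l)}(0) + \frac{1}{n g^{l+1}} L^{(l)}(0)\, \Psi_0 + O(g^4) + o\Big(\frac{1}{n g^{l+1}}\Big).
\]
In order to study the variance of $\hat\Psi_l(g)$, note that
\[
\mathrm{Var}\big[\hat\Psi_l(g)\big] = \sum_{i=1}^{3} c_{n,i} V_{l,i}, \tag{24}
\]
where
\[
c_{n,1} = \frac{2(m-1)}{m^3}, \qquad c_{n,2} = \frac{4(m-1)(m-2)}{m^3}, \qquad c_{n,3} = \frac{(m-1)(m-2)(m-3)}{m^3},
\]
\[
V_{l,1} = \mathrm{Var}\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big], \tag{25}
\]
\[
V_{l,2} = \mathrm{Cov}\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big),\, L_g^{(l)}\big(F_{0n}(X_{12}) - F_{0n}(X_{13})\big)\big], \tag{26}
\]
\[
V_{l,3} = \mathrm{Cov}\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big),\, L_g^{(l)}\big(F_{0n}(X_{13}) - F_{0n}(X_{14})\big)\big]. \tag{27}
\]
Therefore, in order to get an asymptotic expression for the variance of $\hat\Psi_l(g)$, we start by obtaining asymptotic expressions for the terms (25), (26) and (27) in (24). To deal with the term (25), we will use
\[
V_{l,1} = E\big[L_g^{(l)2}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big] - E^2\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big] \tag{28}
\]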

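and study separately each term on the right-hand side of (28). Note that the expectation of $L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)$ has already been studied when dealing with the expectation of $\hat\Psi_l(g)$. Next we study the first term on the right-hand side of (28). Using a Taylor expansion, the term
\[
\begin{aligned}
E\big[L_g^{(l)2}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big]
&= E\big[E\big[L_g^{(l)2}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big) \mid X_{01},\ldots,X_{0n}\big]\big] \\
&= E\Big[\iint L_g^{(l)2}\big(F_{0n}(x) - F_{0n}(y)\big) f_1(x) f_1(y)\,dx\,dy\Big] \\
&= \iint E\big[L_g^{(l)2}\big(F_{0n}(x) - F_{0n}(y)\big)\big] f_1(x) f_1(y)\,dx\,dy
\end{aligned}
\]
can be decomposed into a sum of six terms that can be bounded easily. The first term in that decomposition can be rewritten as $\frac{1}{g^{2l+1}} \Psi_0 C_{L^{(l)}} + o\big(\frac{1}{g^{2l+1}}\big)$ after applying some changes of variable and a Taylor expansion. The other terms can be easily bounded using the Dvoretzky-Kiefer-Wolfowitz inequality and standard changes of variable. These bounds and condition (B3) prove that the order of these terms is $o\big(\frac{1}{g^{2l+1}}\big)$. Consequently,
\[
V_{l,1} = \frac{1}{g^{2l+1}} \Psi_0 C_{L^{(l)}} + o\Big(\frac{1}{g^{2l+1}}\Big) - \big(\Psi_l + o(1)\big)^2 = \frac{1}{g^{2l+1}} \Psi_0 C_{L^{(l)}} + o\Big(\frac{1}{g^{2l+1}}\Big).
\]
The term (26) can be handled using
\[
V_{l,2} = E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\, L_g^{(l)}\big(F_{0n}(X_{12}) - F_{0n}(X_{13})\big)\big] - E^2\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\big]. \tag{29}
\]
As for (28), it is only necessary to study the first term on the right-hand side of (29).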
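The weights $c_{n,1}$, $c_{n,2}$ and $c_{n,3}$ in (24) come from counting how many pairs of ordered index pairs $(j,k)$, $(j',k')$ with $j \neq k$ and $j' \neq k'$ share two, one or no indices, divided by $m^4$. The following Python sketch, with hypothetical function names, verifies that count by brute force for small $m$ against the closed forms $2(m-1)/m^3$, $4(m-1)(m-2)/m^3$ and $(m-1)(m-2)(m-3)/m^3$.

```python
from fractions import Fraction
from itertools import permutations


def coefficients_by_counting(m):
    """Count pairs of ordered index pairs by the number of shared indices."""
    pairs = list(permutations(range(m), 2))  # all ordered (j, k) with j != k
    counts = {0: 0, 1: 0, 2: 0}
    for a in pairs:
        for b in pairs:
            counts[len(set(a) & set(b))] += 1
    m4 = m ** 4
    # Order of the result: two shared indices, one shared index, none shared.
    return (
        Fraction(counts[2], m4),
        Fraction(counts[1], m4),
        Fraction(counts[0], m4),
    )


def coefficients_closed_form(m):
    """Closed forms of the three weights in the variance decomposition."""
    return (
        Fraction(2 * (m - 1), m ** 3),
        Fraction(4 * (m - 1) * (m - 2), m ** 3),
        Fraction((m - 1) * (m - 2) * (m - 3), m ** 3),
    )


if __name__ == "__main__":
    for m in range(4, 8):
        print(m, coefficients_by_counting(m) == coefficients_closed_form(m))
```

The three weights add up to $(m-1)^2/m^2$, the squared fraction of off-diagonal terms in the double sum.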

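Note that
\[
\begin{aligned}
&E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\, L_g^{(l)}\big(F_{0n}(X_{12}) - F_{0n}(X_{13})\big)\big] \\
&\quad = E\big[E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\, L_g^{(l)}\big(F_{0n}(X_{12}) - F_{0n}(X_{13})\big) \mid X_{01},\ldots,X_{0n}\big]\big] \\
&\quad = \iiint E\big[L_g^{(l)}\big(F_{0n}(y) - F_{0n}(z)\big)\, L_g^{(l)}\big(F_{0n}(z) - F_{0n}(t)\big)\big] f_1(y) f_1(z) f_1(t)\,dy\,dz\,dt.
\end{aligned}
\]
Taylor expansions, changes of variable, the Cauchy-Schwarz inequality and the Dvoretzky-Kiefer-Wolfowitz inequality give: $E\big[L_g^{(l)}\big(F_{0n}(X_{11}) - F_{0n}(X_{12})\big)\, L_g^{(l)}\big(F_{0n}(X_{12}) - F_{0n}(X_{13})\big)\big] = \int_0^1 r^{(l)2}(z)\, r(z)\,dz$ + O + O + O + O n n 2 n 3. Consequently, using (B3) and (29), $V_{l,2} = O(1)$.

To study the term $V_{l,3}$ in (27), let us define
\[
A_l = \iint \big[L_g^{(l)}\big(F_{0n}(y) - F_{0n}(z)\big) - L_g^{(l)}\big(F_0(y) - F_0(z)\big)\big] f_1(y) f_1(z)\,dy\,dz.
\]
It is easy to show that
\[
V_{l,3} = \mathrm{Var}(A_l).
\]
Now a Taylor expansion gives
\[
\mathrm{Var}(A_l) = \sum_{k=1}^{N} \mathrm{Var}(T_k) + \sum_{k=1}^{N} \sum_{l=1,\,l\neq k}^{N} \mathrm{Cov}(T_k, T_l), \tag{30}
\]
where
\[
A_l = \sum_{k=1}^{N} T_k,
\]
\[
T_k = \iint \frac{1}{k!\,g^{l+1}} L^{(l+k)}\Big(\frac{F_0(y) - F_0(z)}{g}\Big) \Big(\frac{F_{0n}(y) - F_{0n}(z) - F_0(y) + F_0(z)}{g}\Big)^{k} f_1(y) f_1(z)\,dy\,dz, \quad \text{for } k = 1,\ldots,N-1,
\]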

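\[
T_N = \iint \frac{1}{N!\,g^{l+1}} L^{(l+N)}(\xi_n) \Big(\frac{F_{0n}(y) - F_{0n}(z) - F_0(y) + F_0(z)}{g}\Big)^{N} f_1(y) f_1(z)\,dy\,dz,
\]
for some positive integer $N$. We will only study each one of the first $N$ summands in (30). The rest of them can be easily bounded using the Cauchy-Schwarz inequality and the bounds obtained for the first $N$ terms. Now the variance of $T_k$ is studied. First of all, note that
\[
\mathrm{Var}(T_k) \le E\big[T_k^2\big] = \iiiint \Big(\frac{1}{k!\,g^{l+k+1}}\Big)^{2} L^{(l+k)}\Big(\frac{F_0(y_1) - F_0(z_1)}{g}\Big) L^{(l+k)}\Big(\frac{F_0(y_2) - F_0(z_2)}{g}\Big) h_k(y_1,z_1,y_2,z_2)\, f_1(y_1) f_1(z_1) f_1(y_2) f_1(z_2)\,dy_1\,dz_1\,dy_2\,dz_2,
\]
where
\[
h_k(y_1,z_1,y_2,z_2) = E\big\{\big[F_{0n}(y_1) - F_{0n}(z_1) - F_0(y_1) + F_0(z_1)\big]^{k} \big[F_{0n}(y_2) - F_{0n}(z_2) - F_0(y_2) + F_0(z_2)\big]^{k}\big\}.
\]
Using changes of variable we can rewrite $E[T_k^2]$ as follows:
\[
\begin{aligned}
E\big[T_k^2\big] &= \int_0^1\!\!\int_0^1\!\!\int_0^1\!\!\int_0^1 \Big(\frac{1}{k!}\Big)^{2} L_g^{(l+k)}(s_1 - t_1)\, L_g^{(l+k)}(s_2 - t_2)\, r(s_1) r(t_1) r(s_2) r(t_2)\, h_k\big(F_0^{-1}(s_1), F_0^{-1}(t_1), F_0^{-1}(s_2), F_0^{-1}(t_2)\big)\,ds_1\,dt_1\,ds_2\,dt_2 \\
&= \int_0^1\!\!\int_{s_1-1}^{s_1}\!\int_0^1\!\!\int_{s_2-1}^{s_2} \Big(\frac{1}{k!}\Big)^{2} L_g^{(l+k)}(u_1)\, L_g^{(l+k)}(u_2)\, r(s_1) r(s_1 - u_1) r(s_2) r(s_2 - u_2)\, h_k\big(F_0^{-1}(s_1), F_0^{-1}(s_1 - u_1), F_0^{-1}(s_2), F_0^{-1}(s_2 - u_2)\big)\,du_1\,ds_1\,du_2\,ds_2.
\end{aligned}
\]
Note that closed expressions for $h_k$ can be obtained using the expressions for the moments of order $r = (r_1, r_2, r_3, r_4, r_5)$ of $Z$, a random variable with multinomial distribution with parameters $(n; p_1, p_2, p_3, p_4, p_5)$. Based on these expressions, the condition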

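(R3) and the use of integration by parts we can rewrite $E[T_k^2]$ as follows:
\[
E\big[T_k^2\big] = \int_0^1\!\!\int_{s_1-1}^{s_1}\!\int_0^1\!\!\int_{s_2-1}^{s_2} \Big(\frac{1}{k!}\Big)^{2} L_g(u_1)\, L_g(u_2)\, \frac{\partial^{2(l+k)}}{\partial u_1^{l+k}\,\partial u_2^{l+k}} \tilde h_k(u_1,s_1,u_2,s_2)\,du_1\,ds_1\,du_2\,ds_2,
\]
where
\[
\tilde h_k(u_1,s_1,u_2,s_2) = r(s_1)\, r(s_1 - u_1)\, r(s_2)\, r(s_2 - u_2)\, h_k\big(F_0^{-1}(s_1), F_0^{-1}(s_1 - u_1), F_0^{-1}(s_2), F_0^{-1}(s_2 - u_2)\big).
\]
Besides, based on the multinomial moments we can show that $\sup_{z \in \mathbb{R}^4} |h_k(z)| = O\big(\frac{1}{n^k}\big)$. This result and condition (R3) allow us to conclude that $\mathrm{Var}(T_k) \le E[T_k^2] = O\big(\frac{1}{n^k}\big)$, for $1 \le k < N$, which implies that $\mathrm{Var}(T_k) = o\big(\frac{1}{n}\big)$, for $2 \le k < N$. A Taylor expansion of order $N = 6$ gives $\mathrm{Var}(T_6) = O\big(\frac{1}{n^{N} g^{2(N+l+1)}}\big)$, which, using condition (B3), proves $\mathrm{Var}(T_6) = o\big(\frac{1}{n}\big)$. Consequently,
\[
\mathrm{Var}\big[\hat\Psi_l(g)\big] = \frac{2}{m^2 g^{2l+1}} \Psi_0 C_{L^{(l)}} + o\Big(\frac{1}{m^2 g^{2l+1}}\Big) + O\Big(\frac{1}{n}\Big).
\]

Remark 4. If equation (22) is replaced by a three-term Taylor expansion $\sum_{i=0}^{2} I_i + I_3^*$, where
\[
I_3^* = \iint \frac{1}{3!\,g^{l+4}} E\big[L^{(l+3)}(\zeta_n)\big(F_{0n}(x_1) - F_0(x_1) - F_{0n}(x_2) + F_0(x_2)\big)^3\big] f_1(x_1) f_1(x_2)\,dx_1\,dx_2,
\]
and $\zeta_n$ is a value between $\frac{F_0(x_1) - F_0(x_2)}{g}$ and $\frac{F_{0n}(x_1) - F_{0n}(x_2)}{g}$, then $I_3^* = O\big(\frac{1}{n^{3/2} g^{l+4}}\big)$ and we would have to ask for the condition $n g^6 \to \infty$ to conclude that $I_3^* = o\big(\frac{1}{n g^{l+1}}\big)$. However, this condition is very restrictive because it is not satisfied by the optimal bandwidth $g_l$ with $l = 0, 2$, which is $g_l \sim n^{-1/(l+3)}$. We could consider $\sum_{i=0}^{3} I_i + I_4^*$ and then we would need to ask for the condition $n g^4 \to \infty$. However, this condition is not satisfied by $g_l$ with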

$l = 0$. In fact, it follows that $n g_l^4 \to 0$ if $l = 0$ and $n g_l^4 \to \infty$ if $l = 2, 4, \ldots$ Something similar happens when we consider $\sum_{i=0}^{4} I_i + I_5^*$ or $\sum_{i=0}^{5} I_i + I_6^*$, i.e., the condition required on $g$ is not satisfied by the optimal bandwidth when $l = 0$. Only when we stop at $I_7$ is the required condition, $n g^{14/5} \to \infty$, satisfied for all even $l$.

If equation (30) is reconsidered by the mean-value theorem, and we then consider that $A_l = T_1^*$ with
\[
T_1^* = \iint \frac{1}{g^{l+2}} L^{(l+1)}(\zeta_n)\big[F_{0n}(y) - F_{0n}(z) - F_0(y) + F_0(z)\big] f_1(y) f_1(z)\,dy\,dz,
\]
it follows that $\mathrm{Var}(A_l) = O\big(\frac{1}{n g^{2(l+2)}}\big)$. However, assuming that $g \to 0$, it is impossible to conclude from here that $\mathrm{Var}(A_l) = o\big(\frac{1}{n}\big)$.

Acknowledgements

Research supported by Grants BES-23-7 (EU ESF support included) for the first author, XUGA PGIDIT3PXIC55-PN for the second author and BFM22-265 and MTM25-429 (EU ERDF support included) for both authors. The authors would like to thank two anonymous referees and an associate editor whose comments have helped to improve the paper substantially. Thanks are also due to Sonia Pértega Díaz and Francisco Gómez Veiga from the Hospital Juan Canalejo in A Coruña for providing the prostate cancer data set.

References

Ahmad, I.A. (2002). On moment inequalities of the supremum of empirical processes with applications to kernel estimation, Statistics & Probability Letters, 57, 215-220.

Cao, R., Janssen, P. and Veraverbeke, N. (2000). Relative density estimation with censored data, The Canadian Journal of Statistics, 28, 97-111.

Cao, R., Janssen, P. and Veraverbeke, N. (2001). Relative density estimation and local bandwidth selection for censored data, Computational Statistics & Data Analysis, 36, 497-510.

Ćwik, J. and Mielniczuk, J. (1993). Data-dependent bandwidth choice for a grade density kernel estimate, Statistics & Probability Letters, 16, 397-405.

Gastwirth, J.L. (1968). The first-median test: a two-sided version of the control median test, Journal of the American Statistical Association, 63, 692-706.

Hall, P. and Marron, J.S. (1987). Estimation of integrated squared density derivatives, Statistics & Probability Letters, 6, 109-115.

Handcock, M. and Janssen, P. (2002). Statistical inference for the relative density, Sociological Methods & Research, 30, 394-424.

Hjort, N.L. and Walker, S.G. (2001). A note on kernel density estimators with optimal bandwidths, Statistics & Probability Letters, 54, 153-159.

Holmgren, E.B. (1996). The P-P plot as a method for comparing treatment effects, Journal of the American Statistical Association, 90, 360-365.

Hsieh, F. (1995). The empirical process approach for semiparametric two-sample models with heterogenous treatment effect, Journal of the Royal Statistical Society Series B, 57, 735-748.

Hsieh, F. and Turnbull, B.W. (1996). Nonparametric and semiparametric estimation of the receiver operating characteristic curve, The Annals of Statistics, 24, 25-40.

Kakizawa, Y. (2004). Bernstein polynomial probability density estimation, Journal of Nonparametric Statistics, 16, 709-729.

Li, G., Tiwari, R.C. and Wells, M.T. (1996). Quantile comparison functions in two-sample problems with applications to comparisons of diagnostic markers, Journal of

the American Statistical Association, 91, 689-698.

Polansky, A.M. and Baker, E.R. (2000). Multistage plug-in bandwidth selection for kernel distribution function estimates, Journal of Statistical Computation & Simulation, 65, 63-80.

Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation, Journal of the Royal Statistical Society Series B, 53, 683-690.

Silverman, B.W. (1978). Density ratios, empirical likelihood and cot death, Applied Statistics, 27, 26-33.

Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing, Chapman and Hall, London.

Table 1 Values of EM for h_SJ1, h_SJ2 and b_3c for models (a)-(c)

EM            Model (a)                  Model (b)                  Model (c)
(n, m)        h_SJ1   h_SJ2   b_3c       h_SJ1   h_SJ2   b_3c       h_SJ1   h_SJ2   b_3c
(50, 50)      8437    5523    282        278     772     544        7663    5742    778
(100, 100)    532     377     6654       6636    4542    7862       4849    359     477
(200, 200)    2789    2       33         486     2977    4534       2877    2246    283
(100, 50)     5487    384     762        697     4833    8796       498     3864    4982
(200, 100)    326     2443    3949       4227    3275    488        3298    26      3252
(400, 200)    739     346     958        253     924     273        83      49      8
(50, 100)     8237    5329    89         26      7356    42         736     5288    735
(100, 200)    528     3627    634        6462    4288    7459       4568    324     4449
(200, 400)    2738    923     392        3926    28      4299       2782    299     27

Fig. 1 Plots of the relative densities (a)-(c)

Fig. 2 Relative density estimate of the PC+ group wrt the PC- group for the variables tpsa (solid line, h_SJ2 = 84), cpsa (dotted line, h_SJ2 = 8) and fpsa (dashed line, h_SJ2 = 74)