JOURNAL OF LATEX CLASS FILES

Robust Logistic Regression with Bounded Data Uncertainties

Patrick L. Harrington Jr., Aimee Zaas, Christopher W. Woods, Geoffrey S. Ginsburg, Lawrence Carin, Fellow, IEEE, and Alfred O. Hero III, Fellow, IEEE

Abstract—Building on previous work in robust optimization, we present a formulation of robust logistic regression under bounded data uncertainties. The robust estimates are obtained using block coordinate gradient descent with iterative group thresholding, which zeros out highly uncertain variables. For high dimensional problems with uncertain measurements, we discuss the addition of regularization penalties such that both robustness and block sparsity are imposed on the parameter estimates. An empirical approach to estimating the uncertainty magnitude is presented through the use of quantiles. We compare the results of l1-Logistic Regression against l1-Robust Logistic Regression on a real gene expression data set and achieve reductions in the worst-case false alarm rate and probability of error of 10%-20%, thus illustrating the value added by robust classifiers in risk-sensitive domains when confronted with uncertain measurements.

Index Terms—Robust Optimization, Group Structured Regularization, Logistic Regression, Gene Expression Microarrays.

I. INTRODUCTION

THERE are two common methods for accommodating uncertainty in the observed data in risk minimization problems. The first approach assumes stochastic measurement corruption, centered about the true signal. This method is commonly known as error in variables (EIV) and has a rich history in least-squares and logistic regression (LR) problems [1], [2], [3], [4]. Unfortunately, EIV estimators are optimistic, require solving non-convex optimization problems, and de-regularize Hessian-like matrices, making numerical estimation less stable.
The second approach to accommodating measurement uncertainty involves developing estimators that are robust to worst-case perturbations in the data, and results in solving well-posed convex programs [5], [6], [7], [8], [9], [10], [11]. The work presented throughout this paper builds on the results of [7] and [5]. Specifically, we generalize the robust optimization problem to a variety of different uncertainty sets appropriate for real problems. We present novel group-thresholding conditions which produce block-sparse parameters when confronted with grouped uncertainty. The robust risk functions, resulting from the minimax estimation, are regularized with group structured penalties to accommodate high dimensional data when the underlying signal is block-sparse and the measurements are uncertain. A block coordinate gradient descent algorithm with an active shooting speed-up is presented which exploits the iterative grouped thresholding conditions. An interesting relationship between ridge LR and robust LR (RLR) is presented. The robustness of ridge LR is established by identifying conditions under which the uncertainty magnitude of the robust formulation can be re-parameterized in terms of the ridge tuning parameter such that both methods yield the same solution.

P. Harrington is with the Bioinformatics Graduate Program and the Department of Statistics, University of Michigan, Ann Arbor, MI 48109 USA (see http://www-personal.umich.edu/~plhjr/). Aimee Zaas, Christopher Woods, and Geoffrey S. Ginsburg are with the Department of Medicine and the Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA. Lawrence Carin is with the Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA. A. Hero is with the Department of Electrical Engineering and Computer Science, the Department of Biomedical Engineering, and the Department of Statistics, University of Michigan, Ann Arbor, MI 48109 USA (see http://www.eecs.umich.edu/~hero/).
Conditions on the Hessians of each method are established such that the RLR approach converges to this solution faster than ridge LR. We also present an empirical approach to estimating the uncertainty bounds using quantiles. We conclude by presenting an illustrative example using gene expression data and discuss how robust l1-regularization paths recover robust genes that were previously overlooked by standard l1-regularization. The worst-case probabilities of error and false alarm rates of RLR are always less than or equal to those of LR. To the authors' knowledge, this is the first application of a robust classifier to gene expression data. The results suggest that robustification of logistic classifiers can lead to significant performance gains in gene expression analysis.

The specific contributions of this paper are as follows. 1) We extend the penalized robust logistic regression formulation of [7] to accommodate group structured variable selection (l1 penalties). 2) We give necessary and sufficient conditions for the solution to this modified robust logistic regression problem and propose an iterative Newton-type optimization algorithm with a group thresholding operator. 3) We obtain a relation between this solution and the solution to ridge logistic regression that uses an l2-squared penalty. 4) We obtain expressions for the Hessians of the objective functions that are minimized by robust and ridge logistic regression, respectively, which can be used to obtain and compare asymptotic convergence rates of iterative robust and ridge logistic regression. 5) We show that the bounded error approaches advocated here (and in [5], [7]) are natural for bioinformatics applications, or indeed any applications where there are technical replicates that can be used to prescribe estimate error bounds. 6) We illustrate
concrete benefits of our approach for prediction of symptomatic infection from gene microarray data collected during a human viral challenge study. In particular, our predictor suffers significantly less degradation in probability of error as compared to the standard non-robustified logistic predictor.

The outline of the paper is as follows. In Sec. II we review the robust logistic regression problem and in Sec. III we introduce a group penalized version of this problem. In Sec. IV we describe our active shooting block coordinate descent approach to solving the associated group penalized optimization. In Sec. V we propose a quantile-based method to extract relevant uncertainty bounds when empirical technical replicates of the training data are available. In Sec. VII we evaluate the performance of the proposed method using simulations and a real gene microarray data set collected by our group. Finally, in Sec. VIII we state our principal conclusions.

II. ROBUST LOGISTIC REGRESSION

The goal of robust logistic regression (RLR) is to extract an estimator by minimizing the worst-case effect of measurement errors on the LR loss function (binomial deviance), subject to bounded uncertainty. The general case of RLR with spherical uncertainty, previously explored by [7], involves n measured training pairs {x_i, y_i}_{i=1}^n, x_i ∈ R^p, y_i ∈ {−1, +1}, and adopts the minimax formulation

  min_{β,β_0} max_{{δ_i}} Σ_{i=1}^n log(1 + exp(−y_i(β^T(x_i + δ_i) + β_0)))  subject to ||δ_i||_2 ≤ ρ, i = 1, …, n.  (1)

Here, the true signals are given by {z_i = x_i + δ_i}_{i=1}^n, δ_i is not observed, and the parameter ρ is the magnitude of the worst-case perturbation. The minimax problem in (1) is solved by first solving the inner maximization step analytically, resulting in a convex RLR loss function which is then minimized with respect to β, β_0. Note that the maximization over each of the n perturbations δ_i can be moved inside the sum in (1):

  min_{β,β_0} Σ_{i=1}^n max_{δ_i} log(1 + exp(−y_i(β^T(x_i + δ_i) + β_0)))  subject to ||δ_i||_2 ≤ ρ, i = 1, …, n.
(2)

We would like to reduce the minimax problem in (2) to a closed-form minimization problem over β, as performed in [5]. We begin by noting the following:

  −y_i β^T δ_i ≤ ||β||_2 ||δ_i||_2 ≤ ρ ||β||_2.  (3)

Given that the loss function is monotonic in −y_i β^T δ_i, we have the following upper bound on the binomial deviance for the i-th sample:

  log(1 + exp(−y_i(β^T(x_i + δ_i) + β_0))) ≤ log(1 + exp(−y_i(β^T x_i + β_0) + ρ||β||_2)).  (4)

The upper bound in both (3) and (4) is achieved for δ_i collinear with β, i.e., δ_i = γ_i β for some alignment parameter γ_i, thus yielding the solution to the constrained maximization step for the i-th observation:

  max_{δ_i : ||δ_i||_2 ≤ ρ} log(1 + exp(−y_i(β^T(x_i + δ_i) + β_0))) = log(1 + exp(−y_i(β^T x_i + β_0) + ρ||β||_2)).  (5)

We may now proceed with obtaining the robust estimated normal vector β corresponding to the binomial deviance via the following unconstrained minimization problem:

  min_{β,β_0} Σ_{i=1}^n log(1 + exp(−y_i(β^T x_i + β_0) + ρ||β||_2)).  (6)

Geometrically, the robust binomial deviance penalizes points based upon their distance to the uncertainty margins ±ρ||β||_2; i.e., a point at a given distance from the hyperplane is penalized more under the robust formulation than under standard LR. This pessimism is intuitive, as observations that are close to the decision boundary could have their true value lying on the misclassified side under worst-case perturbations (see Figure 1).

Fig. 1. Bounded uncertainty modification penalizes based on potentially misclassified points, translating the logistic regression loss to penalize based on the margins β^T x + β_0 = ±ρ||β||_2.

III. ROBUST LOGISTIC REGRESSION WITH GROUP STRUCTURED UNCERTAINTY SETS

In practice, joint spherical uncertainty may be inappropriate for modeling real data. A more appropriate form of uncertainty occurs when it affects groups of variables. Here, we assume that perturbations have group structure and are applied to G disjoint subsets of the p variables.
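To make the closed-form objective concrete before moving to group-structured uncertainty, the robust binomial deviance in (6) can be evaluated directly. The following is an illustrative sketch only (NumPy assumed; the function and variable names are ours, not the paper's):

```python
import numpy as np

def robust_logistic_loss(beta, beta0, X, y, rho):
    """Worst-case (robust) binomial deviance of eq. (6):
    sum_i log(1 + exp(-y_i (beta^T x_i + beta0) + rho * ||beta||_2)).
    rho = 0 recovers the standard logistic regression loss."""
    margins = y * (X @ beta + beta0)                 # y_i * f_i, shape (n,)
    worst = -margins + rho * np.linalg.norm(beta)    # worst-case exponent
    # log(1 + e^u) evaluated stably as logaddexp(0, u)
    return np.logaddexp(0.0, worst).sum()

# Tiny example: one correctly classified point on each side of the hyperplane.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
beta, beta0 = np.array([2.0, 0.0]), 0.0

nominal = robust_logistic_loss(beta, beta0, X, y, rho=0.0)  # standard LR loss
robust = robust_logistic_loss(beta, beta0, X, y, rho=0.5)   # rho > 0: pessimistic
```

As the geometric discussion above suggests, the robust loss is never smaller than the nominal loss, since each point is effectively moved distance ρ||β||_2 toward the decision boundary.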
These assumptions produce the following robust optimization problem

  min_{β,β_0} Σ_{i=1}^n max_{δ_i} log(1 + exp(−y_i(β^T(x_i + δ_i) + β_0)))  subject to {||δ_{i,g}||_2 ≤ ρ_g}_{g=1}^G, i = 1, …, n,  (7)

where δ_i = {δ_{i,g}}_{g=1}^G, with δ_{i,g} ∈ R^{|I_g|} and I_g ⊆ I, I = {1, …, p}, the set of indices corresponding to the variables
in group g. We will assume that I_g ∩ I_{g'} = ∅ for g ≠ g'. Note that the loss function is monotonic in −Σ_{g=1}^G y_i β_g^T δ_{i,g}, and therefore, under worst-case perturbations, we have

  −Σ_{g=1}^G y_i β_g^T δ_{i,g} ≤ Σ_{g=1}^G ||β_g||_2 ||δ_{i,g}||_2 ≤ Σ_{g=1}^G ρ_g ||β_g||_2.  (8)

The inner maximization step in (7) is achieved when the upper bound in (8) is tight, which occurs when each β_g is collinear with δ_{i,g}. Therefore, as in (5), we have analytically computed the inner maximization step, and thus our problem reduces to solving the following:

  min_{β,β_0} Σ_{i=1}^n log(1 + exp(−y_i(β^T x_i + β_0) + Σ_{g=1}^G ρ_g ||β_g||_2)).  (9)

The term within the argument of the RLR loss function in (9) is the group lasso penalty, which tends to promote sparseness in the groups (or factors) when used to regularize convex risk functions [12], [13]. When the number of groups is G = p (each variable in its own group), the perturbations are interval based, and the group lasso penalty is equivalent to the l1-penalty. In this case, when ρ_g = ρ for all g = 1, …, p, the optimization problem (9) becomes

  min_{β,β_0} Σ_{i=1}^n log(1 + exp(−y_i(β^T x_i + β_0) + ρ||β||_1)).  (10)

The minimization problem in (10) was previously treated in the context of interval perturbations in [7].

A. Regularized Robust Logistic Regression

Penalties such as the l1-norm are used in high-dimensional data settings, as they tend to zero out many of the elements in β, which may better represent the structure of the underlying signal. Many fields of research increasingly involve high-dimensional data measurements that are obtained under noisy measurement conditions, such as gene expression microarrays that measure the activity of thousands of genes by assaying the abundances of mRNA in a sample. Here we develop new logistic classifiers that have the combined advantages of sparsity in variables and robustness to measurement uncertainty. We will assume that the group structure of the regularization penalties coincides with the structure of the uncertainty sets.
For an arbitrary set of G disjoint groups, the regularized robust solution is

  min_{β,β_0} Σ_{i=1}^n log(1 + exp(−y_i f_i + Σ_{g=1}^G ρ_g ||β_g||_2)) + Σ_{g=1}^G λ_g ||β_g||_2  (11)

with f_i = β^T x_i + β_0. The presence of the additional group-lasso penalty should promote block-sparsity in β while being robust to measurement error affecting the variables within each group.

IV. COMPUTATIONS FOR REGULARIZED ROBUST LOGISTIC REGRESSION

Here, we present a numerical solution to the general regularized RLR problem based on block coordinate gradient descent with an active shooting step to speed up convergence when confronted with sparse signals or many groups. Since the penalized loss function in (11), denoted L_{ρ,λ}, is convex, we can iteratively obtain β via block coordinate gradient descent. However, the gradient of (11) does not exist at β_g = 0, and thus we must resort to subgradient methods to identify the optimality conditions [14], [13], [15]. The necessary and sufficient conditions for β to be a valid solution of (11) require

  −X_g^T A_ρ y + (ρ_g tr(A_ρ) + λ_g) β_g/||β_g||_2 = 0,  for β_g ≠ 0,
  ||X_g^T A_{−g,ρ} y||_2 ≤ ρ_g tr(A_{−g,ρ}) + λ_g,  for β_g = 0,

with A_ρ = (I + K_ρ)^{−1} and K_ρ = diag(exp(y_i(β^T x_i + β_0) − Σ_{g=1}^G ρ_g ||β_g||_2)). Here A_{−g,ρ} denotes A_ρ evaluated with β_g set to 0 in K_ρ. The group thresholding that arises from these optimality conditions appears in group lasso regularized problems [12], [13], [15] and, to the authors' knowledge, has not been previously extracted in the context of RLR [7]. While RLR uses a different loss function than the binomial deviance, the thresholding conditions above establish the relationship between regularizing a standard convex risk function with a non-differentiable penalty and the sparse solutions that tend to appear when applying robust linear classifiers to uncertain data [7]. It is intuitive that the thresholding conditions depend on both the uncertainty magnitude ρ_g and the sparseness penalty parameter λ_g.
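The zeroing test implied by these optimality conditions can be sketched numerically. This is a simplified illustration of checking whether a group g satisfies the threshold condition with β_g held at zero; all names, the function signature, and the data below are ours, not the authors' implementation:

```python
import numpy as np

def group_is_thresholded(X, y, beta, beta0, groups, g, rho, lam):
    """Subgradient zeroing test for group g: with beta_g set to 0,
        || X_g^T A y ||_2 <= rho_g * tr(A) + lambda_g   =>  beta_g = 0,
    where A = (I + K)^(-1) is diagonal with
        K_ii = exp(y_i (beta^T x_i + beta0) - sum_g rho_g ||beta_g||_2)."""
    beta = beta.copy()
    beta[groups[g]] = 0.0                          # evaluate with beta_g = 0
    pen = sum(rho[k] * np.linalg.norm(beta[idx]) for k, idx in enumerate(groups))
    a = 1.0 / (1.0 + np.exp(y * (X @ beta + beta0) - pen))   # diagonal of A
    grad_g = X[:, groups[g]].T @ (a * y)           # X_g^T A y for diagonal A
    return np.linalg.norm(grad_g) <= rho[g] * a.sum() + lam[g]

# Illustrative check on random data with two groups of two variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)
groups = [np.array([0, 1]), np.array([2, 3])]

# An enormous lambda forces the group to zero; with rho = lambda = 0 the
# condition almost surely fails and the group is kept.
zeroed = group_is_thresholded(X, y, np.zeros(4), 0.0, groups, 0,
                              rho=[0.1, 0.1], lam=[1e6, 1e6])
kept = not group_is_thresholded(X, y, np.zeros(4), 0.0, groups, 0,
                                rho=[0.0, 0.0], lam=[0.0, 0.0])
```

The test makes the interplay visible: either a larger uncertainty magnitude ρ_g or a larger penalty λ_g raises the threshold and makes the group more likely to be zeroed out.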
The proposed block coordinate gradient descent consists of updating the g-th group of parameters by initially computing a Newton step

  δβ_g^{(m)} = −[∇²_{β_g} L^{(m)}_{ρ,λ}]^{−1} ∇_{β_g} L^{(m)}_{ρ,λ},  (12)

followed by performing a backtracking line search [16] for an appropriate step size ν_g^{(m)} > 0, and then updating β_g^{(m+1)}:

  β_g^{(m+1)} ← β_g^{(m)} + ν_g^{(m)} δβ_g^{(m)}.  (13)

The numerical solution to (11) is outlined below in Algorithm 1. The active shooting [17] step updates the parameters that were non-zero after the initial step until convergence. After this subset has converged, gradient descent is performed over all the variables. This preferential update tends to reach the global minimum faster when confronted with many groups.

Algorithm 1: Active Shooting Block Coordinate Descent with Group Thresholding
1) Initialize:
   a) β_0^{(1)} ← ν_0^{(0)} δβ_0^{(0)} with all parameters set to zero;
   b) β_g^{(1)} ← ν_g^{(0)} δβ_g^{(0)}, for g = 1, …, G, with β_0 evaluated at β_0^{(0)} and all other parameters set to zero.
2) Define the active set Λ = {g : β_g^{(0)} ≠ 0}.
3) β_0^{(m+1)} ← β_0^{(m)} + ν_0^{(m)} δβ_0^{(m)}, with δβ_0^{(m)} via (12), ν_0^{(m)} by backtracking, and β held at its previous value.
4) For g ∈ Λ:
   a) if ||X_g^T A_{−g,ρ} y||_2 ≤ ρ_g tr(A_{−g,ρ}) + λ_g (with A_{−g,ρ} denoting A_ρ computed with β_g = 0), set β_g^{(m+1)} ← 0;
   b) else, evaluate δβ_g^{(m)} from (12) while holding all other parameters at their previous values, compute the step size ν_g^{(m)} via backtracking, and update β_g^{(m+1)} ← β_g^{(m)} + ν_g^{(m)} δβ_g^{(m)}.
5) Repeat steps 3 and 4 until some convergence criterion is met for the active parameters in Λ.
6) Once the convergence criterion is satisfied, set Λ = {1, …, G} and repeat steps 3 and 4 until convergence in all parameters.

V. EMPIRICAL ESTIMATION OF UNCERTAINTY

The magnitude of the potential uncertainty is determined by ρ. There are situations in which a researcher has prior knowledge of the value of ρ, but more often this parameter must be estimated empirically. In modern biomedical experiments, in which gene expression microarrays are used to assay the activity of tens of thousands of genes, technical replicates are frequently obtained to assess the effect of measurement uncertainty. We estimate ρ using a generalization of the method in [18] based on quantiles. For grouped uncertainty sets, we estimate ρ_g via

  ρ̂_g(α) = inf{τ : P(sqrt(x_g^T x_g) ≤ τ) = α},  (14)

where the distribution in (14) is taken with respect to a data set independent of the training data, such as technical replicates of a biological experiment. Note that (14) reduces to interval-based quantile estimates when the number of groups G = p. As the cumulative distribution function (CDF) in (14) does not depend on the class label y, we obtain (14) by

  P(z_g ≤ τ) = Σ_{y ∈ {−1,+1}} P(z_g ≤ τ | Y = y) P(Y = y)  (15)

with z_g = sqrt(x_g^T x_g) and the data centered about their respective class centroids. The class priors are estimated empirically by P̂(Y = y) = m_y/m, where m_y and m are the number of replicate samples with label y and the total number of replicate samples, respectively.
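In the interval case G = p, the quantile estimate (14)-(15) reduces to per-variable quantiles of the centered replicate deviations. A minimal sketch, assuming NumPy; the function name and the pooling shortcut (which realizes the empirical priors m_y/m by weighting every replicate equally) are ours:

```python
import numpy as np

def rho_hat(replicates, labels, alpha):
    """Interval-uncertainty estimate in the spirit of (14)-(15), for G = p:
    center each replicate about its class centroid, then take the
    alpha-quantile of the per-variable absolute deviations."""
    replicates = np.asarray(replicates, dtype=float)
    labels = np.asarray(labels)
    devs = []
    for cls in (-1, +1):
        block = replicates[labels == cls]
        # center about the class centroid: z = |x - xbar_y|
        devs.append(np.abs(block - block.mean(axis=0)))
    # Pooling all centered replicates weights each class-conditional CDF
    # by its empirical prior m_y / m, as in (15).
    pooled = np.vstack(devs)
    return np.quantile(pooled, alpha, axis=0)   # one rho_hat per variable

# Illustrative usage on synthetic "replicate" data with two classes.
rng = np.random.default_rng(1)
Xr = rng.standard_normal((100, 3))
labs = np.where(np.arange(100) < 50, -1, 1)
r50 = rho_hat(Xr, labs, 0.50)
r95 = rho_hat(Xr, labs, 0.95)
```

Raising α widens every interval bound, which matches the role of α as a knob for how pessimistic the uncertainty sets should be.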
The estimate in (14) can be computed with respect to the empirical CDF of the data {x_{i,g}}_{i=1}^n or approximated by the inverse CDF of the χ_p distribution [18] with p = |I_g| degrees of freedom. The application of the proposed estimation of uncertainty bounds is presented below in the context of high-dimensional gene expression data in which technical replicates are available.

VI. ROBUST VS. RIDGE REGRESSION

A. Robustness of Ridge Logistic Regression

One important question is how the proposed RLR formulation relates to ridge LR. Ridge LR involves adding a squared l2-penalty to the binomial deviance loss function:

  min_β Σ_{i=1}^n log(1 + exp(−y_i β^T x_i)) + (λ/2)||β||_2².  (16)

Our goal is to identify values of ρ as a function of λ for which both robust (1) and ridge (16) produce approximately identical estimates of β. We begin by inspecting the optimality conditions for β. The gradients for ridge and robust, respectively, are given by (intercept removed for clarity):

  ∇_β L_λ = −X^T W y + λβ,  (17)
  ∇_β L_ρ = −X^T W_ρ y + ρ tr(W_ρ) β/||β||_2,  (18)

where W = diag(exp(−y_i β^T x_i)/(1 + exp(−y_i β^T x_i))) and W_ρ = diag(exp(−y_i β^T x_i + ρ||β||_2)/(1 + exp(−y_i β^T x_i + ρ||β||_2))). If we linearize the sigmoidal terms in W and W_ρ, the scaled gradients can be approximated by the following:

  n^{−1} ∇_β L_λ ≈ ((1/(4n)) X^T X + (λ/n) I) β − (1/2) δ̄x_n = H_{n,λ} β − (1/2) δ̄x_n, with δ̄x_n = n^{−1}(n_+ x̄_+ − n_− x̄_−),  (19)

  n^{−1} ∇_β L_ρ ≈ (1/(4n)) X^T X β − (1/2) δ̄x_n + (1/4) ρ² β + ρ [(1 − β^T δ̄x_n/2) β/(2||β||_2) − (||β||_2/4) δ̄x_n] = c_β + a_β ρ² + b_β ρ,  (20)

with c_β = (1/(4n)) X^T X β − (1/2) δ̄x_n, a_β = (1/4) β, and b_β = (1 − β^T δ̄x_n/2) β/(2||β||_2) − (||β||_2/4) δ̄x_n. Since the gradient for ridge regression can be approximated by the linear system of equations in (19), we can approximate β_λ via

  β̂_λ ≈ (1/2) H_{n,λ}^{−1} δ̄x_n.  (21)

Inserting (21) into (20), summing (20) over its elements via an inner product with a vector of ones, and solving for ρ produces the following quadratic formula for ρ in terms of λ:

  ρ_λ ≈ [−1^T b_{β̂_λ} ± sqrt((1^T b_{β̂_λ})² − 4 (1^T a_{β̂_λ})(1^T c_{β̂_λ}))] / (2 · 1^T a_{β̂_λ}).  (22)

The positive root ρ_λ is the uncertainty magnitude. This establishes an approximate equivalence between the solutions of ridge and robust logistic regression for a given λ.

B.
Convergence Rates of Ridge and Robust

We established conditions of equivalence between the ridge and robust LR estimators through a re-parameterization of ρ in terms of λ in (22). It is also of interest to investigate the rate at which these methods converge under the proposed conditions of equivalence. For ease of exposition, we assume that the data have been centered and the sample sizes are balanced (n_+ = n_−). We begin by assuming that the gradients of both ridge LR (19) and RLR (20) are in some neighborhood about 0. The structure
of the Hessians specifies the rates of convergence through their eigenvalues. Under the assumptions detailed above, the scaled Hessian for the ridge case, obtained by differentiating (19), is given by

  n^{−1} ∇²_β L_λ ≈ (1/(4n)) X^T X + (λ/n) I.  (23)

By differentiating (20) and noting that (1/8) β^T δ̄x_n ≤ 1/2, we obtain the scaled Hessian for robust logistic regression

  n^{−1} ∇²_β L_ρ ≈ (1/(4n)) X^T X + (1/4) ρ² I + ρ (A − B)  (24)

with positive semi-definite matrix A,

  A = (1/(2||β||_2)) (I − ββ^T/||β||_2²),  (25)

and positive semi-definite matrix B,

  B = (1/(8||β||_2)) δ̄x_n β^T.  (26)

We are interested in extracting conditions on ρ such that the Hessian corresponding to the robust binomial deviance is larger than that corresponding to ridge, i.e., ΔH = ∇²_β L_ρ − ∇²_β L_λ ⪰ 0 when both methods yield the same β, or equivalently

  w^T ΔH w ≥ 0 for all w.  (27)

One can directly derive necessary conditions that must be satisfied by ΔH by enforcing that (27) holds for some choice of w. We take w = δ̄x_n, which is the right eigenvector of B, and substitute β̂_λ (21) and ρ_λ (22) for β and ρ, respectively, in (27). The conditions such that robust logistic regression converges faster to the solution of ridge logistic regression are

  ρ_λ² + (2(1 − ε_λ)/||β̂_λ||_2) ρ_λ ≥ 4λ/n,  (28)

where ε_λ is given by

  ε_λ^{−1} = (β̂_λ^T δ̄x_n/2) (||δ̄x_n||_2²/(4||β̂_λ||_2²) + 1).  (29)

VII. NUMERICAL RESULTS

A. Recovery of the Regularization Path Under Signal Corruption

Many researchers in machine learning and bioinformatics assess the importance of various biomarkers based upon the ordering of l1-regularization paths. Thus, it is worthwhile to explore how the ordering of the variables within the regularization path changes as uncertainty is added. We first show a simple synthetic example: there are n = 100 samples in each class, with x ∈ R^p, p = 22, and x ~ N(μ_y, Σ). The class centroids are given by μ_{+1} = [1/2, 1/3, 1/2, 1/2, 1/2, 1/2, 0, …, 0]^T and μ_{−1} = [1/2, 1/3, 0, …, 0]^T.
The structure of Σ is block diagonal: the diagonal components {σ_ii}_{i=1}^p are equal to one, and the block structure affects only the variables x_1, …, x_6, where the off-diagonal elements within this block, {σ_ij}_{j=1, j≠i}^6, are equal to 0.1. The other off-diagonal elements are zero. Figure 2(a) shows the l1-regularization path as a function of log10 λ on the original data. In Figure 2(b), x_3 has been perturbed such that x_3 → x_3 + δ_3 with −ρ ≤ δ_3 ≤ ρ, which leads to a shift in the ordering within the regularization path under standard l1-LR. Using the perturbed data set, knowledge that ρ = 0.1 for x_3, and interval uncertainty, l1-regularized robust logistic regression recovers the original ordering of the variables (see Figure 2(c)).

B. Human Rhinovirus Gene Expression Data

Here we present numerical results on a peripheral blood gene expression data set from a group of n = 20 patients inoculated with human rhinovirus (HRV), the typical agent of the common cold [19]. Half of the patients responded with symptoms (y = +1) and the other half did not (y = −1). The original 12,023 genes on the Affymetrix oligonucleotide microarray were reduced to p = 129 differentially expressed genes, controlling for a 20% false discovery rate (FDR) using the same methodology as applied in [20]. We regularize both the robust binomial deviance and the standard binomial deviance with an l1-penalty to control the sparsity of the resulting model, forcing many elements in β to zero. In this experiment there were approximately 20 microarray chips that were technical replicates. The technical replicates were used to estimate the interval uncertainty bounds using (14). Since oligonucleotide microarray devices detect mRNA abundance by including thousands of different probe sets (short sequences that bind to a particular mRNA molecule produced from a specific gene), we adopt interval uncertainty across the p genes.
The estimated interval uncertainty bounds {ρ̂_i(α)}_{i=1}^p were obtained from (14) using the empirical CDF of the independent technical replicate data, with linear interpolation between sampled data points. The CDF in (14) was computed by (15), with each conditional CDF centered about its class-dependent sample mean and equal class priors P̂(Y = +1) = P̂(Y = −1) = 1/2. Technical replicates are generated from the same blood sample used to assess gene expression activity in the training data and are thus commonly used to isolate the effect of measurement error. We explored the effect of interval uncertainty bounds resulting from quantiles of (14) at levels α = {0.25, 0.50, 0.75, 0.95, 0.99}. Figure 3 shows the l1-regularization paths obtained by solving (11) on the training data for different interval uncertainties (including no uncertainty, corresponding to standard l1-LR) and illustrates how robustness affects the ordering of the genes. We see from Figure 3(a) that the first gene to appear in the regularization path is the anti-viral defense gene RSAD2. RSAD2 persists as the first gene in the regularization path for α = {0, 0.25, 0.50, 0.75} (the latter three not shown) but disappears from the regularization path completely, along with a few other genes, when α = {0.95, 0.99}.
Fig. 2. Regularization paths as a function of log10 λ: robust recovers the original ordering after perturbation. (a) l1-regularization path on the original data; (b) l1-regularization path with bounded perturbations in x_3 and ρ_3 = 0.1; (c) l1-robust regularization path with ρ_3 = 0.1.

The three robust genes that persist across all explored values of α are ADI1, OAS1, and TUBB2A. Of these three, OAS1, which codes for proteins involved in the innate immune response to viral infection, barely appears in Figure 3(a), yet is the first gene to appear in the robust regularization paths common to both α = {0.95, 0.99} (see Figures 3(b) and 3(c)). These results suggest that when assessing variable importance via regularization paths, one should be aware of the effect of measurement uncertainty on the ordering of the variables.

The robust formulation within this paper aims at minimizing the worst-case configuration of perturbations on the logistic regression loss function. As our goal is discriminating between two phenotypes, it is of interest to compare the worst-case probability of error for LR and RLR on perturbed data sets, i.e., x_i → x_i + δ_i, |δ_i| ≤ ρ̂_i(α), after training on the original data. The l1-tuning parameter λ for both standard l1-LR and l1-RLR was chosen to minimize the out-of-sample probability of error via 5-fold cross-validation. Given the cross-validated value of λ for both methods, the models were then fit to the entire set of training data. 50,000 perturbed data matrices were then generated subject to the interval uncertainty bounds {ρ̂_i(α)}_{i=1}^p for α = {0.25, 0.50, 0.75, 0.95, 0.99}. The best-case and worst-case probabilities of error are recorded in Table I. For all values of α explored, the worst-case probability of error corresponding to l1-RLR is less than or equal to that of l1-LR. When α = {0.95, 0.99}, the worst-case probability of error for l1-LR is 0.30 but reduces to 0.20 when using l1-RLR with interval uncertainty estimated from the data using (14). Corresponding to the worst-case probabilities of error are the sensitivity and 1-specificity, given in Table II. We see that l1-RLR achieves the same power as l1-LR at false alarm rates less than or equal to those of the non-robust method. The proposed method can thus be used to reduce classifier error sensitivity in large-scale classification problems. This can be important in practical applications such as biomarker discovery and predictive health and disease.

TABLE I
BEST AND WORST CASE PROBABILITY OF ERROR, P_e, FOR THE HRV DATA SET

         l1-LR               l1-Robust LR
  α      min P_e  max P_e    min P_e  max P_e
  0.25   0.10     0.15       0.10     0.15
  0.50   0.05     0.20       0.10     0.20
  0.75   0.10     0.25       0.10     0.25
  0.95   0.00     0.30       0.05     0.20
  0.99   0.00     0.30       0.00     0.20

TABLE II
SENSITIVITY AND 1-SPECIFICITY CORRESPONDING TO WORST CASE PROBABILITY OF ERROR FROM HRV DATA

         l1-LR               l1-Robust LR
  α      Sens.   1-Spec.     Sens.   1-Spec.
  0.25   0.70    0.00        0.70    0.00
  0.50   0.70    0.10        0.70    0.10
  0.75   0.70    0.20        0.70    0.20
  0.95   0.70    0.20        0.70    0.10
  0.99   0.70    0.30        0.70    0.10

Fig. 3. Regularization paths on the HRV data set as a function of log10 λ/λ_max for different magnitudes of interval uncertainty, as determined by the α quantile. (a) α = 0; (b) α = 0.95; (c) α = 0.99.

VIII. CONCLUSION

Building on the results of [7] and [5], we have formulated the robust logistic regression problem with group structured uncertainty sets. By adding regularization penalties, one can enforce block-sparsity assumptions on the underlying signal. We have presented a block coordinate gradient method with iterative grouped thresholding for solving the penalized RLR problems. The group thresholding is affected by both the group-lasso penalty and the magnitude of the worst-case uncertainty. Thus, RLR tends to promote thresholding of highly uncertain variables, essentially performing an initial step of variable selection. We have proposed an empirical approach to estimating the uncertainty magnitude using quantiles. This approach was applied to a real gene expression data set where quantile estimation was applied to a set of technical replicates. The numerical results on this data set establish that the proposed approach can yield lower worst-case probability of error and lower false alarm rates. Such a gain in worst-case detection performance can improve predictive health and disease applications, and in particular the prediction of patient phenotype.

It will be interesting to explore the situation where the group structure of the uncertainty differs from that of the regularization penalty. This situation could potentially be addressed by modifying the sparse group-lasso solution as in [21]. It is also of interest to explore the kernelization of this method, in which one can study the propagation of bounded data uncertainty in a reproducing kernel Hilbert space.

ACKNOWLEDGMENT

The authors sincerely appreciate the thoughtful insight of Ami Wiesel in developing the ideas presented in this paper. The authors also appreciate the valuable suggestions made by Ji Zhu, Kerby Shedden, and Clayton Scott. The views, opinions, and/or findings contained in this article/presentation are those of the author/presenter and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense. (Approved for Public Release, Distribution Unlimited)

REFERENCES

[1] A. Wiesel, Y. C. Eldar, and A. Yeredor, "Linear regression with Gaussian model uncertainty: Algorithms and bounds," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2194-2205, 2008.
[2] A. Wiesel, Y. C. Eldar, and A. Beck, "Maximum likelihood estimation in linear models with a Gaussian model matrix," IEEE Signal Processing Letters, vol. 13, no. 5, p. 292, 2006.
[3] R. J. Carroll, C. H. Spiegelman, K. K. G. Lan, K. T. Bailey, and R. D. Abbott, "On errors-in-variables for binary regression models," Biometrika, vol. 71, no. 1, pp. 19-25, 1984.
[4] S. Van Huffel, I. Markovsky, R. J. Vaccaro, and T. Söderström, "Total least squares and errors-in-variables modeling," Signal Processing, 2007.
[5] S. Chandrasekaran, G. H. Golub, M. Gu, and A. H. Sayed, "Parameter estimation in the presence of bounded data uncertainties," SIAM J. Matrix Anal. Appl., vol. 19, no. 1, pp. 235-252, 1998.
[6] L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM J. Matrix Anal. Appl., vol. 18, no. 4, pp. 1035-1064, 1997.
[7] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, EECS Department, University of California, Berkeley, Oct. 2003.
[8] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, 2002.
[9] S.-J. Kim, A. Magnani, and S. P. Boyd, "Robust Fisher discriminant analysis," in Advances in Neural Information Processing Systems, MIT Press, 2006.
[10] H. Xu, S. Mannor, and C. Caramanis, "Robustness, risk, and regularization in support vector machines," CoRR, vol. abs/0803.3490, 2008.
[11] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization (Princeton Series in Applied Mathematics), Princeton University Press, 2009.
[12] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B, vol. 68, pp. 49-67, 2006.
[13] L. Meier, S. van de Geer, and P. Bühlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society, Series B, vol. 70, no. 1, pp. 53-71, 2008.
[14] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[15] V. Roth and B. Fischer, "The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 848-855.
[16] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[17] J. Peng, P. Wang, N. Zhou, and J. Zhu, "Partial correlation estimation by joint sparse regression models," Journal of the American Statistical Association, vol. 104, no. 486, pp. 735-746, 2009.
[18] A. Wiesel, Y. C. Eldar, and S. Shamai, "Optimization of the MIMO compound capacity," IEEE Transactions on Wireless Communications, vol. 6, no. 3, p. 1094, 2007.
[19] A. K. Zaas, M. Chen, J. Varkey, T. Veldman, A. O. Hero, J. Lucas, Y. Huang, R. Turner, A. Gilbert, R. Lambkin-Williams, et al., "Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans," Cell Host & Microbe, vol. 6, no. 3, pp. 207-217, 2009.
[20] A. Swaroop, A. J. Mears, G. Fleury, and A. O. Hero, "Multicriteria gene screening for analysis of differential expression with DNA microarrays," EURASIP Journal on Advances in Signal Processing, 2004.
[21] J. Friedman, T. Hastie, and R. Tibshirani, "A note on the group lasso and a sparse group lasso," arXiv:1001.0736v1, 2010.