Robust Logistic Regression with Bounded Data Uncertainties

Patrick L. Harrington Jr., Aimee Zaas, Christopher W. Woods, Geoffrey S. Ginsburg, Lawrence Carin, Fellow, IEEE, and Alfred O. Hero III, Fellow, IEEE

Abstract—Building on previous work in robust optimization, we present a formulation of robust logistic regression under bounded data uncertainties. The robust estimates are obtained using block coordinate gradient descent with iterative group thresholding, which zeros out highly uncertain variables. For high dimensional problems with uncertain measurements, we discuss the addition of regularization penalties such that both robustness and block sparsity are imposed in the parameter estimates. An empirical approach to estimating the uncertainty magnitude is presented through the use of quantiles. We compare the results of l1-logistic regression against l1-robust logistic regression on a real gene expression data set and achieve reductions in the worst-case false alarm rate and probability of error of 10%-20%, illustrating the value added by robust classifiers in risk-sensitive domains when confronted with uncertain measurements.

Index Terms—Robust Optimization, Group Structured Regularization, Logistic Regression, Gene Expression Microarrays.

I. INTRODUCTION

THERE are two common methods for accommodating uncertainty in the observed data in risk minimization problems. The first approach assumes stochastic measurement corruption centered about the true signal. This method is commonly known as errors-in-variables (EIV) and has a rich history in least-squares and logistic regression (LR) problems [1], [2], [3], [4]. Unfortunately, EIV estimators are optimistic, require solving non-convex optimization problems, and de-regularize Hessian-like matrices, making numerical estimation less stable. The second approach to accommodating measurement uncertainty involves developing estimators that are robust to worst-case perturbations in the data and results in solving well-posed convex programs [5], [6], [7], [8], [9], [10], [11].

The work presented throughout this paper builds on the results of [7] and [5]. Specifically, we generalize the robust optimization problem to a variety of different uncertainty sets appropriate for real problems. We present novel group-thresholding conditions which produce block-sparse parameters when confronted with grouped uncertainty. The robust risk functions, resulting from the minimax estimation, are regularized with group structured penalties to accommodate high dimensional data when the underlying signal is block-sparse and the measurements are uncertain. A block coordinate gradient descent algorithm with an active shooting speed-up is presented which exploits the iterative grouped thresholding conditions. An interesting relationship between ridge LR and robust LR (RLR) is presented.

(P. Harrington is with the Bioinformatics Graduate Program and the Department of Statistics, University of Michigan, Ann Arbor, MI, USA. Aimee Zaas, Christopher Woods, and Geoffrey S. Ginsburg are with the Department of Medicine and the Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA. Lawrence Carin is with the Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA. A. Hero is with the Department of Electrical Engineering and Computer Science, the Department of Biomedical Engineering, and the Department of Statistics, University of Michigan, Ann Arbor, MI, USA.)
The robustness of ridge LR is established by identifying conditions under which the uncertainty magnitude of robust LR can be re-parameterized in terms of the ridge tuning parameter such that both methods yield the same solution. Conditions on the Hessians of each method are established such that the RLR approach converges to this solution faster than ridge LR. We also present an empirical approach to estimating the uncertainty bounds using quantiles. We conclude by presenting an illustrative example using gene expression data and discuss how robust l1-regularization paths recover robust genes that were previously overlooked by standard l1-regularization. The worst-case probabilities of error and false alarm rates of RLR are always less than or equal to those of LR. To the authors' knowledge, this is the first application of a robust classifier to gene expression data. The results suggest that robustification of logistic classifiers can lead to significant performance gains in gene expression analysis.

The specific contributions of this paper are as follows.
1) We extend the penalized robust logistic regression formulation of [7] to accommodate group structured variable selection (l1 penalties).
2) We give necessary and sufficient conditions for the solution to this modified robust logistic regression problem and propose an iterative Newton-type optimization algorithm with a group thresholding operator.
3) We obtain a relation between this solution and the solution to ridge logistic regression, which uses a squared l2 penalty.
4) We obtain expressions for the Hessians of the objective functions minimized by robust and ridge logistic regression, respectively, which can be used to obtain and compare asymptotic convergence rates of iterative robust and ridge logistic regression.
5) We show that the bounded error approaches advocated here (and in [5], [7]) are natural for bioinformatics applications, or indeed any application where technical replicates are available to prescribe estimated error bounds.

6) We illustrate concrete benefits of our approach for prediction of symptomatic infection from gene microarray data collected during a human viral challenge study. In particular, our predictor suffers significantly less degradation in probability of error than the standard non-robustified logistic predictor.

The outline of the paper is as follows. In Sec. II we review the robust logistic regression problem, and in Sec. III we introduce a group penalized version of this problem. In Sec. IV we describe our active shooting block coordinate descent approach to solving the associated group penalized optimization. In Sec. V we propose a quantile-based method to extract relevant uncertainty bounds when empirical technical replicates of the training data are available. In Sec. VI we relate robust and ridge logistic regression. In Sec. VII we evaluate the performance of the proposed method using simulations and a real gene microarray data set collected by our group. Finally, in Sec. VIII we state our principal conclusions.

II. ROBUST LOGISTIC REGRESSION

The goal of robust logistic regression (RLR) is to extract an estimator by minimizing the worst-case effect of measurement errors on the LR loss function (binomial deviance), subject to bounded uncertainty. The general case of RLR with spherical uncertainty, previously explored by [7], involves n measured training pairs {x_i, y_i}_{i=1}^n, x_i in R^p, y_i in {-1, +1}, and adopts the minimax formulation

min_{beta, beta_0} max_{{delta_i}} sum_{i=1}^n log(1 + e^{-y_i(beta^T(x_i + delta_i) + beta_0)})  subject to ||delta_i||_2 <= rho, i = 1, ..., n.  (1)

Here the true signals are given by {z_i = x_i + delta_i}_{i=1}^n, delta_i is not observed, and the parameter rho is the magnitude of the worst-case perturbation. The minimax problem in (1) is solved by first solving the inner maximization step analytically, resulting in a convex RLR loss function which is then minimized with respect to beta, beta_0. Note that the maximization over each of the n perturbations delta_i can be moved inside the sum in (1):

min_{beta, beta_0} sum_{i=1}^n max_{delta_i} log(1 + e^{-y_i(beta^T(x_i + delta_i) + beta_0)})  subject to ||delta_i||_2 <= rho, i = 1, ..., n.  (2)

We would like to reduce the minimax problem in (2) to a closed-form minimization problem over beta, as performed in [5]. We begin by noting the following:

-y_i beta^T delta_i <= ||beta||_2 ||delta_i||_2 <= ||beta||_2 rho.  (3)

Given that the loss function is monotonic in -y_i beta^T delta_i, we have the following upper bound on the binomial deviance for the i-th sample:

log(1 + e^{-y_i(beta^T(x_i + delta_i) + beta_0)}) <= log(1 + e^{-y_i(beta^T x_i + beta_0) + rho ||beta||_2}).  (4)

The upper bound in both (3) and (4) is achieved for delta_i collinear with beta, i.e., delta_i = gamma_i beta for some alignment parameter gamma_i, thus yielding the solution to the constrained maximization step for the i-th observation:

max_{delta_i : ||delta_i||_2 <= rho} log(1 + e^{-y_i(beta^T(x_i + delta_i) + beta_0)}) = log(1 + e^{-y_i(beta^T x_i + beta_0) + rho ||beta||_2}).  (5)

We may now obtain the robust estimated normal vector beta corresponding to the binomial deviance via the following unconstrained minimization problem:

min_{beta, beta_0} sum_{i=1}^n log(1 + e^{-y_i(beta^T x_i + beta_0) + rho ||beta||_2}).  (6)

Geometrically, the robust binomial deviance penalizes points based upon their distance to the uncertainty margins +/- rho ||beta||_2; i.e., a point at a given distance from the hyperplane is penalized more under the robust formulation than under standard LR. This pessimism is intuitive, as observations close to the decision boundary could have their true values lying on the misclassified side under worst-case perturbations (see Figure 1).

Fig. 1. Bounded uncertainty shifts the logistic regression loss so that points are penalized relative to the margins beta^T x + beta_0 = +/- rho ||beta||_2 rather than only the hyperplane beta^T x + beta_0 = 0, accounting for potentially misclassified points.
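To make the robust loss in (6) concrete, the following is a minimal sketch of its evaluation in Python/NumPy. The function name and the use of logaddexp for numerical stability are our own choices, not part of the original paper.

```python
import numpy as np

def robust_logistic_loss(beta, beta0, X, y, rho):
    """Robust binomial deviance of Eq. (6): worst-case logistic loss under
    spherical perturbations ||delta_i||_2 <= rho of each row of X."""
    margins = y * (X @ beta + beta0)          # y_i (beta^T x_i + beta_0)
    # np.logaddexp(0, t) = log(1 + e^t), evaluated stably for large |t|
    return np.sum(np.logaddexp(0.0, -margins + rho * np.linalg.norm(beta)))

# Toy usage: rho = 0 recovers the standard binomial deviance, and the loss
# is non-decreasing in rho, as the worst-case analysis predicts.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(rng.normal(size=100) > 0, 1.0, -1.0)
beta, beta0 = rng.normal(size=5), 0.0
assert robust_logistic_loss(beta, beta0, X, y, 0.5) >= \
       robust_logistic_loss(beta, beta0, X, y, 0.0)
```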
III. ROBUST LOGISTIC REGRESSION WITH GROUP-STRUCTURED UNCERTAINTY SETS

In practice, joint spherical uncertainty may be inappropriate for modeling real data. A more appropriate form of uncertainty arises when it affects groups of variables. Here, we assume that perturbations have group structure and are applied to G disjoint subsets of the p variables. These assumptions produce the following robust optimization problem:

min_{beta, beta_0} sum_{i=1}^n max_{delta_i} log(1 + e^{-y_i(beta^T(x_i + delta_i) + beta_0)})  subject to {||delta_{i,g}||_2 <= rho_g}_{g=1}^G, i = 1, ..., n,  (7)

where delta_i = {delta_{i,g}}_{g=1}^G with delta_{i,g} in R^{|I_g|}, and I_g, a subset of I = {1, ..., p}, are the index sets corresponding to the variables in group g.

We will assume that I_g and I_{g'} are disjoint for g != g'. Note that the loss function is monotonic in -sum_{g=1}^G y_i beta_g^T delta_{i,g}, and therefore, under worst-case perturbations, we have

-sum_{g=1}^G y_i beta_g^T delta_{i,g} <= sum_{g=1}^G ||beta_g||_2 ||delta_{i,g}||_2 <= sum_{g=1}^G rho_g ||beta_g||_2.  (8)

The inner maximization step in (7) is achieved when the upper bound in (8) is tight, which occurs when each beta_g is collinear with delta_{i,g}. Therefore, as in (5), we have analytically computed the inner maximization step, and our problem reduces to solving the following:

min_{beta, beta_0} sum_{i=1}^n log(1 + e^{-y_i(beta^T x_i + beta_0) + sum_{g=1}^G rho_g ||beta_g||_2}).  (9)

The term within the argument of the RLR loss function in (9) is the group lasso penalty, which tends to promote sparseness over the groups (or factors) when used to regularize convex risk functions [12], [13]. When the number of groups is G = p (each variable in its own group), the perturbations are interval based, and the group lasso penalty is equivalent to the l1 penalty. In this case, with rho_g = rho for all g = 1, ..., p, the optimization problem (9) becomes

min_{beta, beta_0} sum_{i=1}^n log(1 + e^{-y_i(beta^T x_i + beta_0) + rho ||beta||_1}).  (10)

The minimization problem in (10) was previously treated in the context of interval perturbations in [7].

A. Regularized Robust Logistic Regression

Penalties such as the l1 norm are used in high-dimensional data settings, as they tend to zero out many of the elements of beta, which may better represent the structure of the underlying signal. Many fields of research increasingly involve high-dimensional measurements obtained under noisy conditions, such as gene expression microarrays that measure the activity of thousands of genes by assaying the abundances of mRNA in a sample. Here we develop new logistic classifiers that have the combined advantages of sparsity in variables and robustness to measurement uncertainty. We will assume that the group structure of the regularization penalties coincides with the structure of the uncertainty sets. For an arbitrary set of G disjoint groups, the regularized robust solution is

min_{beta, beta_0} sum_{i=1}^n log(1 + e^{-y_i f_i + sum_{g=1}^G rho_g ||beta_g||_2}) + sum_{g=1}^G lambda_g ||beta_g||_2,  (11)

with f_i = beta^T x_i + beta_0. The presence of the additional group-lasso penalty should promote block sparsity in beta while remaining robust to measurement error affecting the variables within each group.

IV. COMPUTATIONS FOR REGULARIZED ROBUST LOGISTIC REGRESSION

Here we present a numerical solution to the general regularized RLR problem based on block coordinate gradient descent with an active shooting step that speeds up convergence when confronted with sparse signals or many groups. Since the penalized loss function in (11), denoted L_{rho,lambda}, is convex, we can iteratively obtain beta via block coordinate gradient descent. However, the gradient of (11) does not exist at beta_g = 0, and thus we must resort to subgradient methods to identify optimality conditions [14], [13], [15]. The necessary and sufficient conditions for beta to be a valid solution of (11) require

-X_g^T A_rho y + (rho_g tr(A_rho) + lambda_g) beta_g / ||beta_g||_2 = 0  for beta_g != 0,
||X_g^T A_{-beta_g, rho} y||_2 <= rho_g tr(A_{-beta_g, rho}) + lambda_g  for beta_g = 0,

with A_rho = (I + K_rho)^{-1} and K_rho = diag(e^{y_i(beta^T x_i + beta_0) - sum_{g=1}^G rho_g ||beta_g||_2}). Here A_{-beta_g, rho} denotes A_rho with beta_g set to 0 in K_rho. The group thresholding that arises from these optimality conditions appears in group-lasso regularized problems [12], [13], [15] and, to the authors' knowledge, has not been previously extracted in the context of RLR [7].
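As an illustration, the zero-group test above can be written in a few lines of NumPy. This is a hedged sketch under our reconstruction of the optimality conditions; the helper names (A_diag, group_is_zero) are hypothetical.

```python
import numpy as np

def A_diag(beta, beta0, X, y, rho, groups):
    """Diagonal of A_rho = (I + K_rho)^(-1), with K_rho as defined above.
    'groups' is a list of index arrays I_g; 'rho' the per-group bounds."""
    pen = sum(r * np.linalg.norm(beta[g]) for g, r in zip(groups, rho))
    k = np.exp(y * (X @ beta + beta0) - pen)   # diagonal of K_rho
    return 1.0 / (1.0 + k)

def group_is_zero(g, beta, beta0, X, y, rho, lam, groups):
    """Thresholding test: group g stays at zero iff
    ||X_g^T A_{-g} y||_2 <= rho_g tr(A_{-g}) + lambda_g."""
    beta_mg = beta.copy()
    beta_mg[groups[g]] = 0.0                   # beta with group g zeroed
    a = A_diag(beta_mg, beta0, X, y, rho, groups)
    lhs = np.linalg.norm(X[:, groups[g]].T @ (a * y))
    return lhs <= rho[g] * a.sum() + lam[g]
```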
While RLR uses a different loss function than the binomial deviance, the thresholding conditions above establish the relationship between regularizing a standard convex risk function with a non-differentiable penalty and the sparse solutions that tend to appear when applying robust linear classifiers to uncertain data [7]. It is intuitive that the thresholding conditions depend on both the uncertainty magnitude rho_g and the sparseness penalty parameter lambda_g.

The proposed block coordinate gradient descent updates the g-th group of parameters by first computing a Newton step

delta_beta_g^(m) = -[H_{beta_g} L_{rho,lambda}^(m)]^{-1} grad_{beta_g} L_{rho,lambda}^(m),  (12)

then performing a backtracking line search [16] for an appropriate step size nu_g^(m) > 0, and finally updating beta_g^(m+1):

beta_g^(m+1) <- beta_g^(m) + nu_g^(m) delta_beta_g^(m).  (13)
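The update (12)-(13) is a standard damped Newton step. The following sketch shows one such block update with Armijo backtracking; the loss, gradient, and Hessian callables are placeholders, and the Armijo constants are conventional defaults rather than values specified in the paper.

```python
import numpy as np

def block_newton_step(beta_g, grad_g, hess_g, loss_at, c=1e-4, shrink=0.5):
    """One damped Newton update for a single group, Eqs. (12)-(13).
    grad_g, hess_g: gradient and Hessian of L w.r.t. beta_g;
    loss_at(b): objective value with group g set to b (others held fixed)."""
    step = -np.linalg.solve(hess_g, grad_g)     # Newton direction, Eq. (12)
    nu, f0, slope = 1.0, loss_at(beta_g), grad_g @ step
    # Armijo backtracking: shrink nu until sufficient decrease holds
    while loss_at(beta_g + nu * step) > f0 + c * nu * slope:
        nu *= shrink
        if nu < 1e-12:                          # safeguard against stalls
            break
    return beta_g + nu * step                   # update, Eq. (13)
```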

The numerical solution to solving (11) is outlined in Algorithm 1 below. The active shooting [17] step updates the parameters that were non-zero after the initial step until convergence. After this subset has converged, gradient descent is performed over all the variables. This preferential update tends to reach the global minimum faster when confronted with many groups.

Algorithm 1: Active Shooting Block Coordinate Descent with Group Thresholding
1) Initialize:
   a) beta_0^(1) <- nu_0^(0) delta_beta_0^(0), with all parameters set to zero;
   b) beta_g^(1) <- nu_g^(0) delta_beta_g^(0), for g = 1, ..., G, with beta_0 evaluated at beta_0^(0) and all other parameters set to zero.
2) Define the active set Lambda = {g : beta_g^(0) != 0}.
3) beta_0^(m+1) <- beta_0^(m) + nu_0^(m) delta_beta_0^(m), with delta_beta_0^(m) via (12), nu_0^(m) by backtracking, and beta_g held at its previous value.
4) For g in Lambda:
   a) if ||X_g^T A_{-beta_g, rho} y||_2 <= rho_g tr(A_{-beta_g, rho}) + lambda_g, set beta_g^(m+1) <- 0;
   b) else, evaluate delta_beta_g^(m) from (12) while holding all other parameters at their previous values, compute the step size nu_g^(m) via backtracking, and update beta_g^(m+1) <- beta_g^(m) + nu_g^(m) delta_beta_g^(m).
5) Repeat steps 3 and 4 until a convergence criterion is met for the active parameters in Lambda.
6) Once the convergence criterion is satisfied, define Lambda = {1, ..., G} and repeat steps 3 and 4 until convergence in all parameters.

V. EMPIRICAL ESTIMATION OF UNCERTAINTY

The magnitude of the potential uncertainty is determined by rho. There are situations in which a researcher has prior knowledge of the value of rho, but more often this parameter must be estimated empirically. In modern biomedical experiments, in which gene expression microarrays are used to assay the activity of tens of thousands of genes, technical replicates are frequently obtained to assess the effect of measurement uncertainty. We estimate rho_g using a generalization of the method in [18] based on quantiles. For grouped uncertainty sets, we estimate rho_g via

rho_hat_g(alpha) = inf{tau : P(||x_g||_2 <= tau) = alpha},  (14)

where the distribution in (14) is taken with respect to a data set independent of the training data, such as technical replicates of a biological experiment. Note that (14) reduces to interval-based quantile estimates when the number of groups is G = p. As the cumulative distribution function (CDF) in (14) does not depend on the class label y, we obtain (14) from

P(z <= tau) = sum_{y in {-1,+1}} P(z <= tau | Y = y) P(Y = y),  (15)

with z = ||x_g||_2 and the data centered about their respective class centroids. The class priors are estimated empirically by P_hat(Y = y) = m_y / m, where m_y and m are the number of replicate samples with label y and the total number of replicate samples, respectively. The estimate in (14) can be computed with respect to the empirical CDF of the data {x_{i,g}}_{i=1}^n or approximated by the inverse CDF of the chi_p distribution [18] with p = |I_g| degrees of freedom. The application of the proposed estimation of uncertainty bounds is presented below in the context of high-dimensional gene expression data in which technical replicates are available.
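A hedged sketch of the quantile estimate (14)-(15): given replicate measurements centered about their class centroids, the alpha-quantile of the group norm serves as rho_hat_g(alpha). Pooling the per-class centered norms implements the class mixture in (15) with empirical priors m_y/m; the variable names are illustrative.

```python
import numpy as np

def estimate_rho(reps, labels, group_idx, alpha):
    """Empirical rho_g(alpha) of Eq. (14): the alpha-quantile of ||x_g||_2
    over replicate data. Centering each class about its own centroid and
    pooling realizes the mixture CDF of Eq. (15) with priors m_y / m."""
    z = []
    for y in (-1, +1):
        Xy = reps[labels == y]
        Xy = Xy - Xy.mean(axis=0)                 # center about class centroid
        z.append(np.linalg.norm(Xy[:, group_idx], axis=1))
    return np.quantile(np.concatenate(z), alpha)  # quantile of mixture CDF

# Interval uncertainty (G = p): one bound per variable
# rho_hat = [estimate_rho(reps, labels, [j], 0.95) for j in range(p)]
```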
VI. ROBUST VS. RIDGE REGRESSION

A. Robustness of Ridge Logistic Regression

One important question is how the proposed RLR formulation relates to ridge LR. Ridge LR adds a squared l2 penalty to the binomial deviance loss function:

min_beta sum_{i=1}^n log(1 + e^{-y_i beta^T x_i}) + (lambda/2) ||beta||_2^2.  (16)

Our goal is to identify values of rho as a function of lambda for which robust (1) and ridge (16) produce approximately identical estimates of beta. We begin by inspecting the optimality conditions for beta. The gradients for ridge and robust, respectively, are given by (intercept removed for clarity)

grad_beta L_lambda = -X^T W y + lambda beta,  (17)
grad_beta L_rho = -X^T W_rho y + rho tr(W_rho) beta / ||beta||_2,  (18)

where W = diag(e^{-y_i beta^T x_i} / (1 + e^{-y_i beta^T x_i})) and W_rho = diag(e^{-y_i beta^T x_i + rho ||beta||_2} / (1 + e^{-y_i beta^T x_i + rho ||beta||_2})). If we linearize the sigmoidal terms in W and W_rho, the scaled gradients can be approximated as

n^{-1} grad_beta L_lambda ~ ((1/(4n)) X^T X + (lambda/n) I) beta - (1/2) dxbar_n = H_{n,lambda} beta - (1/2) dxbar_n,  (19)

with dxbar_n = n^{-1}(n_+ xbar_+ - n_- xbar_-), and

n^{-1} grad_beta L_rho ~ c_beta + a_beta rho^2 + b_beta rho,  (20)

where c_beta = (1/(4n)) X^T X beta - (1/2) dxbar_n, and the vectors a_beta and b_beta collect the rho^2 and rho coefficients arising from the linearization of W_rho and from the term rho tr(W_rho) beta / ||beta||_2 (both depend on beta, dxbar_n, and ||beta||_2).

Since the gradient for ridge regression is approximated by the linear system of equations in (19), we can approximate beta_hat_lambda via

beta_hat_lambda ~ (1/2) H_{n,lambda}^{-1} dxbar_n.  (21)

Inserting (21) into (20), summing (20) over its elements via an inner product with a vector of ones, and solving for rho produces the following quadratic expression for rho in terms of lambda:

rho_lambda ~ [-1^T b_{beta_hat} +/- sqrt((1^T b_{beta_hat})^2 - 4 (1^T a_{beta_hat})(1^T c_{beta_hat}))] / (2 1^T a_{beta_hat}).  (22)

The positive root rho_lambda is the uncertainty magnitude. This establishes an approximate equivalence between the solutions of ridge and robust logistic regression for a given lambda.

B. Convergence Rates of Ridge and Robust

We established conditions of equivalence between the ridge and robust LR estimators through the re-parameterization of rho in terms of lambda in (22). It is also of interest to investigate the rates at which these methods converge under the proposed conditions of equivalence. For ease of exposition, we assume that the data have been centered and the sample sizes are balanced (n_+ = n_-). We begin by assuming that the gradients of both ridge LR (19) and RLR (20) lie in some neighborhood of 0.

The structure of the Hessians specifies the rates of convergence through their eigenvalues. Under the assumptions above, the scaled Hessian for the ridge case, obtained by differentiating (19), is

n^{-1} H_beta L_lambda ~ (1/(4n)) X^T X + (lambda/n) I.  (23)

By differentiating (20) and noting that (1/8) beta^T dxbar_n <= 1/2, we obtain the scaled Hessian for robust logistic regression,

n^{-1} H_beta L_rho ~ (1/(4n)) X^T X + rho^2 I + rho (A - B),  (24)

with positive semi-definite matrix A,

A = (1/(2 ||beta||_2)) (I - beta beta^T / ||beta||_2^2),  (25)

and positive semi-definite matrix B,

B = (1/(2 ||beta||_2)) dxbar_n beta^T.  (26)

We are interested in conditions on rho such that the Hessian corresponding to the robust binomial deviance dominates that of ridge, i.e., H = H_beta L_rho - H_beta L_lambda >= 0 when both methods yield the same beta, or equivalently

w^T H w >= 0 for all w.  (27)

One can directly derive necessary conditions on H by enforcing that (27) holds for particular choices of w. We take w = dxbar, the right eigenvector of B, and substitute beta_hat_lambda (21) and rho_lambda (22) for beta and rho, respectively, in (27). The conditions under which robust logistic regression converges faster than ridge logistic regression to the ridge solution are

rho_lambda^2 + 2 (1 - eps_lambda) ||beta_hat_lambda||_2 rho_lambda >= 4 lambda / n,  (28)

where eps_lambda is given by

eps_lambda = (1/4) [ (beta_hat_lambda^T dxbar / (||dxbar||_2 ||beta_hat_lambda||_2))^2 + 1 ].  (29)

VII. NUMERICAL RESULTS

A. Recovery of the Regularization Path Under Signal Corruption

Many researchers in machine learning and bioinformatics assess the importance of various biomarkers based upon the ordering of l1-regularization paths. It is therefore worthwhile to explore how the ordering of the variables within the regularization path changes as uncertainty is added. We first consider a simple synthetic example with n = 100 samples in each class, x in R^p, p = 22, and x ~ N(mu_y, Sigma). The class centroids are mu_+1 = [1/2, 1/3, 1/2, 1/2, 1/2, 1/2, 0, ..., 0]^T and mu_-1 = [1/2, 1/3, 0, ..., 0]^T. The covariance Sigma is block diagonal: the diagonal components {sigma_ii}_{i=1}^p equal one, the block structure affects only the variables x_1, ..., x_6, whose off-diagonal elements within the block equal 0.1, and all other off-diagonal elements are zero. Figure 2(a) shows the l1-regularization path as a function of log_10 lambda on the original data. In Figure 2(b), x_3 has been perturbed, x_3 <- x_3 + delta_3 with |delta_3| <= rho, which shifts the ordering within the regularization path under standard l1-LR. Using the perturbed data set, the knowledge that rho = 0.1 for x_3, and interval uncertainty, l1-regularized robust logistic regression recovers the original ordering of the variables (see Figure 2(c)).
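For concreteness, here is a sketch of the synthetic design just described (class means, block covariance, and the bounded perturbation of x_3); the RNG seed is our own choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho = 100, 22, 0.1

# Class centroids as described in Sec. VII-A
mu_pos = np.zeros(p); mu_pos[:6] = [0.5, 1/3, 0.5, 0.5, 0.5, 0.5]
mu_neg = np.zeros(p); mu_neg[:2] = [0.5, 1/3]

# Block-diagonal covariance: unit variances, 0.1 off-diagonals among x_1..x_6
Sigma = np.eye(p)
Sigma[:6, :6] += 0.1 * (1 - np.eye(6))

X = np.vstack([rng.multivariate_normal(mu_pos, Sigma, n),
               rng.multivariate_normal(mu_neg, Sigma, n)])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Bounded interval perturbation of x_3 (index 2), |delta_3| <= rho
X_pert = X.copy()
X_pert[:, 2] += rng.uniform(-rho, rho, size=2 * n)
```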
B. Human Rhinovirus Gene Expression Data

Here we present numerical results on a peripheral blood gene expression data set from a group of n = 20 patients inoculated with human rhinovirus (HRV), the typical agent of the common cold [19]. Half of the patients responded with symptoms (y = +1) and the other half did not (y = -1). The original 12,023 genes on the Affymetrix oligonucleotide microarray were reduced to p = 129 differentially expressed genes, controlling for a 20% false discovery rate (FDR) using the same methodology as in [20]. We regularize both the robust binomial deviance and the standard binomial deviance with an l1 penalty to control the sparsity of the resulting model, forcing many elements of beta to zero. In this experiment there were approximately 20 microarray chips that were technical replicates; these were used to estimate the interval uncertainty bounds using (14).

Since oligonucleotide microarray devices detect mRNA abundance by including thousands of different probe sets (short sequences that bind to a particular mRNA molecule produced from a specific gene), we adopt interval uncertainty across the p genes. The estimated interval uncertainty bounds {rho_hat_i(alpha)}_{i=1}^p were obtained from (14) using the empirical CDF of the independent technical replicate data, with linear interpolation between sampled data points. The CDF in (14) was computed via (15), with each conditional CDF centered about its class-dependent sample mean and equal class priors P_hat(Y = +1) = P_hat(Y = -1) = 1/2. Technical replicates are generated from the same blood sample used to assess gene expression activity in the training data and thus are commonly used to isolate the effect of measurement error.

We explored the effect of interval uncertainty bounds resulting from quantiles of (14) at levels alpha = {0.25, 0.50, 0.75, 0.95, 0.99}. Figure 3 shows the l1-regularization paths obtained by solving (11) on the training data for different interval uncertainties (including no uncertainty, corresponding to standard l1-LR) and illustrates how robustness affects the ordering of the genes. We see from Figure 3(a) that the first gene to appear in the regularization path is the anti-viral defense gene RSAD2. RSAD2 persists as the first gene in the regularization path for alpha = {0, 0.25, 0.50, 0.75} (the latter three not shown) but disappears from the regularization path completely, along with a few other genes, when alpha = {0.95, 0.99}.

Fig. 2. Regularization paths as a function of log_10 lambda for the synthetic example: (a) l1-regularization path on the original data; (b) l1-regularization path with bounded perturbations in x_3 and rho_3 = 0.1; (c) l1-robust regularization path with rho_3 = 0.1. The robust path recovers the original ordering after perturbation.

The three robust genes that persist across all explored values of alpha are ADI1, OAS1, and TUBB2A. Of these three, OAS1 (which codes for proteins involved in the innate immune response to viral infection) barely appears in Figure 3(a), yet is the first gene to appear in the robust regularization paths common to both alpha = {0.95, 0.99} (see Figures 3(b) and 3(c)). These results suggest that when assessing variable importance via regularization paths, one should be aware of the effect of measurement uncertainty on the ordering of the variables.

Fig. 3. Regularization paths on the HRV data set as a function of log_10(lambda/lambda_max) for different magnitudes of interval uncertainty, as determined by the alpha quantile.

The robust formulation in this paper minimizes the worst-case configuration of perturbations on the logistic regression loss function. As our goal is discriminating between two phenotypes, it is of interest to compare the worst-case probability of error for LR and RLR on perturbed data sets, i.e., x_i <- x_i + delta_i with each component of delta_i bounded by the corresponding rho_hat_j(alpha), after training on the original data. The l1 tuning parameter lambda for both standard l1-LR and l1-RLR was chosen to minimize the out-of-sample probability of error via 5-fold cross-validation. Given the cross-validated value of lambda for each method, the models were fit to the entire training set, and 50,000 perturbed data matrices were generated subject to the interval uncertainty bounds {rho_hat_i(alpha)}_{i=1}^p for alpha = {0.25, 0.50, 0.75, 0.95, 0.99}. The best-case and worst-case probabilities of error are recorded in Table I. For all values of alpha explored, the worst-case probability of error for l1-RLR is less than or equal to that of l1-LR. When alpha = {0.95, 0.99}, the worst-case probability of error for l1-LR is 0.30 but reduces to 0.20 when using l1-RLR with interval uncertainty estimated from the data using (14). Corresponding to the worst-case probabilities of error are the sensitivities and 1-specificities given in Table II. We see that l1-RLR achieves the same power as l1-LR at false alarm rates less than or equal to those of the non-robust method. The proposed method can therefore be used to reduce classifier error sensitivity in large scale classification problems, which can be important in practical applications such as biomarker discovery and predictive health and disease.

TABLE I. Best- and worst-case probability of error, P_e, for the HRV data set: columns give min P_e and max P_e for l1-LR and l1-Robust LR at each alpha.

TABLE II. Sensitivity and 1-specificity corresponding to the worst-case probability of error on the HRV data: columns give sensitivity and 1-specificity for l1-LR and l1-Robust LR at each alpha.
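A sketch of the worst-case evaluation protocol just described: draw interval perturbations within the estimated bounds and record the error extremes over draws. The 0/1-error helper and the names are illustrative; assume beta, beta0 come from a fitted model and rho_hat from (14).

```python
import numpy as np

def error_rate(beta, beta0, X, y):
    """Empirical probability of error of the linear classifier sign(f(x))."""
    return np.mean(np.sign(X @ beta + beta0) != y)

def worst_case_error(beta, beta0, X, y, rho_hat, n_draws=50_000, seed=0):
    """Best/worst-case error over random interval perturbations
    |delta_ij| <= rho_hat[j], mirroring the 50,000-draw protocol above."""
    rng = np.random.default_rng(seed)
    errs = np.empty(n_draws)
    for t in range(n_draws):
        delta = rng.uniform(-rho_hat, rho_hat, size=X.shape)
        errs[t] = error_rate(beta, beta0, X + delta, y)
    return errs.min(), errs.max()
```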
VIII. CONCLUSION

Building on the results of [7] and [5], we have formulated the robust logistic regression problem with group structured uncertainty sets. By adding regularization penalties, one can enforce block-sparsity assumptions on the underlying signal. We have presented a block coordinate gradient method with iterative grouped thresholding for solving the penalized RLR problems. The group thresholding is affected by both the group-lasso penalty and the magnitude of the worst-case uncertainty.

Thus, RLR tends to promote thresholding of highly uncertain variables, essentially performing an initial step of variable selection. We have proposed an empirical approach to estimating the uncertainty magnitude using quantiles. This approach was applied to a real gene expression data set, where quantile estimation was applied to a set of technical replicates. The numerical results on this data set establish that the proposed approach can yield a lower worst-case probability of error and lower false alarm rates. Such a gain in worst-case detection performance can improve predictive health and disease applications, and in particular the prediction of patient phenotype.

It will be interesting to explore the situation in which the group structure of the uncertainty differs from that of the regularization penalty; this could potentially be addressed by modifying the sparse group-lasso solution as in [21]. It is also of interest to explore the kernelization of this method, in which one can study the propagation of bounded data uncertainty in a reproducing kernel Hilbert space.

ACKNOWLEDGMENT

The authors sincerely appreciate the thoughtful insight of Ami Wiesel in developing the ideas presented in this paper. The authors also appreciate the valuable suggestions made by Ji Zhu, Kerby Shedden, and Clayton Scott. The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense. (Approved for Public Release, Distribution Unlimited.)

REFERENCES

[1] A. Wiesel, Y. C. Eldar, and A. Yeredor, "Linear regression with Gaussian model uncertainty: Algorithms and bounds," IEEE Transactions on Signal Processing, vol. 56, no. 6, 2008.
[2] A. Wiesel, Y. C. Eldar, and A. Beck, "Maximum likelihood estimation in linear models with a Gaussian model matrix," IEEE Signal Processing Letters, vol. 13, no. 5, pp. 292-295, 2006.
[3] R. J. Carroll, C. H. Spiegelman, K. K. G. Lan, K. T. Bailey, and R. D. Abbott, "On errors-in-variables for binary regression models," Biometrika, vol. 71, no. 1, 1984.
[4] S. Van Huffel, I. Markovsky, R. J. Vaccaro, and T. Söderström, "Total least squares and errors-in-variables modeling," Signal Processing, 2007.
[5] S. Chandrasekaran, G. H. Golub, M. Gu, and A. H. Sayed, "Parameter estimation in the presence of bounded data uncertainties," SIAM J. Matrix Anal. Appl., vol. 19, no. 1, 1998.
[6] L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM J. Matrix Anal. Appl., vol. 18, no. 4, 1997.

[7] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, EECS Department, University of California, Berkeley, Oct. 2003.
[8] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, 2002.
[9] S.-J. Kim, A. Magnani, and S. P. Boyd, "Robust Fisher discriminant analysis," in Advances in Neural Information Processing Systems, MIT Press, 2006.
[10] H. Xu, S. Mannor, and C. Caramanis, "Robustness, risk, and regularization in support vector machines," CoRR, 2008.
[11] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization (Princeton Series in Applied Mathematics), Princeton University Press, 2009.
[12] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B, vol. 68, pp. 49-67, 2006.
[13] L. Meier, S. van de Geer, and P. Bühlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society, Series B, vol. 70, no. 1, 2008.
[14] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[15] V. Roth and B. Fischer, "The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, 2008.
[16] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[17] J. Peng, P. Wang, N. Zhou, and J. Zhu, "Partial correlation estimation by joint sparse regression models," Journal of the American Statistical Association, vol. 104, no. 486, pp. 735-746, 2009.
[18] A. Wiesel, Y. C. Eldar, and S. Shamai, "Optimization of the MIMO compound capacity," IEEE Transactions on Wireless Communications, vol. 6, no. 3, 2007.
[19] A. K. Zaas, M. Chen, J. Varkey, T. Veldman, A. O. Hero, J. Lucas, Y. Huang, R. Turner, A. Gilbert, R. Lambkin-Williams, et al., "Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans," Cell Host & Microbe, vol. 6, no. 3, pp. 207-217, 2009.
[20] A. Swaroop, A. J. Mears, G. Fleury, and A. O. Hero, "Multicriteria gene screening for analysis of differential expression with DNA microarrays," EURASIP Journal on Advances in Signal Processing, 2004.
[21] J. Friedman, T. Hastie, and R. Tibshirani, "A note on the group lasso and a sparse group lasso," arXiv:1001.0736v1, 2010.


GI01/M055 Supervised Learning Proximal Methods GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators

More information

Discrete Optimization

Discrete Optimization Discrete Optimization [Chen, Batson, Dang: Applied integer Programming] Chapter 3 and 4.1-4.3 by Johan Högdahl and Victoria Svedberg Seminar 2, 2015-03-31 Todays presentation Chapter 3 Transforms using

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring. Jie-Men Mok Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research

More information

Application of discriminant analysis to predict the class of degree for graduating students in a university system

Application of discriminant analysis to predict the class of degree for graduating students in a university system International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers

More information

Fuzzy Differential Systems and the New Concept of Stability

Fuzzy Differential Systems and the New Concept of Stability Nonlinear Dynamics and Systems Theory, 1(2) (2001) 111 119 Fuzzy Differential Systems and the New Concept of Stability V. Lakshmikantham 1 and S. Leela 2 1 Department of Mathematical Sciences, Florida

More information

THE SIMPLE PENDULUM. Objective: To investigate the relationship between the length of a simple pendulum and the period of its motion.

THE SIMPLE PENDULUM. Objective: To investigate the relationship between the length of a simple pendulum and the period of its motion. THE SIMPLE PENDULUM Objective: To investiate the relationship between the lenth of a simple pendulum and the period of its motion. Apparatus: Strin, pendulum bob, meter stick, computer with ULI interface,

More information

Constrained Least Squares

Constrained Least Squares Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,

More information

Graphical Modeling for Genomic Data

Graphical Modeling for Genomic Data Graphical Modeling for Genomic Data Carel F.W. Peeters [email protected] Joint work with: Wessel N. van Wieringen Mark A. van de Wiel Molecular Biostatistics Unit Dept. of Epidemiology & Biostatistics

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved 4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

More information

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

More information

A Robust Optimization Approach to Supply Chain Management

A Robust Optimization Approach to Supply Chain Management A Robust Optimization Approach to Supply Chain Management Dimitris Bertsimas and Aurélie Thiele Massachusetts Institute of Technology, Cambridge MA 0139, [email protected], [email protected] Abstract. We

More information