Advice Refinement in Knowledge-Based SVMs
Gautam Kunapuli, Univ. of Wisconsin-Madison, 1300 University Avenue, Madison, WI
Richard Maclin, Univ. of Minnesota, Duluth, 1114 Kirby Drive, Duluth, MN
Jude W. Shavlik, Univ. of Wisconsin-Madison, 1300 University Avenue, Madison, WI

Abstract

Knowledge-based support vector machines (KBSVMs) incorporate advice from domain experts, which can improve generalization significantly. A major limitation that has not been fully addressed occurs when the expert advice is imperfect, which can lead to poorer models. We propose a model that extends KBSVMs and is able not only to learn from data and advice, but also to simultaneously improve the advice. The proposed approach is particularly effective for knowledge discovery in domains with few labeled examples. The proposed model contains bilinear constraints, and is solved using two iterative approaches: successive linear programming and a constrained concave-convex approach. Experimental results demonstrate that these algorithms yield useful refinements to expert advice, as well as improve the performance of the learning algorithm overall.

1 Introduction

We are primarily interested in learning in domains where there is only a small amount of labeled data but advice can be provided by a domain expert. The goal in such scenarios is to refine this advice, which is usually only approximately correct, during learning, to produce interpretable models that generalize better and aid knowledge discovery. For learning in complex environments, a number of researchers have shown that incorporating prior knowledge from experts can greatly improve the generalization of the learned model, often with many fewer labeled examples. Such approaches have been demonstrated in rule-learning methods [16], artificial neural networks (ANNs) [21] and support vector machines (SVMs) [10, 17]. One limitation of these methods concerns how well they adapt when the knowledge provided by the expert is inexact or partially correct.
Many of the rule-learning methods focus on rule refinement to learn better rules, while ANNs form the rules as portions of the network, which are refined by backpropagation. Further, ANN methods have been paired with rule-extraction methods [3, 20] to try to understand the resulting learned network and provide rules that are easily interpreted by domain experts. We consider the framework of knowledge-based support vector machines (KBSVMs), introduced by Fung et al. [6]. KBSVMs have been extensively studied, and in addition to linear classification, they have been extended to incorporate kernels [5], nonlinear advice [14] and kernel approximation [13]. Recently, Kunapuli et al. derived an online version of KBSVMs [9], while other approaches, such as that of Le et al. [11], modify the hypothesis space rather than the optimization problem. Extensive empirical results from this prior work establish that expert advice can be effective, especially for biomedical applications such as breast-cancer diagnosis. KBSVMs are an attractive methodology for knowledge discovery as they can produce good models that generalize well with a small amount of labeled data. Advice tends to be rule-of-thumb and is based on the expert's accumulated experience in the domain; it may not always be accurate. Rather than simply ignoring or heavily penalizing inaccurate rules, the effectiveness of the advice can be improved through refinement. There are two main reasons for this: first, refined rules improve the overall generalization; and second, if the refinements to the advice are interpretable by the domain experts, they will help the experts understand the phenomena underlying the application, and consequently greatly facilitate the knowledge-discovery process. This is the motivation behind this work.

Figure 1: (left) A standard SVM, which trades off complexity and loss with respect to the data; (center) a knowledge-based SVM, which also trades off loss with respect to the advice. A piece of advice set 1 extends over the margin, and is penalized as the advice error. No part of advice set 2 touches the margin, i.e., none of the rules in advice set 2 are useful as support constraints. (right) An SVM that refines advice in two ways: (1) advice set 1 is refined so that no part of it is on the wrong side of the optimal hyperplane, minimizing the advice error; (2) advice set 2 is expanded until it touches the optimal margin, thus maximizing coverage of the input space.

KBSVMs already have several desirable properties that make them an ideal target for refinement. First, advice is specified as polyhedral regions in input space, whose constraints on the features are easily interpretable by non-experts. Second, it is well-known that KBSVMs can learn to generalize well with small data sets [9], and can even learn from advice alone [6]. Finally, owing to the simplicity of the formulation, advice-refinement terms for the rules can be incorporated directly into the model. We further motivate advice refinement in KBSVMs with the following example. Figure 1 (left) shows an SVM, which trades off regularization with the data error. Figure 1 (center) illustrates KBSVMs in their standard form as shown in [6]. As mentioned before, expert rules are specified in the KBSVM framework as polyhedral advice regions in input space. They introduce a bias to focus the learner on a model that also includes advice of the form: for all x, (x ∈ advice region i) ⇒ class(x) = +1. Advice regarding the regions for which class(x) = −1 can be specified similarly. In the KBSVM (Figure 1, center), each advice region contributes to the final hypothesis via its advice vector, u^1 and u^2 respectively (as introduced in [6]; also see Section 2).
The individual constraints that touch or intersect the margin have non-zero components u^i_j. As a piece of advice region 1 extends beyond the margin, u^1 ≠ 0; furthermore, analogous to the data error, this overlap is penalized as the advice error. As no part of advice set 2 touches the margin, u^2 = 0 and none of its rules contribute anything to the final classifier. Again, analogous to support vectors, rules with non-zero components u^i_j are called support constraints [6]. Consequently, in the final classifier the advice sets are incorporated with advice error (advice set 1) or are completely ignored (advice set 2). Even though the rules are inaccurate, they are able to improve generalization compared to the SVM. However, simply penalizing advice that introduces errors can make learning difficult, as the user must carefully trade off between optimizing the data loss and the advice loss. Now, consider an SVM that is capable of refining inaccurate advice (Figure 1, right). When advice is inaccurate and intersects the hyperplane, it is truncated so as to minimize the advice error. Advice that was originally ignored is extended to cover as much of the input space as is feasible. The optimal classifier has now minimized the error with respect to the data and the refined advice, and is able to further improve upon the performance of not just the SVM but also the KBSVM. Thus, the goal is to refine potentially inaccurate expert advice during learning so as to learn a model with the best generalization. Our approach generalizes the work of Maclin et al. [12] to produce a model that corrects the polyhedral advice regions of KBSVMs. The resulting mathematical program is no longer a linear or quadratic program, owing to bilinear correction factors in the constraints. We propose two algorithmic techniques to solve the resulting bilinear program, one based on successive linear programming [12], and the other based on a concave-convex procedure [24].
Before we describe advice refinement, we briefly introduce our notation and KBSVMs. We wish to learn a linear classifier (w'x = b) given l labeled examples (x_j, y_j), j = 1, ..., l, with x_j ∈ R^n and labels y_j ∈ {±1}. Data are collected row-wise in the matrix X ∈ R^{l×n}, while Y = diag(y) is the diagonal matrix of the labels. We assume that m advice sets (D_i, d_i, z_i), i = 1, ..., m, are given in addition to the data (see Section 2), and if the i-th advice set has k_i constraints, we have D_i ∈ R^{k_i×n}, d_i ∈ R^{k_i} and z_i ∈ {±1}. The absolute value of a scalar y is denoted |y|, the 1-norm of a vector x
∈ R^n is denoted ||x||_1 = Σ_{i=1}^n |x_i|, and the entrywise 1-norm of a p×q matrix A ∈ R^{p×q} is denoted ||A||_1 = Σ_{i=1}^p Σ_{j=1}^q |A_{ij}|. Finally, e is a vector of ones of appropriate dimension.

2 Knowledge-Based Support Vector Machines

In KBSVMs, advice can be specified about every potential data point in the input space that satisfies certain advice constraints. For example, consider the task of learning to diagnose diabetes, based on features such as age, blood pressure, body mass index (bmi), plasma glucose concentration (gluc), etc. The National Institutes of Health (NIH) provide the following guidelines to establish risk for Type-2 diabetes: a person who is obese (bmi ≥ 30) with gluc ≥ 126 is at strong risk for diabetes, while a person who is at normal weight (bmi ≤ 25) with gluc ≤ 100 is unlikely to have diabetes. This leads to two advice sets, one for each class:

    (bmi ≤ 25) ∧ (gluc ≤ 100) ⇒ ¬diabetes,    (bmi ≥ 30) ∧ (gluc ≥ 126) ⇒ diabetes,   (1)

where ¬ is the negation operator. In general, rules such as the ones above define a polyhedral region of the input space and are expressed as the implication

    D_i x ≤ d_i ⇒ z_i (w'x − b) ≥ 1,   (2)

where the advice label z_i = +1 indicates that all points x that satisfy the constraints of the i-th advice set, D_i x ≤ d_i, belong to class +1, while z_i = −1 indicates the same for the other class. The standard linear SVM formulation (without incorporating advice) for binary classification optimizes model complexity + λ data loss:

    min_{w, b, ξ ≥ 0}  ||w||_1 + λ e'ξ,   s.t.  Y(Xw − eb) + ξ ≥ e.   (3)

The implications (2), for i = 1, ..., m, can be incorporated into (3) using the nonhomogeneous Farkas theorem of the alternative [6], which introduces advice vectors u^i ≥ 0. The advice vectors perform the same role as the dual multipliers α in the classical SVM. Recall that points with non-zero α's are the support vectors, which additively contribute to w. Similarly, the constraints of an advice set that have non-zero u^i components are called support constraints.
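As a concrete illustration (not code from the paper), the two NIH-style advice sets of Eq. (1) can be encoded as polyhedral regions {x : D x ≤ d} with class labels z. The two-feature ordering x = [bmi, gluc] and all numeric values below follow the rules quoted above; the helper name is made up:

```python
import numpy as np

# Advice set 1: (bmi <= 25) AND (gluc <= 100)  =>  not diabetes (z = -1).
D1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
d1 = np.array([25.0, 100.0])
z1 = -1

# Advice set 2: (bmi >= 30) AND (gluc >= 126)  =>  diabetes (z = +1).
# ">=" constraints are negated to fit the canonical "D x <= d" form.
D2 = np.array([[-1.0, 0.0],
               [0.0, -1.0]])
d2 = np.array([-30.0, -126.0])
z2 = +1

def in_advice_region(D, d, x):
    """True if x satisfies every constraint of the advice set D x <= d."""
    return bool(np.all(D @ x <= d))

x = np.array([32.0, 150.0])          # obese, high glucose
print(in_advice_region(D2, d2, x))   # True: advice predicts class +1
print(in_advice_region(D1, d1, x))   # False
```

Note that "and"-conjunctions of linear threshold tests always translate into one row of (D, d) per conjunct, which is what makes the refined rules of later sections directly readable by domain experts.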
The resulting formulation is the KBSVM, which optimizes model complexity + λ data loss + μ advice loss:

    min_{w, b, (ξ, u^i, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i)
    s.t.  Y(Xw − eb) + ξ ≥ e,
          −η^i ≤ D_i'u^i + z_i w ≤ η^i,
          −d_i'u^i − z_i b + ζ_i ≥ 1,   i = 1, ..., m.   (4)

In the case of inaccurate advice, the advice errors η^i and ζ_i soften the advice constraints, analogous to the data errors ξ. Returning to Figure 1: for advice set 1, η^1, ζ_1 and u^1 are non-zero, while for advice set 2, u^2 = 0. The influence of data and advice is determined by the choice of the parameters λ and μ, which reflect the user's trust in the data and advice respectively.

3 Advice-Refining Knowledge-Based Support Vector Machines

Previously, Maclin et al. [12] formulated a model to refine advice in KBSVMs. However, their model is limited, as only the terms d_i are refined, which, as we discuss below, greatly restricts the types of refinements that are possible. They only consider refinement terms f^i for the right-hand side of the i-th advice set, and attempt to refine each rule such that

    D_i x ≤ (d_i − f^i) ⇒ z_i (w'x − b) ≥ 1,   i = 1, ..., m.   (5)

The resulting formulation adds refinement terms into the KBSVM model (4), in the advice constraints as well as in the objective. The latter allows the overall extent of the refinement to be controlled by the refinement parameter ν > 0. This formulation was called the Refining-Rules Support Vector Machine (RRSVM):

    min_{w, b, f^i, (ξ, u^i, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i) + ν Σ_{i=1}^m ||f^i||_1
    s.t.  Y(Xw − eb) + ξ ≥ e,
          −η^i ≤ D_i'u^i + z_i w ≤ η^i,
          −(d_i − f^i)'u^i − z_i b + ζ_i ≥ 1,   i = 1, ..., m.   (6)

1 riskfortype2
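The Farkas-based role of the advice vectors u^i can be checked numerically. The sketch below (all numbers invented for illustration) verifies that when some u ≥ 0 satisfies D'u + z·w = 0 and d'u + z·b + 1 ≤ 0, every point of the advice region {x : D x ≤ d} indeed satisfies z(w'x − b) ≥ 1, which is the hard (η = ζ = 0) version of the constraints in (4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Advice region: the unit box [0,1]^2, written as D x <= d.
D = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
d = np.array([1.0, 1.0, 0.0, 0.0])
z = 1.0

# A hyperplane that honors the advice, and its certificate (advice vector).
w = np.array([1.0, 1.0])
b = -1.0
u = np.array([0.0, 0.0, 1.0, 1.0])

assert np.all(u >= 0)
assert np.allclose(D.T @ u + z * w, 0)       # first Farkas condition
assert d @ u + z * b + 1 <= 1e-12            # second Farkas condition

# Sample the region and confirm the implication (2) empirically.
pts = rng.uniform(0, 1, size=(1000, 2))      # all satisfy D x <= d
margins = z * (pts @ w - b)
print("min margin over sampled region:", margins.min())  # >= 1
```

In the soft formulation (4), the slacks η^i and ζ_i absorb exactly the amount by which such a certificate fails to exist.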
This problem is no longer an LP, owing to the bilinear terms f^i'u^i which make the refinement constraints non-convex. Maclin et al. solve this problem using successive linear programming (SLP), wherein linear programs arising from alternately fixing either the advice vectors u^i or the refinement terms f^i are solved iteratively. We consider a full generalization of the RRSVM approach and develop a model where it is possible to refine the entire advice region Dx ≤ d. This allows for much more flexibility in refining the advice based on the data, while still retaining the interpretability of the resulting refined advice. In addition to the terms f^i, we propose the introduction of additional refinement terms F_i into the model, so that we can refine the rules in as general a manner as possible:

    (D_i − F_i) x ≤ (d_i − f^i) ⇒ z_i (w'x − b) ≥ 1,   i = 1, ..., m.   (7)

Recall that for each advice set we have D_i ∈ R^{k_i×n} and d_i ∈ R^{k_i}, i.e., the i-th advice set contains k_i constraints. The corresponding refinement terms F_i and f^i have the same dimensions as D_i and d_i respectively. The formulation (6) now includes the additional refinement terms F_i, and optimizes:

    min_{w, b, F_i, f^i, (ξ, u^i, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i) + ν Σ_{i=1}^m (||F_i||_1 + ||f^i||_1)
    s.t.  Y(Xw − eb) + ξ ≥ e,
          −η^i ≤ (D_i − F_i)'u^i + z_i w ≤ η^i,
          −(d_i − f^i)'u^i − z_i b + ζ_i ≥ 1,   i = 1, ..., m.   (8)

The objective function of (8) trades off the effect of refinement in each of the advice sets via the refinement parameter ν. This is the Advice-Refining KBSVM (arksvm); it improves upon the work of Maclin et al. in two important ways. First, refining d alone is highly restrictive, as it allows only for the translation of the boundaries of the polyhedral advice; the generalized refinement offered by arksvms allows for much more flexibility, owing to the fact that the boundaries of the advice can be translated and rotated (see Figure 2).
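The translate-versus-rotate distinction can be seen on a single constraint. In this sketch (the correction values are invented, not learned), changing only d shifts the boundary parallel to itself, while a non-zero row of F changes the constraint normal and hence rotates the boundary:

```python
import numpy as np

# One advice constraint: x1 <= 1, i.e., D row = [1, 0], d = 1.
D_row, d_val = np.array([1.0, 0.0]), 1.0

# RRSVM-style refinement changes only d: the boundary translates.
f = 0.5
print("translated boundary:", D_row, "x <=", d_val - f)   # x1 <= 0.5

# arksvm refinement also changes D: the boundary can rotate.
F_row = np.array([0.0, -0.5])        # hypothetical learned correction
new_row = D_row - F_row              # refined normal [1, 0.5]
cos_angle = new_row @ D_row / (np.linalg.norm(new_row) * np.linalg.norm(D_row))
print("rotation of the normal (degrees):",
      np.degrees(np.arccos(cos_angle)))
```

The rotated constraint x1 + 0.5·x2 ≤ d − f is exactly the kind of correction that (5) can never express, since (5) leaves the matrix D fixed.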
Second, the newly added refinement terms F_i'u^i are also bilinear, and do not make the overall problem more complex; in addition to the successive linear programming approach of [12], we also propose a concave-convex procedure that leads to an approach based on successive quadratic programming. We provide details of both approaches next.

3.1 arksvms via Successive Linear Programming

One approach to solving bilinear programming problems is to solve a sequence of linear programs while alternately fixing the bilinear variables. This approach is called successive linear programming, and has been used to solve various machine learning formulations, for instance [1, 2]. In this approach, which was also adopted by [12], we solve the LPs arising from alternately fixing the sources of bilinearity: (F_i, f^i), i = 1, ..., m, and u^i, i = 1, ..., m. Algorithm 1 describes this approach. At the t-th iteration, the algorithm alternates between the following steps:

(Estimation step) When the refinement terms (F̂_i^t, f̂^{i,t}) are fixed, the resulting LP becomes a standard KBSVM that attempts to find a data-based estimate of the advice vectors u^i using the current refinement of the advice regions: (D_i − F̂_i^t) x ≤ (d_i − f̂^{i,t}).

(Refinement step) When the advice-estimate terms û^{i,t} are fixed, the resulting LP solves for (F_i, f^i) and attempts to further refine the advice regions based on the estimates from data computed in the previous step.

Proposition 1 I. For ε = 0, the sequence of objective values converges to the value ||w̄||_1 + λ e'ξ̄ + μ Σ_{i=1}^m (e'η̄^i + ζ̄_i) + ν Σ_{i=1}^m (||F̄_i||_1 + ||f̄^i||_1), where the data and advice errors (ξ̄, η̄^i, ζ̄_i) are computed from any accumulation point (w̄, b̄, ū^i, F̄_i, f̄^i) of the sequence of iterates (ŵ^t, b̂^t, û^{i,t}, F̂_i^t, f̂^{i,t}), t = 1, 2, ..., generated by Algorithm 1. II.
Such an accumulation point satisfies the local minimum condition: (w̄, b̄) solves

    min_{w, b, (ξ, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i)
    subject to  Y(Xw − eb) + ξ ≥ e,
                −η^i ≤ (D_i − F̄_i)'ū^i + z_i w ≤ η^i,
                −(d_i − f̄^i)'ū^i − z_i b + ζ_i ≥ 1,   i = 1, ..., m.
Algorithm 1: arksvm via Successive Linear Programming (arksvm-sla)
1: initialize: t = 1, F̂_i^1 = 0, f̂^{i,1} = 0
2: while feasible do
3:   if no x is feasible for (D_i − F̂_i^t) x ≤ (d_i − f̂^{i,t}), return failure
4:   (estimation step) solve for û^{i,t+1}, i = 1, ..., m:
       min_{w, b, (ξ, u^i, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i)
       s.t.  Y(Xw − eb) + ξ ≥ e,
             −η^i ≤ (D_i − F̂_i^t)'u^i + z_i w ≤ η^i,
             −(d_i − f̂^{i,t})'u^i − z_i b + ζ_i ≥ 1,   i = 1, ..., m
5:   (refinement step) solve for (F̂_i^{t+1}, f̂^{i,t+1}), i = 1, ..., m:
       min_{w, b, F_i, f^i, (ξ, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i) + ν Σ_{i=1}^m (||F_i||_1 + ||f^i||_1)
       s.t.  Y(Xw − eb) + ξ ≥ e,
             −η^i ≤ (D_i − F_i)'û^{i,t+1} + z_i w ≤ η^i,
             −(d_i − f^i)'û^{i,t+1} − z_i b + ζ_i ≥ 1,   i = 1, ..., m
6:   (termination test) if Σ_j (||F̂_j^t − F̂_j^{t+1}||_1 + ||f̂^{j,t} − f̂^{j,t+1}||_1) ≤ ε, then return solution
7:   (continue) t = t + 1
8: end while

Algorithm 2: arksvm via Successive Quadratic Programming (arksvm-sqp)
1: initialize: t = 1, F̂_i^1 = 0, f̂^{i,1} = 0
2: while feasible do
3:   if no x is feasible for (D_i − F̂_i^t) x ≤ (d_i − f̂^{i,t}), return failure
4:   solve for (û^{i,t+1}, F̂_i^{t+1}, f̂^{i,t+1}), i = 1, ..., m:
       min_{w, b, F_i, f^i, (ξ, u^i, η^i, ζ_i) ≥ 0}  ||w||_1 + λ e'ξ + μ Σ_{i=1}^m (e'η^i + ζ_i) + ν Σ_{i=1}^m (||F_i||_1 + ||f^i||_1)
       s.t.  Y(Xw − eb) + ξ ≥ e,  eqns (10)–(12),  i = 1, ..., m,  j = 1, ..., n
5:   (termination test) if Σ_j (||F̂_j^t − F̂_j^{t+1}||_1 + ||f̂^{j,t} − f̂^{j,t+1}||_1) ≤ ε, then return solution
6:   (continue) t = t + 1
7: end while

3.2 arksvms via Successive Quadratic Programming

In addition to the above approach, we introduce another algorithm (Algorithm 2) that is based on successive quadratic programming. In the constraint (D_i − F_i)'u^i + z_i w − η^i ≤ 0, only the refinement term F_i'u^i is bilinear, while the rest of the constraint is linear. Denote the j-th components of w and η^i by w_j and η^i_j, and the j-th columns of D_i and F_i by D_ij and F_ij respectively. A general bilinear term r's, which is non-convex, can be written as the difference of two convex terms: r's = ¼||r + s||² − ¼||r − s||². Thus, we have the equivalent constraint

    D_ij'u^i + z_i w_j − η^i_j + ¼||F_ij − u^i||² ≤ ¼||F_ij + u^i||²,   (9)

and both sides of the constraint above are convex and quadratic.
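The difference-of-convex identity used to split the bilinear term is easy to verify numerically; the sketch below checks it on random vectors (the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Verify the polarization identity  r's = (1/4)||r + s||^2 - (1/4)||r - s||^2,
# which rewrites the bilinear term as a difference of two convex quadratics.
for _ in range(100):
    r, s = rng.normal(size=5), rng.normal(size=5)
    lhs = r @ s
    rhs = 0.25 * np.sum((r + s) ** 2) - 0.25 * np.sum((r - s) ** 2)
    assert np.isclose(lhs, rhs)
print("identity holds on 100 random vector pairs")
```

Moving one convex quadratic to each side of the inequality is what produces the convex-vs-convex form of constraint (9).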
We can linearize the right-hand side of (9) around a current estimate of the bilinear variables (F̂_ij^t, û^{i,t}):

    D_ij'u^i + z_i w_j − η^i_j + ¼||F_ij − u^i||² ≤ ¼||F̂_ij^t + û^{i,t}||² + ½ (F̂_ij^t + û^{i,t})' ((F_ij − F̂_ij^t) + (u^i − û^{i,t})).   (10)

Similarly, the constraint −(D_i − F_i)'u^i − z_i w − η^i ≤ 0 can be replaced by

    −D_ij'u^i − z_i w_j − η^i_j + ¼||F_ij + u^i||² ≤ ¼||F̂_ij^t − û^{i,t}||² + ½ (F̂_ij^t − û^{i,t})' ((F_ij − F̂_ij^t) − (u^i − û^{i,t})).   (11)
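Two properties of this linearization drive the algorithm: the affine right-hand side agrees with the original convex quadratic at the current estimate (so the previous iterate stays feasible), and, by convexity, it never exceeds the quadratic (so the convexified constraints are more restrictive than the originals). A small numerical check, with arbitrary dimension and seed:

```python
import numpy as np

rng = np.random.default_rng(1)

def g(v):
    """The convex term 0.25 * ||v||^2, with v standing for F_ij + u^i."""
    return 0.25 * v @ v

def g_lin(v, v_hat):
    """First-order expansion of g around v_hat (gradient of g is 0.5 * v)."""
    return g(v_hat) + 0.5 * v_hat @ (v - v_hat)

v_hat = rng.normal(size=4)
assert np.isclose(g(v_hat), g_lin(v_hat, v_hat))   # tight at the estimate
for _ in range(100):
    v = rng.normal(size=4)
    assert g_lin(v, v_hat) <= g(v) + 1e-12         # global under-estimator
print("linearization is tight at the estimate and never overestimates")
```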
Figure 2: Toy data set (Section 4.1) using (left) RRSVM, (center) arksvm-sla, (right) arksvm-sqp. Orange and green unhatched regions show the original advice. The dashed lines show the margin of w. For each method, we show the refined advice: vertically hatched for class +1, and diagonally hatched for class −1.

Finally, the constraint d_i'u^i + z_i b + 1 − ζ_i − f^i'u^i ≤ 0 is replaced by

    d_i'u^i + z_i b + 1 − ζ_i + ¼||f^i − u^i||² ≤ ¼||f̂^{i,t} + û^{i,t}||² + ½ (f̂^{i,t} + û^{i,t})' ((f^i − f̂^{i,t}) + (u^i − û^{i,t})).   (12)

The right-hand sides in (10)–(12) are affine and hence the entire set of constraints is now convex. Replacing the original bilinear non-convex constraints of (8) with these convexified relaxations results in a quadratically-constrained linear program (QCLP). The quadratic constraints are more restrictive than their non-convex counterparts, which makes the feasible set of this problem a subset of that of the original problem. Now, we can iteratively solve the resulting QCLP: at the t-th iteration, the restricted problem uses the current estimate to construct a new feasible point, and iterating this procedure produces a sequence of feasible points with decreasing objective values. The approach described here is essentially the constrained concave-convex procedure (CCCP), which has been discovered and rediscovered several times. Most recently, the approach was described in the context of machine learning by Yuille and Rangarajan [24], and by Smola and Vishwanathan [19], who also derived conditions under which the algorithm converges to a local solution. The following convergence theorem is due to [19].
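The monotone-descent behavior of CCCP can be seen on a one-dimensional toy problem (not from the paper). For f(x) = x⁴ − 2x², split into the convex part x⁴ and the concave part −2x², each CCCP step linearizes the concave part at x_t and minimizes the convex surrogate x⁴ − 4·x_t·x, whose minimizer is x_{t+1} = cbrt(x_t):

```python
import numpy as np

def f(x):
    # f = convex (x^4) + concave (-2x^2); local minima at x = +/- 1, f = -1.
    return x ** 4 - 2 * x ** 2

x = 0.5                       # arbitrary starting point
values = [f(x)]
for _ in range(50):
    x = np.cbrt(x)            # argmin of the surrogate x^4 - 4*x_t*x
    values.append(f(x))

# Objective values decrease monotonically toward a local minimum.
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
print("converged to x =", round(x, 6), "with f(x) =", round(f(x), 6))
```

The same mechanism operates in Algorithm 2: each convexified QCLP is a surrogate that is tight at the current iterate, so its solution cannot increase the true objective.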
Proposition 2 For Algorithm 2, the sequence of objective values converges to the value ||w̄||_1 + λ e'ξ̄ + μ Σ_{i=1}^m (e'η̄^i + ζ̄_i) + ν Σ_{i=1}^m (||F̄_i||_1 + ||f̄^i||_1), where (w̄, b̄, ū^i, F̄_i, f̄^i, ξ̄, η̄^i, ζ̄_i) is the local minimum solution of (8), provided that the constraints (10)–(12), in conjunction with the convex constraints Y(Xw − eb) + ξ ≥ e, ξ ≥ 0, u^i ≥ 0, ζ_i ≥ 0, satisfy suitable constraint qualifications at the point of convergence of the algorithm.

Both Algorithms 1 and 2 produce local minimum solutions to the arksvm formulation (8). For either solution, the following proposition holds, which shows that either algorithm produces a refinement of the original polyhedral advice regions. The proof is a direct consequence of Proposition 2.1 of [13].

Proposition 3 Let (w̄, b̄, ū^i, F̄_i, f̄^i, ξ̄, η̄^i, ζ̄_i) be the local minimum solution produced by Algorithm 1 or Algorithm 2. Then, the following refinement of the advice sets holds: (D_i − F̄_i) x ≤ (d_i − f̄^i) ⇒ z_i (w̄'x − b̄) + η̂^i'x ≥ 1 − ζ̄_i, where −η̄^i ≤ η̂^i ≤ η̄^i is such that (D_i − F̄_i)'ū^i + z_i w̄ + η̂^i = 0.

4 Experiments

We present the results of several experiments that compare the performance of three algorithms, RRSVM (which only refines the d term in Dx ≤ d), arksvm-sla (successive linear programming) and arksvm-sqp (successive quadratic programming), with that of standard SVMs and KBSVMs. The LPs were solved using QSOPT, while the QCLPs were solved using SDPT-3 [22].

2 wcook/qsopt/

4.1 Toy Example

We illustrate the behavior of the advice-refinement algorithms discussed previously using a simple 2-dimensional example (Figure 2). This toy data set consists of 200 points separated by x_1 + x_2 = 2. There are two advice sets, {S_1 : (x_1, x_2) ≥ 0 ⇒ z = +1} and {S_2 : (x_1, x_2) … ⇒ z = −1}, shown as the unhatched regions in Figure 2. Both arksvms are able to refine the knowledge sets such that no part of S_1 lies on the wrong side of the final hyperplane. In addition, the refinement terms allow for sufficient modification of the advice sets Dx ≤ d so that they fill the input space as much as possible, without violating the margin. Comparing to RRSVM, we see that its refinement is restrictive, because corrections are applied only to part of the advice sets, rather than fully correcting the advice.

Figure 3: Diabetes data set, Section 4.2; (left) results averaged over 10 runs on a hold-out test set of 412 points, with parameters selected by five-fold cross validation (the panel plots testing error (%) against the number of training examples for svm, kbsvm, rrsvm, arksvm-sla and arksvm-sqp); (right) an approximate decision-tree representation of Diabetes Rule 6 before and after refinement. The left branch is chosen if the query at a node is true, and the right branch otherwise. The leaf nodes classify the data point according to ?diabetes.

4.2 Case Study 1: PIMA Indians Diabetes Diagnosis

The Pima Indians Diabetes data set [4] has been studied for several decades and is used as a standard benchmark for many machine learning algorithms. The goal is to predict the onset of diabetes in 768 Pima Indian women within the next 5 years based on current indicators (eight features): number of times pregnant, plasma glucose concentration (gluc), diastolic blood pressure, triceps skin fold test, 2-hour serum insulin, body mass index (bmi), diabetes pedigree function (pedf) and age. Studies [15] show that diabetes incidence among the Pima Indians is significantly higher among subjects with bmi ≥ 30. In addition, a person with impaired glucose tolerance is at a significant risk for, or worse, has undiagnosed diabetes [8].
This leads to the following expert rules:

(Diabetes Rule 1) (gluc ≤ 126) ⇒ ¬diabetes,
(Diabetes Rule 2) (gluc ≥ 126) ∧ (gluc ≤ 140) ∧ (bmi ≤ 30) ⇒ ¬diabetes,
(Diabetes Rule 3) (gluc ≥ 126) ∧ (gluc ≤ 140) ∧ (bmi ≥ 30) ⇒ diabetes,
(Diabetes Rule 4) (gluc ≥ 140) ⇒ diabetes.

The diabetes pedigree function was developed by Smith et al. [18], and uses genetic information from family relatives to provide a measure of the expected genetic influence (heredity) on the subject's diabetes risk. The function also takes into account the ages of relatives who do have diabetes; on average, Pima Indians are only 36 years old when diagnosed with diabetes. A subject with high heredity who is at least 31 is at a significantly increased risk for diabetes in the next five years:

(Diabetes Rule 5) (pedf ≤ 0.5) ∧ (age ≤ 31) ⇒ ¬diabetes,
(Diabetes Rule 6) (pedf ≥ 0.5) ∧ (age ≥ 31) ⇒ diabetes.

Figure 3 (left) shows that unrefined advice does help initially, especially with as few as 30 data points. However, as more data points become available, the effect of the advice diminishes. In contrast, the advice-refining methods are able to generalize much better with few data points, and eventually converge to a better solution. Finally, Figure 3 (right) shows an approximate tree representation of Diabetes Rule 6 after refinement. This tree was constructed by sampling the space around the refined advice region uniformly, and then training a decision tree that covers as many of the sampled points as possible. This naive approach to rule extraction from refined advice is shown here only to illustrate that it is possible to produce very useful domain-expert-interpretable rules from refinement. More efficient and accurate rule extraction techniques, inspired by SVM-based rule extraction (for example, [7]), are currently under investigation.
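For readers who want a concrete baseline, the 1-norm SVM of formulation (3), against which all the knowledge-based variants are compared, can be posed directly as a linear program. The sketch below is illustrative only (synthetic data shaped like the Section 4.1 toy problem; λ and the variable splitting are implementation choices, not values from the paper):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Synthetic 2-D data, labeled by the line x1 + x2 = 2 as in Section 4.1.
X = rng.uniform(0, 2, size=(200, 2))
y = np.where(X.sum(axis=1) >= 2, 1.0, -1.0)

# Eq. (3):  min ||w||_1 + lam * e'xi   s.t.  Y(Xw - e*b) + xi >= e.
# Free variables are split into nonnegative parts: v = [w+, w-, b+, b-, xi].
l, n = X.shape
lam = 1.0
c = np.concatenate([np.ones(2 * n), [0.0, 0.0], lam * np.ones(l)])

# y_j (x_j'w - b) + xi_j >= 1, rewritten in linprog's A_ub @ v <= b_ub form.
YX = y[:, None] * X
A_ub = np.hstack([-YX, YX, y[:, None], -y[:, None], -np.eye(l)])
b_ub = -np.ones(l)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
w = res.x[:n] - res.x[n:2 * n]
b = res.x[2 * n] - res.x[2 * n + 1]
acc = np.mean(np.sign(X @ w - b) == y)
print("training accuracy:", acc)
```

The KBSVM (4) and the per-step LPs of Algorithm 1 have the same shape with the extra advice variables (u^i, η^i, ζ_i) and constraints appended, which is why an off-the-shelf LP solver such as QSOPT suffices for arksvm-sla.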
Figure 4: Wargus data set, Section 4.3; (left) an example Wargus scenario; (right) results using 5-fold cross validation on a hold-out test set of 1000 points (the panel plots testing error (%) against the number of training examples for svm, kbsvm, rrsvm, arksvm-sla and arksvm-sqp).

4.3 Case Study 2: Refining GUI-Collected Human Advice in a Wargus Task

Wargus is a real-time strategy game in which two or more players gather resources, build bases and control units in order to conquer opposing players. It has been widely used to study and evaluate various machine learning and planning algorithms. We evaluate our algorithms on a classification task in the Wargus domain developed by Walker et al. [23], called tower-defense (Figure 4, left). Advice for this task was collected from humans via a graphical human-computer interface (HCI), as detailed in [23]. Each scenario (example) in tower-defense consists of a single tower being attacked by a group of enemy units, and the task is to predict whether the tower will survive the attack and defeat the attackers, given the size and composition of the latter, as well as other factors such as the environment. The data set consists of 80 features including information about units (e.g., archers, ballistas, peasants), unit properties (e.g., map location, health), group properties (e.g., #archers, #footmen) and environmental factors (e.g., ?hasmoat). Walker et al. [23] used this domain to study the feasibility of learning from human teachers. To this end, human players were first trained to identify whether a tower would fall given a particular scenario. Once the humans had learned this task, they were asked to provide advice via a GUI-based interface based on specific examples. This setting lends itself very well to refinement, as the advice collected from human experts represents the sum of their experiences with the domain, but is by no means perfect or exact.
The following are some rules provided by the human domain experts:

(Wargus Rule 1) (#footmen ≥ 3) ∧ (?hasmoat = 0) ⇒ falls,
(Wargus Rule 2) (#archers ≥ 5) ⇒ falls,
(Wargus Rule 3) (#ballistas ≥ 1) ⇒ falls,
(Wargus Rule 4) (#ballistas = 0) ∧ (#archers = 0) ∧ (?hasmoat = 1) ⇒ stands.

Figure 4 (right) shows the performance of the various algorithms on the Wargus data set. As in the previous case study, the arksvm methods are not only able to learn very effectively with a small data set, they are also able to improve significantly on the performance of standard knowledge-based SVMs (KBSVMs) and rule-refining SVMs (RRSVMs).

5 Conclusions and Future Work

We have presented two novel knowledge-discovery methods, arksvm-sla and arksvm-sqp, that allow SVM methods not only to make use of advice provided by human experts but also to refine that advice using labeled data. These methods are an advance over previous knowledge-based SVM methods, which either did not refine advice [6] or could only refine simple aspects of the advice [12]. Experimental results demonstrate that our arksvm methods can take inaccurate advice and revise it to better fit the data. A significant aspect of these learning methods is that the system not only produces a classifier but also produces human-inspectable changes to the user-provided advice, and can do so using small data sets. In terms of future work, we plan to explore several avenues of research, including extending this approach to the nonlinear case for more complex models, better optimization algorithms for improved efficiency, and interpretation of refined rules for non-AI experts.
Acknowledgements

The authors gratefully acknowledge support of the Defense Advanced Research Projects Agency under DARPA grant FA C-7606 and the National Institutes of Health under NLM grant R01-LM. Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinion or policies, either expressed or implied, of the US government or of DARPA.

References

[1] K. P. Bennett and E. J. Bredensteiner. A parametric optimization method for machine learning. INFORMS Journal on Computing, 9(3).
[2] K. P. Bennett and O. L. Mangasarian. Bilinear separation of two sets in n-space. Computational Optimization and Applications, 2.
[3] M. W. Craven and J. W. Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, volume 8, pages 24-30.
[4] A. Frank and A. Asuncion. UCI machine learning repository.
[5] G. Fung, O. L. Mangasarian, and J. W. Shavlik. Knowledge-based nonlinear kernel classifiers. In Sixteenth Annual Conference on Learning Theory.
[6] G. Fung, O. L. Mangasarian, and J. W. Shavlik. Knowledge-based support vector classifiers. In Advances in Neural Information Processing Systems, volume 15.
[7] G. Fung, S. Sandilya, and R. B. Rao. Rule extraction from linear support vector machines. In Proc. of the Eleventh ACM SIGKDD Intl. Conf. on Knowledge Discovery in Data Mining, pages 32-40.
[8] M. I. Harris, K. M. Flegal, C. C. Cowie, M. S. Eberhardt, D. E. Goldstein, R. R. Little, H. M. Wiedmeyer, and D. D. Byrd-Holt. Prevalence of diabetes, impaired fasting glucose, and impaired glucose tolerance in U.S. adults. Diabetes Care, 21(4).
[9] G. Kunapuli, K. P. Bennett, A. Shabbeer, R. Maclin, and J. W. Shavlik. Online knowledge-based support vector machines. In Proc. of the European Conference on Machine Learning.
[10] F. Lauer and G. Bloch. Incorporating prior knowledge in support vector machines for classification: A review.
Neurocomputing, 71(7-9).
[11] Q. V. Le, A. J. Smola, and T. Gärtner. Simpler knowledge-based support vector machines. In Proc. of the 23rd International Conference on Machine Learning.
[12] R. Maclin, E. W. Wild, J. W. Shavlik, L. Torrey, and T. Walker. Refining rules incorporated into knowledge-based support vector learners via successive linear programming. In AAAI Twenty-Second Conference on Artificial Intelligence.
[13] O. L. Mangasarian, J. W. Shavlik, and E. W. Wild. Knowledge-based kernel approximation. Journal of Machine Learning Research, 5.
[14] O. L. Mangasarian and E. W. Wild. Nonlinear knowledge-based classification. IEEE Transactions on Neural Networks, 19(10).
[15] M. E. Pavkov, R. L. Hanson, W. C. Knowler, P. H. Bennett, J. Krakoff, and R. G. Nelson. Changing patterns of Type 2 diabetes incidence among Pima Indians. Diabetes Care, 30(7).
[16] M. Pazzani and D. Kibler. The utility of knowledge in inductive learning. Machine Learning, 9:57-94.
[17] B. Schölkopf, P. Simard, A. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In Advances in Neural Information Processing Systems, volume 10.
[18] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler, and R. S. Johannes. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. of the Symposium on Computer Applications and Medical Care. IEEE Computer Society Press.
[19] A. J. Smola and S. V. N. Vishwanathan. Kernel methods for missing variables. In Proc. of the Tenth International Workshop on Artificial Intelligence and Statistics.
[20] S. Thrun. Extracting rules from artificial neural networks with distributed representations. In Advances in Neural Information Processing Systems, volume 8.
[21] G. G. Towell and J. W. Shavlik. Knowledge-based artificial neural networks. Artificial Intelligence, 70(1-2).
[22] R. H. Tütüncü, K. C. Toh, and M. J. Todd. Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming, 95(2).
[23] T. Walker, G. Kunapuli, N. Larsen, D. Page, and J. W. Shavlik. Integrating knowledge capture and supervised learning through a human-computer interface. In Proc. of the Fifth Intl. Conf. on Knowledge Capture.
[24] A. L. Yuille and A. Rangarajan. The concave-convex procedure (CCCP). In Advances in Neural Information Processing Systems, volume 13.