Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo's Solution
Rie Johnson
Tong Zhang

1 Introduction

This document describes our entry nominated for the second prize of the Heritage Provider Network Health Prize Round 3 Milestone, as required by the rules. The entry is a weighted sum (called a blend in this document) of multiple runs, each of which was generated by one of the following methods:

- Regularized greedy forest (RGF) [6].
- Gradient boosting decision tree (GBDT) [5].
- Linear models with L2 regularization, trained for the residual of either RGF or GBDT predictions. Some linear model runs used features generated by alternating structure optimization (ASO) [1].
- Random forests [3] trained for the residual of either RGF or GBDT predictions.

The difference among runs produced by the same algorithm is the features. Among these methods, RGF learns tree ensembles, like GBDT. Our original motivation for entering the competition was to test RGF in a competitive setting. In our experiments, RGF consistently produced more accurate models than GBDT, but blending RGF and GBDT runs produced even more accurate models.

2 Algorithms

All the models were trained to minimize square error with the log-scale target (log(x + 1)). The corresponding references should be consulted for the details of each algorithm; this section describes how we used them. More detailed information required for replicating the results is given in the Appendix.

2.1 Regularized greedy forest (RGF)

An implementation of RGF is publicly available as open software issued under the GNU General Public License v3. As we performed regularization on leaf-only models with the extension described in [6], there were two L2 regularization parameters to be set: one for weight optimization and the other for tree learning. All the RGF runs set these parameters to 0.5 and 0.005, respectively. The model size, in terms of the number of leaf nodes in the forest (tree ensemble), was fixed unless otherwise specified. The other parameters were set to the system defaults.

2.2 Gradient boosting decision tree (GBDT)

A well-known implementation of GBDT is the R package gbm [7], though we used our own implementation for convenience. The shrinkage parameter, tree size (in terms of the number of leaf nodes), and the minimum number of training data points per node were set to 0.01, 20, and 100, respectively. We did not perform data sampling. The model size, in terms of the number of leaf nodes in the forest, was fixed unless otherwise specified.
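A minimal sketch of this GBDT configuration, using scikit-learn's GradientBoostingRegressor as a stand-in for the in-house implementation; the data, forest size, and final truncation step are placeholders rather than the exact milestone settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_train = rng.random((500, 20))              # placeholder feature matrix
days_in_hospital = rng.integers(0, 16, 500)  # placeholder DaysInHospital values
X_test = rng.random((100, 20))

y_log = np.log(days_in_hospital + 1.0)       # train on the log-scale target log(x + 1)

gbdt = GradientBoostingRegressor(
    learning_rate=0.01,     # shrinkage parameter
    max_leaf_nodes=20,      # tree size in leaf nodes
    min_samples_leaf=100,   # proxy for the minimum training data points per node
    subsample=1.0,          # no data sampling
)
gbdt.fit(X_train, y_log)

# Convert back to days and truncate to [0, 15], as done for the final predictions.
pred_days = np.clip(np.exp(gbdt.predict(X_test)) - 1.0, 0.0, 15.0)
```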
2.3 Random forests combined with RGF/GBDT

We trained random forests for the residual of either an RGF run or a GBDT run. That is, the training target for the i-th data point was set to y_i − f(x_i), where y_i is the original target (DaysInHospital in the log scale) and f(x_i) is the prediction made by an RGF (or GBDT) model, which serves as the base prediction. Prediction on test data was done by adding the residual prediction made by the random forest model to the base prediction. The parameters used for random forest training are described in Table 8 in the Appendix.

2.4 Linear models combined with RGF/GBDT and ASO

The linear models were trained for the residual of either an RGF run or a GBDT run. More precisely, for the i-th data point, let x_i be the feature vector and let ỹ_i be the residual of the base prediction (i.e., ỹ_i = y_i − f(x_i)). The training objective of the linear models was to obtain the weight vector

  ŵ = argmin_w [ (1/n) Σ_{i=1}^{n} (w·x_i − ỹ_i)² + λ w·w ].

The L2 regularization parameter λ was set to a fixed value.

In some runs, we used alternating structure optimization (ASO) [1] to generate additional features. Essentially, ASO learns new features through dimensionality reduction of predictors that are trained for auxiliary tasks. At a high level, the idea underlying ASO is multi-view learning as analyzed in [2]. The auxiliary tasks are in the form of predicting one view of the features (target view) based on another view (feature view); for example, predicting how many claims with PlaceSvc=Ambulance the member has, using the Vendor information as the only input (i.e., zeroing out the other features). The ASO procedure we performed was as follows:

1. Define m auxiliary tasks.
2. Train linear predictors for the auxiliary tasks, which results in m weight vectors; let W be the matrix whose columns are these m weight vectors.
3. Compute W's singular value decomposition, and let U_k be the matrix whose columns are the left singular vectors of W corresponding to the k largest singular values.
4. Compute Z = U_k^T X, where X is the original feature matrix whose columns are feature vectors. The columns of Z are the new additional feature vectors of k dimensions.

The auxiliary tasks we defined are given in Appendix A.3.

3 Features

Generation of individual features mostly follows or extends [4]. Notable differences are that we did not do any elaborate feature selection, and that we combined the extracted features quite differently: most of our runs used the features generated by aggregating claims in a one-year period and a two-year period simultaneously. Details are given in Appendix A.

4 Blending and post processing

4.1 Blending to produce the winning submission

The nominated entry was a blend of five runs; their public Leaderboard scores and blend weights are listed below.

name | public score | weight
Run#1 | |
Run#2 | |
Run#3 | |
Run#4 | |
Run#5 | |
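In code, the final submission is simply a weighted combination of the five run prediction vectors; the sketch below uses hypothetical weights, and the actual weights were chosen by the leaderboard-based ridge approximation described next.

```python
import numpy as np

# Hypothetical log-scale predictions of the five runs on the test set, shape (5, n_test).
run_preds = np.random.default_rng(0).random((5, 1000))

# Hypothetical blend weights; the real ones come from the approximate ridge regression below.
weights = np.array([0.25, 0.20, 0.20, 0.20, 0.15])

blend = weights @ run_preds   # weighted sum of the runs, still in the log scale
```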
The blend weights were determined by a slight variation (described later) of Section 7.1 of [8], which approximately solves ridge regression on the test data using the Leaderboard scores. As we understand it, this blending method was also used by the previous milestone winners. Although [8] should be consulted for details, it is worth mentioning that the centerpiece of this approximation is the fact that in the ridge regression solution (X^T X + nλI)^{-1} X^T y on the n test data points, the only unknown term, X^T y (unknown because the target y is unknown), can be closely approximated using the Leaderboard scores. That is, if we let p^(j) be the j-th run (predictions), then the j-th component of the vector X^T y is p^(j)·y, and we have:

  p^(j)·y = (1/2) ( ||p^(j)||² + ||y||² − ||p^(j) − y||² ).

Here ||p^(j) − y||² can be approximated by n·s_j², where s_j is the Leaderboard score of the j-th run p^(j), and ||y||² can be approximated using the all-zero benchmark provided by the organizer. Our slight variation was to train for the residual, using the average of the five runs, b = (1/5) Σ_{j=1}^{5} p^(j), as the base prediction (introduced in Section 2.3), and to use p^(j) − b in place of p^(j). In this variation the j-th component of the unknown term is (p^(j) − b)·(y − b) instead of p^(j)·y, and its approximation simply reduces to the approximation of p^(i)·y (i = 1, ..., 5). The regularization parameter λ was set to a fixed value.

4.2 Blending to produce the five runs

Each of the five runs was itself a blend of multiple runs. This blending was done by ridge regression trained on the training data using 5-fold cross-validation results. In this blending, we again used the average of the input runs as the base prediction and trained for the residual. The features were the differences of each run p from the base prediction b, and three vectors per run given by:

  v_{1,i} = p_i − b_i if p_i < 0.5, and 0 otherwise;
  v_{2,i} = p_i − b_i if p_i ≥ 0.5, and 0 otherwise;
  v_{3,i} = p_i − b_i if p_i ≥ 1, and 0 otherwise,

where the subscript i indicates the i-th vector component. The regularization parameter was set separately for Run#2, for Run#5, and for the rest. The individual runs blended into the five runs are described in Appendix C.

4.3 Post processing

Post processing was applied to the final blend. Let S_a be the set of test data points for which Age is missing; its complement S̄_a is the set of test data points for which Age is given. Similarly, define S_s and S̄_s for Sex. Our post processing was to match the average of the predictions within these four sets to the true average, first within S_s and S̄_s and then within S_a and S̄_a. The true average of the targets within a subset S can be estimated from the Leaderboard scores by noting the following. Let s be the Leaderboard score of a constant model as in [4], which sets a constant c for the data points in S and 0 for the data points outside S. Then we have:

  n·s² = Σ_{i∈S} (c − y_i)² + Σ_{i∉S} y_i² = |S|·c² − 2c·Σ_{i∈S} y_i + Σ_{i=1}^{n} y_i²,

where c is the constant in the log scale. On the right-hand side, Σ_{i∈S} y_i in the second term is the desired quantity, the third term can be approximated using the all-zero benchmark, and the first term is known.

Finally, we convert the log-scale prediction values to the desired scale and truncate the results into [0, 15].

References

[1] Rie Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1854, 2005.

[2] Rie Ando and Tong Zhang. Two-view feature generation model for semi-supervised learning. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007.
[3] Leo Breiman. Random forests. Machine Learning, 45:5–32, 2001.

[4] Phil Brierley, David Vogel, and Randy Axelrod. Heritage Provider Network Health Prize Round 1 Milestone Prize: How we did it, Team Market Makers.

[5] Jerome Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29:1189–1232, 2001.

[6] Rie Johnson and Tong Zhang. Learning nonlinear functions using regularized greedy forest. Technical report, arXiv:1109.0887v5.

[7] Greg Ridgeway. Package gbm.

[8] Andreas Töscher and Michael Jahrer. The BigChaos solution to the Netflix Grand Prize. BPC BigChaos.pdf, 2009.
A Features

We first define several feature sets, which will be combined to compose datasets in the next section. See [4] for the definitions of velocity, range, and AdmissionRisk, and for the conversion of categorical values to numerical values such as DSFS and LengthOfStay.

A.1 Features derived from the member information: m1

Feature set m1: Age, Age missing, Age.0-9, Age.10-19, Age.20-29, Age.30-39, Age.40-49, Age.50-59, Age.60-69, Age.70-79, Age.80-, Male, Female, Gender unknown, ClaimsTruncated.

A.2 Features derived from claim data

Notation:

- 1y: aggregate the claim data in a one-year period.
- 2y: aggregate the claim data in a two-year period.
- count: count of claims for that member in which the corresponding category appears. Blank (missing value) was treated as a category only for Vendor, ProviderID, and PCP, and was ignored (i.e., not counted) for the others. The numbers in parentheses are the numbers of features corresponding to the description. For example, "PlaceSvc count (8)" in the first line means that there are eight features, each of which is the count of one of the eight PlaceSvc categories (Ambulance, Home, and so on) for the member. If omitted, the number of features is one.
- ratio: count divided by the number of claims for the member. Blank (missing value) was treated as a category.

A.2.1 Feature sets t1, t3, w1, w2, and w3

Each row gives the feature description, its type, and the 1y/2y aggregation used by the feature sets (t1, t3, w1, w2, w3) that include the feature.

Description | Type | t1 t3 w1 w2 w3
PlaceSvc count (8) | numeric | 1y 1y 2y 2y 2y
Specialty count (12) | numeric | 1y 1y 2y 2y 2y
ProcedureGroup count (17) | numeric | 1y 1y 2y 2y 2y
PCGroup count (45) | numeric | 1y 1y 2y 2y 2y
ProviderID count (14700) | numeric | 1y 1y 2y 2y
Vendor count (6388) | numeric | 1y 1y 2y 2y
PCP count (1360) | numeric | 1y 1y 2y 2y
PlaceSvc Specialty count (83) | numeric | 1y 1y 2y 2y
PlaceSvc ProcedureGroup count (116) | numeric | 1y 1y 2y 2y
PlaceSvc PCGroup count (320) | numeric | 1y 1y 2y 2y
Specialty ProcedureGroup count (163) | numeric | 1y 1y 2y 2y
Specialty PCGroup count (471) | numeric | 1y 1y 2y 2y
ProcedureGroup PCGroup count (609) | numeric | 1y 1y 2y 2y
Specialty distinctive count | numeric | 1y 1y 2y 2y 2y
PlaceSvc distinctive count | numeric | 1y 1y 2y 2y 2y
PCGroup distinctive count | numeric | 1y 1y 2y 2y 2y
ProcedureGroup distinctive count | numeric | 1y 1y 2y 2y 2y
ProviderID distinctive count | numeric | 1y 1y 2y 2y
Vendor distinctive count | numeric | 1y 1y 2y 2y
PCP distinctive count | numeric | 1y 1y 2y 2y
PlaceSvc ratio (9) | numeric | 1y 1y 2y 2y 2y
Specialty ratio (13) | numeric | 1y 1y 2y 2y 2y
PCGroup ratio (46) | numeric | 1y 1y 2y 2y 2y
ProcedureGroup ratio (18) | numeric | 1y 1y 2y 2y 2y
DSFS min | numeric | 1y 1y 2y 2y
DSFS max | numeric | 1y 1y 2y 2y 2y
DSFS average | numeric | 1y 1y 2y 2y
DSFS standard deviation | numeric | 1y 1y 2y 2y
DSFS range | numeric | 1y 1y 2y 2y
DSFS > k?: k = 1, 2, ..., 11 (11) | | 2y
CharlsonIndex min | numeric | 1y 1y 2y 2y
CharlsonIndex max | numeric | 1y 1y 2y 2y 2y
CharlsonIndex average | numeric | 1y 1y 2y 2y
CharlsonIndex standard deviation | numeric | 1y 1y 2y 2y
CharlsonIndex range | numeric | 1y 1y 2y 2y
CharlsonIndex sum | numeric | 1y 1y
claim counts | numeric | 1y 1y 2y 2y 2y
LengthOfStay min | numeric | 1y 1y 2y 2y
LengthOfStay max | numeric | 1y 1y 2y 2y 2y
LengthOfStay avg | numeric | 1y 1y 2y 2y
LengthOfStay standard deviation | numeric | 1y 1y 2y 2y
LengthOfStay #missing | numeric | 1y 1y 2y 2y
LengthOfStay #suppressed | numeric | 1y 1y 2y 2y 2y
LengthOfStay #valid | numeric | 1y 1y 2y 2y
LengthOfStay sum | numeric | 1y 1y 2y 2y 2y
DrugCount min | numeric | 1y 1y 2y 2y
DrugCount max | numeric | 1y 1y 2y 2y
DrugCount average | numeric | 1y 1y 2y 2y
DrugCount range | numeric | 1y 1y 2y 2y
DrugCount #entry | numeric | 1y 1y 2y 2y 2y
DrugCount sum | numeric | 1y 1y 2y 2y 2y
DrugCount velocity | numeric | 1y 1y
DSFS-in-DrugCount > k?: k = 1, ..., 11 (11) | | 2y
LabCount.min | numeric | 1y 1y 2y 2y
LabCount.max | numeric | 1y 1y 2y 2y
LabCount.avg | numeric | 1y 1y 2y 2y
LabCount.range | numeric | 1y 1y 2y 2y
LabCount.#entry | numeric | 1y 1y 2y 2y 2y
LabCount.sum | numeric | 1y 1y 2y 2y 2y
LabCount.velocity | numeric | 1y 1y
DSFS-in-LabCount > k?: k = 1, ..., 11 (11) | | 2y
AdmissionRiskL70.max | numeric | 1y 1y 2y 2y 2y
AdmissionRiskL70.avg | numeric | 1y 1y 2y 2y
AdmissionRiskG70.max | numeric | 1y 1y 2y 2y 2y
AdmissionRiskG70.avg | numeric | 1y 1y 2y 2y
AdmissionRiskL70.sum | numeric | 1y 1y 2y 2y 2y
AdmissionRiskG70.sum | numeric | 1y 1y 2y 2y 2y
PlaceSvc LengthOfStay (54) | numeric | 1y 2y
Specialty LengthOfStay (38) | numeric | 1y 2y
ProcedureGroup LengthOfStay (119) | numeric | 1y 2y
PCGroup LengthOfStay (290) | numeric | 1y 2y
PlaceSvc DSFS (96) | numeric | 1y 2y
Specialty DSFS (144) | numeric | 1y 2y
ProcedureGroup DSFS (204) | numeric | 1y 2y
PCGroup DSFS (539) | numeric | 1y 2y
LengthOfStay DSFS (119) | numeric | 1y 2y

A.2.2 Features mainly meant for linear models: l1 and l2

Feature set: l1 and l2.

Description | Discretization interval | Type | l1 l2
PlaceSvc count (8) | | numeric | 1y
Specialty count (12) | | numeric | 1y
ProcedureGroup count (17) | | numeric | 1y
PCGroup count (45) | N/A | numeric | 1y
ProviderID count (14700) | | numeric | 1y
Vendor count (6388) | | numeric | 1y
PCP count (1360) | | numeric | 1y
PlaceSvc Specialty count (83) | | numeric | 1y
PlaceSvc ProcedureGroup count (116) | | numeric | 1y
PlaceSvc PCGroup count (320) | | numeric | 1y N/A
Specialty ProcedureGroup count (163) | | numeric | 1y
Specialty PCGroup count (471) | | numeric | 1y
ProcedureGroup PCGroup count (609) | | numeric | 1y
DSFS count (12) | | numeric | 1y
CharlsonIndex count (4) | | numeric | 1y
LengthOfStay count (11) | N/A | numeric | 1y
DSFS-in-DrugCount count (12) | | numeric | 1y
DSFS-in-LabCount count (12) | | numeric | 1y
Specialty distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
PlaceSvc distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
PCGroup distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
ProcedureGroup distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
ProviderID distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
Vendor distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
PCP distinctive count > k? (8) | k = 1, 2, 3, 4, 5, 10, 15, 20 | | 1y
#claim > k? (10) | k = 1, 2, 3, 4, 5, 10, 15, 20, 30, 40 | | 1y
DSFS.max > k? (12) | k = 1, 2, ..., 12 | | 1y
CharlsonIndex.max > k? (4) | k = 0, 2, 4, 6 | | 1y
DSFS-in-DrugCount.max > k? (12) | k = 1, 2, ..., 12 | | 1y
DSFS-in-LabCount.max > k? (12) | k = 1, 2, ..., 12 | | 1y
DrugCount.#entry > k? (7) | k = 0, 1, 2, 4, ..., 10 | | 1y
LabCount.#entry > k? (7) | k = 0, 1, 2, 4, ..., 10 | | 1y
DrugCount.sum > k? (10) | k = 0, 1, 3, 5, 10, 20, ..., 60 | | 1y
LabCount.sum > k? (12) | k = 0, 1, 3, 5, 10, 20, ..., 80 | | 1y
{Male Female?} PlaceSvc count (24) | | numeric | 1y
{Male Female?} Specialty count (36) | | numeric | 1y N/A
{Male Female?} ProcedureGroup count (51) | | numeric | 1y
{Male Female?} PCGroup count (135) | | numeric | 1y
(Age > k?) {Male Female?} (27) | k = 0, 10, ..., 80 | | 1y
(Age > k?) PlaceSvc count (72) | k = 0, 10, ..., 80 | numeric | 1y
(Age > k?) Specialty count (108) | k = 0, 10, ..., 80 | numeric | 1y
(Age > k?) ProcedureGroup count (153) | k = 0, 10, ..., 80 | numeric | 1y
(Age > k?) PCGroup count (405) | k = 0, 10, ..., 80 | numeric | 1y

A.2.3 Feature singletons: DiH, CT, CY

The feature sets defined below each consist of one feature.

Feature set | Description | Type
DiH | DaysInHospital in the previous year; −1 if unknown. | numeric
CT | ClaimsTruncated in the previous year; −1 if unknown. | {−1, 0, 1}
CY | Count of years in which the member has any claim. | numeric

A.2.4 Feature set: p1

Feature set p1 is an extension of pcp prob of [4]. Let f(x) denote the value of feature f for the member x, and let d(x) be member x's DaysInHospital in the next year. Considering the probability that d(·) > 0 conditioned on the value of f(·) according to distribution D, we define

  q_f(m) = Pr_{x~D}( d(x) > 0 | f(x) = 0 )  if f(m) = 0,
  q_f(m) = Pr_{x~D}( d(x) > 0 | f(x) ≠ 0 )  otherwise.

Let F be a set of count features derived from the same original field, e.g., the set of eight count features derived from PlaceSvc. Then for each member m, we derive the following three features:

  max_{f∈F} q_f(m),   min_{f∈F} q_f(m),   Σ_{f∈F} q_f(m).

The conditional probabilities were estimated from the Y1 data with the Y2 target, without smoothing. To cope with the issue of rare events, we merged the features whose count of the event (either f(m) = 0 or f(m) ≠ 0) is less than 10 into one feature, only for this purpose. We applied this procedure to PlaceSvc, Specialty, ProcedureGroup, PrimaryConditionGroup, bigrams of these four, Vendor, ProviderID, and PCP, which resulted in 39 features.
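A minimal sketch of this construction (hypothetical column names; the rare-event merging is omitted, and the third feature is taken to be the sum as reconstructed above):

```python
import numpy as np
import pandas as pd

# Hypothetical per-member Y1 count features for one field (e.g., PlaceSvc) and the Y2 target.
rng = np.random.default_rng(0)
counts = pd.DataFrame(rng.integers(0, 4, (1000, 8)),
                      columns=[f"PlaceSvc_{i}" for i in range(8)])
d_next = pd.Series(rng.integers(0, 3, 1000))     # DaysInHospital in the next year

hospitalized = d_next > 0
q = {}
for f in counts.columns:
    zero = counts[f] == 0
    # Pr(d > 0 | f = 0) and Pr(d > 0 | f != 0), estimated without smoothing.
    p_zero, p_nonzero = hospitalized[zero].mean(), hospitalized[~zero].mean()
    q[f] = np.where(zero, p_zero, p_nonzero)

q_mat = np.column_stack([q[f] for f in counts.columns])    # one q_f(m) column per feature
p1_features = np.column_stack([q_mat.max(axis=1),          # max over the field's features
                               q_mat.min(axis=1),
                               q_mat.sum(axis=1)])          # sum over the field's features
```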
A.3 ASO features: aso

We had four instances of ASO. In instance#1 and #2, the auxiliary tasks are in the form of predicting one view (target view) based on another view (feature view), as follows:

Instance#1. Target views: Table 3. Feature views: Table 4.
Instance#2. Target views: Table 3. Feature views: Table 6.

The pairs of target view and feature view that overlap (e.g., PlaceSvc counts and PlaceSvc Specialty counts) are excluded. In instance#3 and #4, the prediction targets of the auxiliary tasks are the DaysInHospital predictions made by one view (indirect-target view), and the features are another view (feature view):

Instance#3. Indirect-target views: Table 5. Feature views: Table 4.
Instance#4. Indirect-target views: Table 5. Feature views: Table 6.

The pairs of indirect-target view and feature view that overlap are excluded. Note that the construction of auxiliary problems in instance#1 and #2 follows the unsupervised strategy, whereas that of #3 and #4 follows the partially-supervised strategy; these strategies are discussed in Section 4.2 of [1]. The auxiliary tasks in #3 and #4 require, as preprocessing, training linear predictors to predict DaysInHospital from the indirect-target views. For the training of the auxiliary tasks (and preprocessing), all the Y1–Y3 data was used in instance#1 and #2, and Y1 data was used with Y2 DaysInHospital in #3 and #4. The L2 regularization parameter was set to a fixed value. The dimensionality k for these four ASO instances was set to 70, 50, 11, and 12, respectively. We call the concatenation of the features generated by the four instances (therefore of dimensionality 143 = 70 + 50 + 11 + 12) feature set aso.

PlaceSvc counts
Specialty counts
ProcedureGroup counts
PCGroup counts
all the features in m1
all the features derived from CharlsonIndex in t1
all the features derived from LengthOfStay in t1
all the AdmissionRisk features in t1
all the features derived from DSFS in t1
all the features derived from DrugCount in t1
all the features derived from LabCount in t1

Table 3: Target views for ASO instance#1 and #2.

Vendor counts
ProviderID counts
PCP counts

Table 4: Feature views for ASO instance#1 and #3. One view per line.
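Putting the four steps of Section 2.4 together with view pairs such as those in Tables 3 and 4 (the remaining view lists appear in Tables 5 and 6 below), a minimal numpy sketch of the ASO feature construction follows; the data, the view column indices, and the ridge solver used for the auxiliary predictors are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5000, 300))            # hypothetical member-by-feature matrix (rows = members)

# Hypothetical view definitions: column indices of a feature view and several target views.
feature_view = np.arange(0, 50)                          # e.g., Vendor counts
target_views = [np.arange(50, 58), np.arange(58, 70)]    # e.g., PlaceSvc counts, Specialty counts

# Steps 1-2: one auxiliary task per target column; train L2-regularized linear predictors
# that use only the feature view as input.
X_f = X[:, feature_view]
lam = 1.0
A = X_f.T @ X_f + lam * np.eye(X_f.shape[1])
weight_vectors = []
for tv in target_views:
    for col in tv:
        w = np.linalg.solve(A, X_f.T @ X[:, col])   # ridge solution for one auxiliary task
        weight_vectors.append(w)
W = np.column_stack(weight_vectors)                 # columns are the m auxiliary weight vectors

# Step 3: SVD of W; keep the left singular vectors for the k largest singular values.
k = 10
U, _, _ = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :k]

# Step 4: new k-dimensional ASO features Z = U_k^T x for every member
# (restricted to the feature view, where the auxiliary weights live).
Z = X_f @ U_k                                       # shape (n_members, k)
```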
all the features derived from PlaceSvc in l1
all the features derived from Specialty in l1
all the features derived from ProcedureGroup in l1
all the features derived from PCGroup in l1
all the features in m1
all the features derived from CharlsonIndex in l1
all the features derived from LengthOfStay in l1
all the features derived from DSFS in l1
all the features derived from DrugCount in l1
all the features derived from LabCount in l1
all the features in l1 excluding those derived from Vendor, ProviderID, or PCP

Table 5: Indirect-target views for ASO instance#3 and #4. One view per line.

PlaceSvc Specialty counts
PlaceSvc ProcedureGroup counts
PlaceSvc PCGroup counts
Specialty ProcedureGroup counts
Specialty PCGroup counts
ProcedureGroup PCGroup counts
PlaceSvc LengthOfStay counts
Specialty LengthOfStay counts
ProcedureGroup LengthOfStay counts
PCGroup LengthOfStay counts
PlaceSvc DSFS counts
Specialty DSFS counts
ProcedureGroup DSFS counts
PCGroup DSFS counts
LengthOfStay DSFS counts

Table 6: Feature views for ASO instance#2 and #4. One view per line.

B Datasets

Datasets, which were the actual input to the training and application of the models, were generated from the feature sets introduced above. There are 21 types of datasets. Some runs used them without change, and some runs added or removed certain features, as described later.

Notation. The target year is the year associated with the DaysInHospital values used as either the training target or the test target (i.e., the values we predict). The test target year is always Y4; the training target year is shown in the table. Y−1 indicates the year before the target year, and Y−2 indicates the year two years before the target year. For example, when Y3 is the training target year, Y−1 is Y2 and Y−2 is Y1. When applying the models to the test data, the test target year is Y4; therefore Y−1 is Y3 and Y−2 is Y2.

For easier understanding, consider feature sets such as t1 and t3 to be feature generators applied to the claim data. For example, t1(Y−1) below means application of feature generator t1 to the claim data in year Y−1, and w1(Y−1,Y−2) applies w1 to the claim data in the two-year period from Y−2 to Y−1.
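As a concrete illustration of this notation (the feature generators here are hypothetical stand-ins returning per-member feature blocks), the dataset t1w1 listed in the table below would be assembled roughly as follows.

```python
import numpy as np

def t1(claims, year):
    """Hypothetical stand-in for feature generator t1 applied to one year of claims."""
    rng = np.random.default_rng(year)
    return rng.random((claims["n_members"], 40))

def w1(claims, year, prev_year):
    """Hypothetical stand-in for w1 applied to the two-year period (prev_year, year)."""
    rng = np.random.default_rng(year * 100 + prev_year)
    return rng.random((claims["n_members"], 25))

claims = {"n_members": 1000}
m1 = np.random.default_rng(7).random((claims["n_members"], 16))   # member features

# Training view of dataset "t1w1": target year Y3, so Y-1 = Y2 and Y-2 = Y1.
train_X = np.hstack([m1, t1(claims, year=2), w1(claims, year=2, prev_year=1)])
# Test view: target year Y4, so Y-1 = Y3 and Y-2 = Y2.
test_X = np.hstack([m1, t1(claims, year=3), w1(claims, year=3, prev_year=2)])
```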
Dataset name | Dataset content | Training target year
t1 | m1 + t1(Y−1) | Y3
t1w1 | m1 + t1(Y−1) + w1(Y−1,Y−2) | Y3
t1w2 | m1 + t1(Y−1) + w2(Y−1,Y−2) | Y3
t1w3 | m1 + t1(Y−1) + w3(Y−1,Y−2) | Y3
t1t1 | m1 + t1(Y−1) + t1(Y−2) | Y3
t1t1w1 | m1 + t1(Y−1) + t1(Y−2) + w1(Y−1,Y−2) | Y3
t1t1w2 | m1 + t1(Y−1) + t1(Y−2) + w2(Y−1,Y−2) | Y3
t1t1w3 | m1 + t1(Y−1) + t1(Y−2) + w3(Y−1,Y−2) | Y3
t1* | m1 + t1(Y2); m1 + t1(Y1) | Y3; Y2
t3 | m1 + t3(Y−1) | Y3
t3w1 | m1 + t3(Y−1) + w1(Y−1,Y−2) | Y3
t3w2 | m1 + t3(Y−1) + w2(Y−1,Y−2) | Y3
t3w3 | m1 + t3(Y−1) + w3(Y−1,Y−2) | Y3
t3t3 | m1 + t3(Y−1) + t3(Y−2) | Y3
t3t3w1 | m1 + t3(Y−1) + t3(Y−2) + w1(Y−1,Y−2) | Y3
t3t3w2 | m1 + t3(Y−1) + t3(Y−2) + w2(Y−1,Y−2) | Y3
t3t3w3 | m1 + t3(Y−1) + t3(Y−2) + w3(Y−1,Y−2) | Y3
t3* | m1 + t3(Y2); m1 + t1(Y1) | Y3; Y2
u1 | m1 + l1(Y−1) + l2(Y−1) | Y3
u1* | m1 + l1(Y2) + l2(Y2); m1 + l1(Y1) + l2(Y1) | Y3; Y2
u1u1 | m1 + l1(Y−1) + l2(Y−1) + l1(Y−2) + l2(Y−2) | Y3

Note that the number of data points in t1*, t3*, and u1* is about twice that of the other datasets; they serve only as training data, with a concatenation of Y3-target and Y2-target data. In these datasets, there are two data points for the members who have claims in both years. The other datasets serve as training data with the Y3 target and as test data for predicting the Y4 target.

C Runs

Each of the five runs shown in Section 4 is a blend of multiple runs. Tables 7–12 show the individual runs blended into those five runs.

Notation: F<n denotes the set of features that have non-zero values for fewer than n data points.

Datasets | Add... | Remove... | Method
t1, t1w1, t1w2, t1w3, t1t1, t1t1w1, t1t1w2, t1t1w3, t1* | None | None | RGF, GBDT
t3, t3w1, t3w2, t3w3, t3t3, t3t3w1, t3t3w2, t3t3w3, t3* | None | None | RGF, GBDT
t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH & p1 | F<10 | RGF, GBDT
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH | AdmRisk & F<100 | RGF
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH | LoS.#supp & F<100 | RGF
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH | UnkGender & F<100 | RGF
u1* | None | None | RGF, GBDT
u1u1 | None | F<10 | RGF, GBDT
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH | F<100 | RGF, GBDT

Table 7: Run#1 part 1/2: RGF and GBDT.
Datasets | Remove... | Base | Parameters for random forests
t3, t3t3, t3w1, t3w3 | F<10 | R(t1*), G(t1*) | m = 50, r = 0.33, n = 50
t1w1, t1w2, t1t1w1, t1t1w2 | None | R(t1*), R(t3t3w3) | m = 100, r = 0.3, n = 50
t1w3, t1t1w3 | None | R(t1*), R(t3t3w3) | m = 100, r = 0.2, n = 50
t3w1, t3w2, t3w3, t3t3w1, t3t3w2, t3t3w3 | None | R(t1*), R(t3t3w3) | m = 100, r = 0.2, n = 50

Table 8: Run#1 part 2/2: Random forests trained for the residual of RGF or GBDT. m: the minimum number of training data points per node; r: feature sampling ratio; n: the number of trees. No data sampling. In the Base column, R(x) is an RGF run applied to dataset x, and G(x) is a GBDT run applied to dataset x.

Datasets | Add... | Remove... | Method
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH | F<100 | RGF, GBDT
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH & CT | F<100 | RGF, GBDT
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | CT | F<100 | RGF, GBDT

Table 9: Run#2: RGF and GBDT.

Datasets | Add. | Rmv. | Conversion | Base
u1 | aso | F<10 | x → log(x + 1) | R(t1*), G(t1*), R(t3*), G(t3*), R(t3t3w3), G(t3t3w3)
u1* | aso | F<10 | x → log(x + 1) | R(t1*), G(t1*), R(t3*), G(t3*)
u1 | aso | F<10 | x → min(1, log(x + 1)) | R(t1*), G(t1*), R(t3*), G(t3*), R(t3t3w3), G(t3t3w3)
u1* | aso | F<10 | x → min(1, log(x + 1)) | R(t1*), G(t1*), R(t3*), G(t3*)
u1, u1* | | F<10 | x → min(1, log(x + 1)) | R(t1*), G(t1*), R(t3*), G(t3*)
u1, u1* | | F<10 | x → log(x + 1) | R(t1*), G(t1*), R(t3*), G(t3*)

Table 10: Run#3: Linear models trained for the residual of RGF or GBDT. In the Base column, R(x) is an RGF run applied to dataset x, and G(x) is a GBDT run applied to dataset x.

Datasets | Add... | Remove... | Method
t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | CY & DiH & p1 | F<100 | RGF
t1w1, t1w3, t1t1, t1t1w1, t1t1w3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | CY | F<10 | RGF, GBDT

Table 11: Run#4: RGF and GBDT. For each combination of datasets and methods, two runs were performed, with two different model sizes (in terms of the number of leaf nodes in the forest).

Datasets | Add... | Remove... | Method
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | DiH & CT | F<100 | RGF, GBDT
t3, t3w1, t3w3, t3t3, t3t3w1, t3t3w3 | CT | F<100 | RGF, GBDT

Table 12: Run#5: RGF and GBDT.
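The residual scheme underlying Tables 8 and 10 (Sections 2.3 and 2.4) can be sketched as follows; the base and residual learners are scikit-learn stand-ins with hypothetical data and parameters.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X_train, X_test = rng.random((800, 30)), rng.random((200, 30))
y_log = np.log(rng.integers(0, 16, 800) + 1.0)       # log-scale DaysInHospital target

# Base run: a GBDT model (stand-in for an RGF or GBDT run such as G(t1*)).
base = GradientBoostingRegressor(learning_rate=0.01, max_leaf_nodes=20,
                                 min_samples_leaf=100).fit(X_train, y_log)

# Residual model: a random forest trained on y - f(x), as in Section 2.3 / Table 8.
residual_target = y_log - base.predict(X_train)
rf = RandomForestRegressor(n_estimators=50,           # n = 50 trees
                           max_features=0.33,         # r: feature sampling ratio
                           min_samples_leaf=50        # stand-in for m, the per-node minimum
                           ).fit(X_train, residual_target)

# Test-time prediction adds the residual prediction to the base prediction.
pred_log = base.predict(X_test) + rf.predict(X_test)
pred_days = np.clip(np.exp(pred_log) - 1.0, 0.0, 15.0)
```

Fitting the second model to residuals lets a different model family correct the systematic errors of the base run, while the base prediction is simply added back at test time.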