partykit: A Toolkit for Recursive Partytioning
|
|
|
- Miles Owens
- 10 years ago
- Views:
Transcription
1 partykit: A Toolkit for Recursive Partytioning Achim Zeileis, Torsten Hothorn
2 Overview Status quo: R software for tree models New package: partykit Unified infrastructure for recursive partytioning Classes and methods Interfaces to rpart, J48,... Illustrations Future: Next steps
3 R software for tree models Status quo: The CRAN task view on Machine Learning at lists numerous packages for tree-based modeling and recursive partitioning, including rpart (CART), tree (CART), mvpart (multivariate CART), RWeka (J4.8, M5, LMT), party (CTree, MOB), and many more... Related: Packages for tree-based ensemble methods such as random forests or boosting, e.g., randomforest, gbm, mboost, etc.
4 R software for tree models Moreover: Some tree algorithms with R packages that are not on CRAN, e.g., STIMA, and TINT. And further tree algorithms/software without R interface, e.g., QUEST, GUIDE, LOTUS, CRUISE,... Currently: All algorithms/software have to deal with similar problems but provide different solutions without reusing code.
5 R software for tree models Challenge: For implementing new algorithms in R, code is required not only for fitting the tree model on the learning data but also representing fitted trees, printing trees, plotting trees, computing predictions from trees.
6 R software for tree models Question: Wouldn t it be nice if there were an R package that provided code for representing fitted trees, printing trees, plotting trees, computing predictions from trees?
7 R software for tree models Answer: The R package partykit provides unified infrastructure for recursive partytioning, especially representing fitted trees, printing trees, plotting trees, computing predictions from trees!
8 partykit: Unified infrastructure Design principles: Toolkit for recursive partytitioning. One agnostic base class which can encompass an extremely wide range of different types of trees. Subclasses for important types of trees, e.g., trees with constant fits in each terminal node. Nodes are recursive objects, i.e., a node can contain child nodes. Keep (learning) data out of the recursive node and split structure. Basic printing, plotting, and predicting for raw node structure. Customization via suitable panel or panel-generating functions. Coercion from existing objects (rpart, J48, etc.) to new class. Use simple/fast S3 classes and methods.
9 partykit: Base classes Class constructors: Generate basic building blocks. partysplit(varid, breaks = NULL, index = NULL,...) where breaks provides the breakpoints wrt variable varid; index determines to which kid node observations are assigned. partynode(id, split = NULL, kids = NULL,...) where split is a partysplit and kids a list of partynode s. party(node, data, fitted = NULL,...) where node is a partynode and data the corresponding (learning) data (optionally without any rows) and fitted the corresponding fitted nodes. Additionally: All three objects have an info slot where optionally arbitrary information can be stored.
10 partykit: Further classes and methods Further classes: For trees with constant fits in each terminal node, both inheriting from party. constparty : Stores full observed response and fitted terminal nodes in fitted; predictions are computed from empirical distribution of the response. simpleparty : Stores only one predicted response value along with some summary details (such as error and sample size) for each terminal node in the corresponding info. Methods: Display: print(), plot(), predict(). Query: length(), width(), depth(), names(), nodeids(). Extract: [, [[, nodeapply(). Coercion: as.party().
11 partykit: Illustration Intention: Solution: Illustrate several trees using the same data. Not use the iris data (or something from mlbench). Use Titanic survival data (oh well... ). In case you are not familiar with it: Survival status, gender, age (child/adult), and class (st, 2nd, 3rd, crew) for the 22 persons on the ill-fated maiden voyage of the Titanic. Question: Who survived? Or how does the probability of survival vary across the covariates?
12 partykit: Interface to rpart CART: Apply rpart() to preprocessed ttnc data (see also below). R> rp <- rpart(survived ~ Gender + Age + Class, data = ttnc) Standard plot: R> plot(rp) R> text(rp) Visualization via partykit: R> plot(as.party(rp))
13 partykit: Interface to rpart Gender=a Age=b Class=c No No Class=c Yes No Yes
14 partykit: Interface to rpart Gender Male Female 2 Age 7 Class Adult Child 4 Class 3rd st, 2nd, Crew 3rd st, 2nd Node 3 (n = 667) Node 5 (n = 48) Node 6 (n = 6) Node 8 (n = 96) Node 9 (n = 274)
15 partykit: Interface to rpart R> rp n= 22 node), split, n, loss, yval, (yprob) * denotes terminal node ) root 22 7 No ( ) 2) Gender=Male No ( ) 4) Age=Adult No ( ) * 5) Age=Child No ( ) ) Class=3rd 48 3 No ( ) * ) Class=st,2nd 6 Yes (..) * 3) Gender=Female Yes ( ) 6) Class=3rd 96 9 No ( ) * 7) Class=st,2nd,Crew Yes ( ) *
16 partykit: Interface to rpart R> as.party(rp) Model formula: Survived ~ Gender + Age + Class Fitted party: [] root [2] Gender in Male [3] Age in Adult: No (n = 667, err = 2.3%) [4] Age in Child [5] Class in 3rd: No (n = 48, err = 27.%) [6] Class in st, 2nd: Yes (n = 6, err =.%) [7] Gender in Female [8] Class in 3rd: No (n = 96, err = 45.9%) [9] Class in st, 2nd, Crew: Yes (n = 274, err = 7.3%) Number of inner nodes: 4 Number of terminal nodes: 5
17 partykit: Interface to rpart Prediction: Compare rpart s C code and partykit s R code for (artificially) large data set. R> nd <- ttnc[rep(:nrow(ttnc), ), ] R> system.time(p <- predict(rp, newdata = nd, type = "class")) user system elapsed R> system.time(p2 <- predict(as.party(rp), newdata = nd)) user system elapsed R> table(rpart = p, party = p2) party rpart No Yes No 9 Yes 29
18 partykit: Interface to J48 J4.8: Open-source implementation of C4.5 in RWeka. R> j48 <- J48(Survived ~ Gender + Age + Class, data = ttnc) Results in a tree with multi-way splits which previously could only be displayed via Weka itself or Graphviz but not in R directly. Now: R> j48p <- as.party(j48) R> plot(j48p) Or just a subtree: R> plot(j48p[])
19 partykit: Interface to J48 Gender Male Female 2 Class Class st 2nd 3rdCrew 3 Age 6 Age st 2nd 3rd Crew Child Adult Child Adult n = 5 n = 75 n = n = 68 n = 5 n = 862 n = 45 n = 6 n = 96 n = 23
20 partykit: Interface to J48 Class st 2nd 3rd Crew Node 2 (n = 45) Node 3 (n = 6) Node 4 (n = 96) Node 5 (n = 23)
21 partykit: Further interfaces PMML: Predictive Model Markup Language. XML-based format exported by various software packages including SAS, SPSS, R/pmml. Here, reimport CART tree. R> pm <- pmmltreemodel("ttnc.xml") evtree: Evolutionary learning of globally optimal trees, directly using partykit. R> set.seed(7) R> ev <- evtree(survived ~ Gender + Age + Class, data = ttnc, + maxdepth = 3) CTree: Conditional inference trees ctree() are reimplemented more efficiently within partykit. CHAID: R package on R-Forge, directly using partykit. (Alternatively, use SPSS and export via PMML.)
22 partykit: PMML Gender Male Female 2 Age 7 Class Adult Child 3rd st, 2nd, Crew 3 No (n = 667, err = 2.3%) 4 Class 8 No (n = 96, err = 45.9%) 9 Yes (n = 274, err = 7.3%) 3rd st, 2nd 5 No (n = 48, err = 27.%) 6 Yes (n = 6, err =.%)
23 partykit: evtree Class st, 2nd, Crew 3rd 2 Gender Female Male 4 Age Child Adult Node 3 (n = 274) Node 5 (n = 6) Node 6 (n = 25) Node 7 (n = 76)
24 Next steps To do: The current code is already fairly well-tested and mostly stable. However, there are some important items on the task list. Extend/smooth package vignette. Add xtrafo/ytrafo to ctree() reimplementation. Switch mob() to new party class. Create new subclass modelparty that facilitates handling of formulas, terms, model frames, etc. for model-based trees.
25 Next steps Model-based recursive partitioning: Trees with parametric models in each node (e.g., based on least squares or maximum likelihood). Splitting based on parameter instability tests. Illustration: Logistic regression (or logit model), assessing differences in the effect of preferential treatment for women or children. R> library("party") R> mb <- mob(survived ~ Treatment Age + Gender + Class, + data = ttnc, model = glinearmodel, family = binomial(), + control = mob_control(alpha =.)) Odds ratio of survival given treatment differs across subsets (slope), as does the survival probability of male adults (intercept). R> coef(mb) (Intercept) TreatmentPreferential\n(Female Child)
26 Next steps Class p <. 3rd {st, 2nd, Crew} 3 Class p <. 2nd {st, Crew} Node 2 (n = 76) Node 4 (n = 285) Node 5 (n = 2) Normal (Male&Adult) Preferential (Female Child) Normal (Male&Adult) Preferential (Female Child) Normal (Male&Adult) Preferential (Female Child)
27 Computational details Software: All examples have been produced with R 2.4. and packages partykit.-2, rpart 3.-5, RWeka -9, evtree.-, party All packages are freely available under the GPL from Data: The contingency table Titanic is transformed to the data frame ttnc using the following code. R> data("titanic", package = "datasets") R> ttnc <- as.data.frame(titanic) R> ttnc <- ttnc[rep(:nrow(ttnc), ttnc$freq), :4] R> names(ttnc)[2] <- "Gender" R> ttnc <- transform(ttnc, Treatment = factor( + Gender == "Female" Age == "Child", + levels = c(false, TRUE), + labels = c("normal\n(male&adult)", "Preferential\n(Female Child)") + )) The last transformation also adds the treatment variable for the model-based recursive partitioning analysis.
28 References Hothorn T, Zeileis A (2). partykit: A Toolkit for Recursive Partytioning. R package vignette version.-2. URL Hothorn T, Hornik K, Zeileis A (26). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 5(3), doi:.98/6866x33933 Grubinger T, Zeileis A, Pfeiffer KP (2). evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R. Working Paper 2-2, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck. URL Zeileis A, Hothorn T, Hornik K (28). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 7(2), doi:.98/6868x3933
Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups
Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and
Open-Source Machine Learning: R Meets Weka
Open-Source Machine Learning: R Meets Weka Kurt Hornik, Christian Buchta, Michael Schauerhuber, David Meyer, Achim Zeileis http://statmath.wu-wien.ac.at/ zeileis/ Weka? Weka is not only a flightless endemic
Benchmarking Open-Source Tree Learners in R/RWeka
Benchmarking Open-Source Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems
R: A Free Software Project in Statistical Computing
R: A Free Software Project in Statistical Computing Achim Zeileis http://statmath.wu-wien.ac.at/ zeileis/ [email protected] Overview A short introduction to some letters of interest R, S, Z Statistics
Predictive Modeling of Titanic Survivors: a Learning Competition
SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224
Data Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
Data Science with R Ensemble of Decision Trees
Data Science with R Ensemble of Decision Trees [email protected] 3rd August 2014 Visit http://handsondatascience.com/ for more Chapters. The concept of building multiple decision trees to produce
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Open-Source Machine Learning: R Meets Weka
Open-Source Machine Learning: R Meets Weka Kurt Hornik Christian Buchta Achim Zeileis Weka? Weka is not only a flightless endemic bird of New Zealand (Gallirallus australis, picture from Wekapedia) but
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Data mining on the EPIRARE survey data
Deliverable D1.5 Data mining on the EPIRARE survey data A. Coi 1, M. Santoro 2, M. Lipucci 2, A.M. Bianucci 1, F. Bianchi 2 1 Department of Pharmacy, Unit of Research of Bioinformatic and Computational
Package missforest. February 20, 2015
Type Package Package missforest February 20, 2015 Title Nonparametric Missing Value Imputation using Random Forest Version 1.4 Date 2013-12-31 Author Daniel J. Stekhoven Maintainer
Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -
Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create
Efficient Integration of Data Mining Techniques in Database Management Systems
Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France
Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data
Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition
Model-based Recursive Partitioning
Model-based Recursive Partitioning Achim Zeileis Wirtschaftsuniversität Wien Torsten Hothorn Ludwig-Maximilians-Universität München Kurt Hornik Wirtschaftsuniversität Wien Abstract Recursive partitioning
Guide to Credit Scoring in R
Credit Scoring in R 1 of 45 Guide to Credit Scoring in R By DS ([email protected]) (Interdisciplinary Independent Scholar with 9+ years experience in risk management) Summary To date Sept 23 2009, as Ross
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
Package hybridensemble
Type Package Package hybridensemble May 30, 2015 Title Build, Deploy and Evaluate Hybrid Ensembles Version 1.0.0 Date 2015-05-26 Imports randomforest, kernelfactory, ada, rpart, ROCR, nnet, e1071, NMOF,
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
REPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
Implementing a Class of Structural Change Tests: An Econometric Computing Approach
Implementing a Class of Structural Change Tests: An Econometric Computing Approach Achim Zeileis http://www.ci.tuwien.ac.at/~zeileis/ Contents Why should we want to do tests for structural change, econometric
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Package bigrf. February 19, 2015
Version 0.1-11 Date 2014-05-16 Package bigrf February 19, 2015 Title Big Random Forests: Classification and Regression Forests for Large Data Sets Maintainer Aloysius Lim OS_type
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
A Methodology for Predictive Failure Detection in Semiconductor Fabrication
A Methodology for Predictive Failure Detection in Semiconductor Fabrication Peter Scheibelhofer (TU Graz) Dietmar Gleispach, Günter Hayderer (austriamicrosystems AG) 09-09-2011 Peter Scheibelhofer (TU
Big Data Decision Trees with R
REVOLUTION ANALYTICS WHITE PAPER Big Data Decision Trees with R By Richard Calaway, Lee Edlefsen, and Lixin Gong Fast, Scalable, Distributable Decision Trees Revolution Analytics RevoScaleR package provides
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
An Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
Package acrm. R topics documented: February 19, 2015
Package acrm February 19, 2015 Type Package Title Convenience functions for analytical Customer Relationship Management Version 0.1.1 Date 2014-03-28 Imports dummies, randomforest, kernelfactory, ada Author
What is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling
MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : [email protected] 1 Aims To introduce the basic concepts of data mining
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
Visualizing Categorical Data in ViSta
Visualizing Categorical Data in ViSta Pedro M. Valero-Mora Universitat de València-EG Email: [email protected] Forrest W. Young University of North Carolina Email: [email protected] Michael Friendly York University
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES
TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES Research in Progress Shmueli, Galit, Indian School of Business, Hyderabad, India, [email protected] Yahav, Inbal, Bar
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
COC131 Data Mining - Clustering
COC131 Data Mining - Clustering Martin D. Sykora [email protected] Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window
Consensus Based Ensemble Model for Spam Detection
Consensus Based Ensemble Model for Spam Detection Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Information Security Submitted By Paritosh
KnowledgeSEEKER POWERFUL SEGMENTATION, STRATEGY DESIGN AND VISUALIZATION SOFTWARE
POWERFUL SEGMENTATION, STRATEGY DESIGN AND VISUALIZATION SOFTWARE Most Effective Modeling Application Designed to Address Business Challenges Applying a predictive strategy to reach a desired business
Ensembles and PMML in KNIME
Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany [email protected]
M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1. 15.7 Analytics and Data Mining 1
M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1 15.7 Analytics and Data Mining 15.7 Analytics and Data Mining 1 Section 1.5 noted that advances in computing processing during the past 40 years have
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
Oracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
Data mining techniques: decision trees
Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
M1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 3 - Random Forest
Nathalie Villa-Vialaneix Année 2014/2015 M1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 3 - Random Forest This worksheet s aim is to learn how
HLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
Applied Multivariate Analysis - Big data analytics
Applied Multivariate Analysis - Big data analytics Nathalie Villa-Vialaneix [email protected] http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of
Fortgeschrittene Computerintensive Methoden
Fortgeschrittene Computerintensive Methoden Einheit 3: mlr - Machine Learning in R Bernd Bischl Matthias Schmid, Manuel Eugster, Bettina Grün, Friedrich Leisch Institut für Statistik LMU München SoSe 2014
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Testing, Monitoring, and Dating Structural Changes in Exchange Rate Regimes
Testing, Monitoring, and Dating Structural Changes in Exchange Rate Regimes Achim Zeileis http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation Exchange rate regimes Exchange rate regression What is the
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
How To Find Seasonal Differences Between The Two Programs Of The Labor Market
Using Data Mining to Explore Seasonal Differences Between the U.S. Current Employment Statistics Survey and the Quarterly Census of Employment and Wages October 2009 G. Erkens BLS G. Erkens, Bureau of
Data Science with R. Introducing Data Mining with Rattle and R. [email protected]
http: // togaware. com Copyright 2013, [email protected] 1/35 Data Science with R Introducing Data Mining with Rattle and R [email protected] Senior Director and Chief Data Miner,
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Big Data Analytics. Benchmarking SAS, R, and Mahout. Allison J. Ames, Ralph Abbey, Wayne Thompson. SAS Institute Inc., Cary, NC
Technical Paper (Last Revised On: May 6, 2013) Big Data Analytics Benchmarking SAS, R, and Mahout Allison J. Ames, Ralph Abbey, Wayne Thompson SAS Institute Inc., Cary, NC Accurate and Simple Analysis
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015
R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians
Monitoring Structural Change in Dynamic Econometric Models
Monitoring Structural Change in Dynamic Econometric Models Achim Zeileis Friedrich Leisch Christian Kleiber Kurt Hornik http://www.ci.tuwien.ac.at/~zeileis/ Contents Model frame Generalized fluctuation
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
Modelling and added value
Modelling and added value Course: Statistical Evaluation of Diagnostic and Predictive Models Thomas Alexander Gerds (University of Copenhagen) Summer School, Barcelona, June 30, 2015 1 / 53 Multiple regression
Analysis Tools and Libraries for BigData
+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
Risk pricing for Australian Motor Insurance
Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model
KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES
HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within
VI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
Package MDM. February 19, 2015
Type Package Title Multinomial Diversity Model Version 1.3 Date 2013-06-28 Package MDM February 19, 2015 Author Glenn De'ath ; Code for mdm was adapted from multinom in the nnet package
Data Mining Introduction
Data Mining Introduction Bob Stine Dept of Statistics, School University of Pennsylvania www-stat.wharton.upenn.edu/~stine What is data mining? An insult? Predictive modeling Large, wide data sets, often
Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller
Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive
Contents WEKA Microsoft SQL Database
WEKA User Manual Contents WEKA Introduction 3 Background information. 3 Installation. 3 Where to get WEKA... 3 Downloading Information... 3 Opening the program.. 4 Chooser Menu. 4-6 Preprocessing... 6-7
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Applied Predictive Modeling in R R. user! 2014
Applied Predictive Modeling in R R user! 204 Max Kuhn, Ph.D Pfizer Global R&D Groton, CT [email protected] Outline Conventions in R Data Splitting and Estimating Performance Data Pre-Processing Over
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Chapter 12 Bagging and Random Forests
Chapter 12 Bagging and Random Forests Xiaogang Su Department of Statistics and Actuarial Science University of Central Florida - 1 - Outline A brief introduction to the bootstrap Bagging: basic concepts
How To Understand How Weka Works
More Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz More Data Mining with Weka a practical course
Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering
IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
user! 2013 Max Kuhn, Ph.D Pfizer Global R&D Groton, CT [email protected]
Predictive Modeling with R and the caret Package user! 23 Max Kuhn, Ph.D Pfizer Global R&D Groton, CT [email protected] Outline Conventions in R Data Splitting and Estimating Performance Data Pre-Processing
