Feature Selection with Monte-Carlo Tree Search




Feature Selection with Monte-Carlo Tree Search
Robert Pinsler, 20.01.2015
Department of Computer Science, DKE: Seminar on Machine Learning

Agenda
1 Feature Selection
2 Feature Selection as a Markov Decision Process
3 Feature UCT Selection
4 Experimental Validation
5 Summary and Outlook

Motivation
Less data: less to store and collect; faster to process.
Reduced generalization error: less noise (fewer irrelevant features); simpler hypothesis spaces (fewer redundant features).
Better understanding: easier to understand; easier to visualize.

Supervised Approaches
Filter: independently rank features with a score function and select the top n; ignores correlations and redundancy.
Wrapper: explore the power set of features and measure the generalization error of subsets; a full combinatorial optimization problem with an exploration vs. exploitation dilemma.
Embedded: combine feature selection and learning; may still ignore correlations and redundancy.

FS as a Markov Decision Process
[Diagram: lattice of feature subsets, from the empty set over {f1}, {f2}, {f3} and the two-feature subsets {f1,f2}, {f1,f3}, {f2,f3} up to {f1,f2,f3}; each transition adds one feature.]
Goal: find the optimal policy.
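To make the MDP concrete, here is a minimal sketch (an illustration, not the paper's code; the names actions, step, and the STOP sentinel are assumptions): states are feature subsets F, and each action either adds an unused feature or stops the episode.

```python
from typing import FrozenSet, Set

STOP = -1  # hypothetical sentinel action: stop and evaluate the current subset

def actions(state: FrozenSet[int], n_features: int) -> Set[int]:
    """Available actions in state F: add any not-yet-selected feature, or stop."""
    return (set(range(n_features)) - state) | {STOP}

def step(state: FrozenSet[int], action: int) -> FrozenSet[int]:
    """Deterministic transition: adding feature f moves from F to F ∪ {f}."""
    return state if action == STOP else state | {action}
```

The optimal policy then maps each subset to the action with the highest expected final reward.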

Finding an Optimal Policy
Following Bellman's optimality principle is optimal but intractable: the state space is exponential in the number of features. Why not cast the problem as a 1-player game and use MCTS with UCT?

Feature Selection as a 1-Player Game
Formalize FS as a Markov Decision Process; an MDP can be solved with Reinforcement Learning. Cast the problem as a 1-player game and use MCTS with UCT!
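For reference, a minimal sketch of the UCB1 rule (Auer et al. 2002) that UCT applies at every tree node to pick the next child; the function name and the default exploration constant are illustrative assumptions:

```python
import math

def ucb1(mean_reward: float, visits: int, parent_visits: int,
         c_e: float = math.sqrt(2)) -> float:
    """UCB1 score of a child node: empirical mean reward plus an
    exploration bonus that shrinks as the child is visited more often."""
    if visits == 0:
        return float("inf")  # untried children are always explored first
    return mean_reward + c_e * math.sqrt(math.log(parent_visits) / visits)
```

UCT descends the tree by repeatedly moving to the child with the highest UCB1 score; c_e is the exploration constant that the continuous heuristic below sets to a very small value.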

Restrict number of arms
UCB1-tuned instead of UCB1: limit the exploration term by including the empirical variance of the rewards.
Continuous heuristic: set the exploration constant c_e to a very small value.
Discrete heuristic: consider only T_F^b children (b < 1), i.e. progressive widening: the number of considered children grows with the number of visits T_F. A sketch of both ideas follows below.
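A hedged sketch of both ideas (UCB1-tuned as in Auer et al. 2002, the widening rule as described on the slide; the function names and the default b are mine):

```python
import math

def ucb1_tuned(mean: float, var: float, visits: int, parent_visits: int) -> float:
    """UCB1-tuned: cap the exploration term with the empirical reward
    variance instead of the worst-case bound of 1/4."""
    v = var + math.sqrt(2 * math.log(parent_visits) / visits)
    return mean + math.sqrt((math.log(parent_visits) / visits) * min(0.25, v))

def allowed_children(visits_T_F: int, b: float = 0.5) -> int:
    """Progressive widening: a node visited T_F times may consider at most
    T_F^b children (b < 1), so the branching factor grows with the visits."""
    return max(1, int(visits_T_F ** b))
```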

Rapid Action Value Estimation (RAVE)
AMAF (all-moves-as-first) heuristic: incorporate additional knowledge gained within the search by associating a RAVE score µ̂ with each feature, maintained both locally per node (ℓ-RAVE) and globally over the whole search (g-RAVE).
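A minimal sketch of the global RAVE bookkeeping (assumed from the slide's AMAF description, not the paper's exact data structures): after each episode, every feature occurring in the final subset is credited with the episode's reward, regardless of when it was added.

```python
from collections import defaultdict

g_rave_sum = defaultdict(float)  # cumulative reward per feature
g_rave_cnt = defaultdict(int)    # number of episodes the feature occurred in

def update_g_rave(final_subset, reward):
    """AMAF-style update: credit every feature of the evaluated subset."""
    for f in final_subset:
        g_rave_sum[f] += reward
        g_rave_cnt[f] += 1

def g_rave(f):
    """g-RAVE score of f: average reward over all explored subsets containing f."""
    return g_rave_sum[f] / g_rave_cnt[f] if g_rave_cnt[f] else 0.0
```

ℓ-RAVE keeps the same statistics restricted to the episodes that passed through a given node.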

Selection of New Nodes
Discrete heuristic: select the feature top-ranked by its RAVE score whenever the integer part of T_F^b is incremented.
Continuous heuristic: replace the UCB1-tuned formula by a RAVE-based selection score [formula on the slide not transcribed].

Instant Reward Function
k-nearest neighbor (k-NN) [illustration on sample points z_1, ..., z_n omitted]: the reward of a feature subset is the area under the ROC curve (AUC)* of a k-NN classifier using only those features, a.k.a. the Mann-Whitney-Wilcoxon rank-sum statistic. A sketch follows below.
* Note that 0 really is the minimum: we do not simply predict a class, which we could flip; instead we want to find a feature set with minimum generalization error.
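A minimal sketch of such an instant reward using scikit-learn (an assumption for illustration: the paper subsamples the data and computes the AUC itself; for brevity this version scores on the training sample, which is optimistic):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

def instant_reward(X: np.ndarray, y: np.ndarray, feature_subset, k: int = 5) -> float:
    """Reward of a feature subset: AUC of a k-NN classifier restricted to
    those features. A proper estimate would score a held-out split."""
    if not feature_subset:
        return 0.0
    Xs = X[:, sorted(feature_subset)]
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xs, y)
    scores = knn.predict_proba(Xs)[:, 1]  # fraction of positive neighbors
    return float(roc_auc_score(y, scores))
```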

Feature UCT Selection (FUSE)
[Slide figure not transcribed.]
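Since the slide itself was not transcribed, here is a hedged, self-contained sketch of a FUSE-flavored search loop wiring the previous pieces together (plain UCB1 instead of UCB1-tuned, uniform instead of RAVE-guided expansion, and no separate random roll-out phase; class and function names are mine):

```python
import math
import random

class Node:
    def __init__(self, subset=frozenset()):
        self.subset = subset
        self.children = {}   # feature index -> child Node
        self.visits = 0
        self.mean = 0.0

def fuse_search(n_features, reward_fn, iters=10_000, b=0.5, c=1.0):
    """FUSE-style MCTS sketch: descend with UCB1, expand under progressive
    widening, evaluate the reached subset, back the reward up the path."""
    root = Node()
    for _ in range(iters):
        node, path = root, [root]
        while len(node.subset) < n_features:
            fresh = [f for f in range(n_features)
                     if f not in node.subset and f not in node.children]
            if fresh and len(node.children) < max(1, int(node.visits ** b)):
                f = random.choice(fresh)  # the paper picks the top RAVE feature
                child = Node(node.subset | {f})
                node.children[f] = child
                path.append(child)
                node = child
                break                     # stop the descent at the new leaf
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.mean + c * math.sqrt(
                           math.log(parent.visits + 1) / (ch.visits + 1)))
            path.append(node)
        reward = reward_fn(node.subset)   # e.g. the instant_reward above
        for n in path:                    # back-up along the visited path
            n.visits += 1
            n.mean += (reward - n.mean) / n.visits
    return root
```

With the instant reward from the previous slide, fuse_search(X.shape[1], lambda F: instant_reward(X, y, F)) returns the root of the search tree; following the most visited path yields the selected feature subset.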

FUSE and FUSE^R
FUSE: output is the search tree (most visited path); the RAVE score guides FUSE's exploration; FS approach: wrapper.
FUSE^R: output is the RAVE score; FUSE helps build the RAVE score, indicating feature relevance; FS approach: filter.

Experimental Validation
Data sets:
Madelon: 2,600 samples, 500 features, XOR-like.
Arcene: 200 samples, 10,000 features*, disjunction of overlapping sub-concepts.
Colon: 62 samples, 2,000 features, easy.
* Only the top 2,000 features, ranked by their ANOVA score, are considered for FUSE and CFS.
Baseline approaches: Correlation-based Feature Selection (CFS); RandomForest-based Gini score (Gini-RF, with 1,000 trees); Lasso; RAND^R (average RAVE score built from random 20-feature subsets).
Setup: 200,000 iterations; Gaussian SVM as end learner (hyper-parameters optimized by 5-fold CV).

Results
[Plots for Madelon and Arcene omitted.]
The FUSE algorithms get the best of both worlds: they detect feature interdependencies (like Gini-RF, which is better with few features) and filter out redundant features (like CFS, which is better with many features).

Results (contd.)
All methods perform equally well on Colon.
FUSE vs. FUSE^R: FUSE does not control the depth of the search tree efficiently; FUSE^R is better.
Discrete vs. continuous heuristic: same performance with optimal parameters; the discrete heuristic is more robust because it has fewer parameters.
Performance on Madelon: FUSE^R converges more slowly than FUSE but improves after 10,000 iterations; FUSE^R is faster than RAND^R by an order of magnitude. Runtime: 45 minutes* (Arcene: 5 min, Colon: 4 min).
* On an Intel Core 2 2.6 GHz CPU with 2 GB memory, considering only the FS on the training set.

Summary and Outlook
Contributions: formalized the FS task as a Reinforcement Learning problem; proposed an efficient approximation of the optimal policy; used UCT to define the FUSE algorithm; state of the art according to the benchmark, but costly.
Future directions: extend to multi-class problems; extend to mixed (continuous and discrete) search spaces; combine FUSE with other end learners; reconsider the instant reward; extend to feature construction.

Critical Evaluation
Pro: an original approach to FS with promising validation results.
However: many degrees of freedom whose interdependencies are not fully understood; the problem is partly just shifted; inherits the problems of k-NN with high-dimensional data and skewed class distributions; the proposed extensions would probably increase the computational cost further; using RF and Lasso as wrappers is fair for the comparison but differs from how they are usually used in practice.

Feature Selection with Monte-Carlo Tree Search, Robert Pinsler, 20.01.2015. Thank you! Questions? See the next slide for sources.

Sources
Auer, Peter et al.: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
Gaudel, Romaric; Sebag, Michèle: Feature Selection as a One-Player Game. In: Proceedings of the 27th International Conference on Machine Learning, 2010.
Gelly, Sylvain; Silver, David: Combining Online and Offline Knowledge in UCT. In: Proceedings of the 24th International Conference on Machine Learning, 2007.
Guyon, Isabelle; Elisseeff, André: An Introduction to Feature Extraction. In: Guyon, Isabelle et al. (editors): Feature Extraction, 2006.
Guyon, Isabelle; Elisseeff, André: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 2003.
Helmbold, David P.; Parker-Wood, Aleatha: All-Moves-As-First Heuristics in Monte-Carlo Go. In: Proceedings of the 2009 International Conference on Artificial Intelligence, 2009.
Kocsis, Levente et al.: Bandit based Monte-Carlo Planning. In: Proceedings of the 17th European Conference on Machine Learning, 2006.
Sebag, Michèle: Monte Carlo Tree Search: From Playing Go to Feature Selection. Presentation, 2010.
http://theconnectivist-img.s3.amazonaws.com/wp-content/uploads/2013/10/airplane-1300x724.jpg, last accessed 17.01.2015, 10:00 pm.