Active Learning for Interactive Visualization



Tomoharu Iwata, Neil Houlsby, Zoubin Ghahramani
University of Cambridge

Abstract

Many automatic visualization methods have been proposed. However, a visualization that is automatically generated might be different to how a user wants to arrange the objects in visualization space. By allowing users to re-locate objects in the embedding space of the visualization, they can adjust the visualization to their preference. We propose an active learning framework for interactive visualization which selects objects for the user to re-locate so that they can obtain their desired visualization by re-locating as few as possible. The framework is based on an information theoretic criterion, which favors objects that reduce the uncertainty of the visualization. We present a concrete application of the framework to the Laplacian eigenmap visualization method. We demonstrate experimentally that the framework yields the desired visualization with fewer user interactions than existing methods.

1 Introduction

With the emergence of large and high dimensional data sets, the task of data visualization has become increasingly important in both machine learning and data mining. Visualization is helpful for analyzing and exploring large-scale complex data; it allows one to combine human abilities, such as visual perception, creativity and general knowledge, with the abilities of machines, that is, large memories and fast calculation, in the task of understanding data (Keim et al., 2002). One application of visualization is in information retrieval, where users can search objects intuitively in the visualization space (Venna et al., 2010).

Appearing in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 of JMLR: W&CP 31. Copyright 2013 by the authors.

A large number of visualization methods have been proposed, such as multi-dimensional scaling (Torgerson, 1958), Isomap (Tenenbaum et al., 2000), locally linear embedding (Roweis and Saul, 2000), stochastic neighbor embedding (Hinton and Roweis, 2002), and the Laplacian eigenmap (Belkin and Niyogi, 2003). These algorithms map objects from a high dimensional observation space to a low dimensional visualization space. They find an embedding such that objects in the visualization space preserve their pairwise distances from the high-dimensional observation space. Therefore similar objects are automatically located closer together in the visualization space. However, a visualization that is generated automatically in such a manner may differ from the user's desired visualization: the user may want to locate objects with a particular meaning in particular areas of visualization space. For example, when visualizing images the user may desire clusters of images of animals to be located in one region of visualization space, inanimate objects in another, and sceneries in another. Alternatively, if the objects exhibit a natural ordering, such as digits or letters, then the user may wish to preserve this ordering in visualization space.

To address this problem, interactive visualization systems have been proposed (Wills, 1999; Johansson and Johansson, 2009; Paulovich et al., 2011; Endert et al., 2011). Here, we consider interactive systems in which users can re-locate objects to obtain their desired visualization. When there are a large number of objects it is difficult for users to select which objects to re-locate; if many of the moves are redundant, that is, they provide no new information about the user's desired visualization, then even after many queries the visualization may not reflect the intended result. The goal of this paper is to select objects to re-locate so that the user can obtain their desired visualization by moving as few as possible.
For this purpose we propose an active learning framework for visualization. Active learning (Cohn et al., 1996) is a machine learning framework for selecting objects that improve performance with the minimum possible labelings. Active learning methods are useful when the cost of obtaining labeled data is high. Most active learning algorithms were proposed in supervised learning settings (McCallum and Nigam, 1998; Tong and Koller, 2002). We develop an information theoretic active learning criterion that selects objects to re-locate so as to reduce the uncertainty of the visualization the most. We present our active visualization framework with the widely used Laplacian eigenmap method for nonlinear dimensionality reduction and visualization (Belkin and Niyogi, 2003). Here, we can analytically calculate the objective function for selecting objects, permitting the algorithm to be used in a fast, online, interactive system. Note, however, that we can use many other visualization methods within our framework.

The paper is organized as follows: In Section 2 we propose an active learning framework for visualization based on an information theoretic criterion. In Section 3 we present an implementation of the framework with the Laplacian eigenmap visualization method. In Section 4 we outline related work. In Section 5 we demonstrate the effectiveness of the framework by comparing to existing methods. Finally, we present concluding remarks and a discussion of future work in Section 6.

2 Active Visualization

The task of visualization is, given a set of observations $X = \{x_n\}_{n=1}^N$, to find an embedding $Y = \{y_n\}_{n=1}^N$ that reveals structure in the data when viewed by the user. Here, $x_n \in \mathbb{R}^D$ is the feature vector of object $n$ in the observation space, and $y_n \in \mathbb{R}^K$ is the location of object $n$ in the visualization space. Normally the observation dimensionality is much higher than the visualization dimensionality, $D \gg K$. The visualization dimensionality is typically $K = 2$ or $K = 3$; although this constraint arises from our technical ability to view objects in higher dimensional space, the framework is mathematically and computationally applicable with any visualization dimensionality.

In an active learning setting, the algorithm sequentially selects objects for the user to re-locate in $\mathbb{R}^K$, from a given set of $N$ objects. The ground truth locations for the selected objects are obtained from an oracle, i.e. the user, who places the objects within an interactive visualization environment. Given the desired location of the selected object obtained from user feedback, the system changes the visualization of all of the objects, incorporating the new information. Let $Y_s$ be the data that has been labeled by the user, that is, the set of locations of the selected objects that are associated with the ground truth locations in visualization space, and let $Y_u = Y \setminus Y_s$ be the unlabeled data, the set of locations of the unselected objects.

The information theoretic approach to active learning selects objects that reduce the uncertainty about the parameters, measured by Shannon's entropy (Cover et al., 1991; Lindley, 1956). In the context of visualization, the variables of interest are the locations of the unlabeled data in visualization space; we may think of these as the parameters about which we want to learn optimally using active learning. Therefore, the objective is to select an object $i$ from the pool of unlabeled data that maximizes the decrease in the entropy of our distribution over the locations of the remaining unlabeled data as follows:

$$\arg\max_i \; H[p(Y_{u\setminus i} \mid Y_s)] - \mathbb{E}_{p(y_i \mid Y_s)} H[p(Y_{u\setminus i} \mid y_i, Y_s)], \qquad (1)$$

where $Y_{u\setminus i}$ is the unlabeled data excluding object $i$, $H[p(\cdot)]$ represents the differential entropy of the probability distribution $p$, and $\mathbb{E}_{p(\cdot)}$ represents expectation under distribution $p$. For notational simplicity we omit the set of observations $X$ from the conditioning of all of the probability distributions, e.g. $p(Y_{u\setminus i} \mid Y_s)$ should read $p(Y_{u\setminus i} \mid Y_s, X)$.
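Note that the objective in (1) is exactly the conditional mutual information $I(Y_{u\setminus i}; y_i \mid Y_s)$; since mutual information is symmetric in its arguments, it can be decomposed in either direction:

$$I(Y_{u\setminus i}; y_i \mid Y_s) = H[p(Y_{u\setminus i} \mid Y_s)] - \mathbb{E}_{p(y_i \mid Y_s)} H[p(Y_{u\setminus i} \mid y_i, Y_s)] = H[p(y_i \mid Y_s)] - \mathbb{E}_{p(Y_{u\setminus i} \mid Y_s)} H[p(y_i \mid Y_{u\setminus i}, Y_s)],$$

an identity used in the rearrangement (4) below. We now examine the two terms of (1) in turn.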
The first term,

$$H[p(Y_{u\setminus i} \mid Y_s)] = -\int p(Y_{u\setminus i} \mid Y_s) \log p(Y_{u\setminus i} \mid Y_s)\, dY_{u\setminus i}, \qquad (2)$$

is the entropy of the distribution over the unlabeled data given the labeled data; that is, it represents the system's uncertainty in the location of the unlabeled data in the visualization space. The second term,

$$\mathbb{E}_{p(y_i \mid Y_s)} H[p(Y_{u\setminus i} \mid y_i, Y_s)] = -\int\!\!\int p(y_i \mid Y_s)\, p(Y_{u\setminus i} \mid y_i, Y_s) \log p(Y_{u\setminus i} \mid y_i, Y_s)\, dY_{u\setminus i}\, dy_i, \qquad (3)$$

is the entropy of the distribution over the unlabeled objects after obtaining the true location of object $i$, where we take its expectation over the location of the object to be queried, $y_i$, because we do not yet know its true location. Further discussion of the exact form of (1) is given in Section 4.

We can gain useful intuition about (1) by rearranging the objective function as follows:

$$\arg\max_i \; H[p(y_i \mid Y_s)] - \mathbb{E}_{p(Y_{u\setminus i} \mid Y_s)} H[p(y_i \mid Y_{u\setminus i}, Y_s)], \qquad (4)$$

where we use the insight that the objective in (1) is equivalent to the mutual information between the unlabeled data and the location of the selected object, $I(Y_{u\setminus i}, y_i)$, given the labeled data. The first term in (4) favors objects about which we have high uncertainty; this term alone corresponds to a classic objective known as uncertainty sampling or maximum entropy sampling (Sebastiani and Wynn, 2000). The second term has a separate role; it penalizes objects that would have high entropy if all of the remaining objects $Y_{u\setminus i}$ were observed. This means that if the unobserved objects were seen, we would be confident in the location of object $i$; alternatively put, the term favors objects that are highly correlated with the remaining unobserved objects. In summary, (4) seeks objects about whose location in visualization space we are uncertain, but which also correlate with the remaining unlabeled objects, so that their label provides information about the other unlabeled points' locations as well.

If we know that we will query the user with a number of objects $J$, then it is optimal to maximize our querying strategy over the entire set $J$. When we select multiple objects to place, the objective function becomes

$$\arg\max_J \; H[p(Y_{u\setminus J} \mid Y_s)] - \mathbb{E}_{p(Y_J \mid Y_s)} H[p(Y_{u\setminus J} \mid Y_J, Y_s)]. \qquad (5)$$

However, for our active learning criterion, and sequential decision making tasks in general, this problem is NP-hard. As is common in active learning we take a myopic, or greedy, approach, performing optimization of (4) assuming that each query is the last. However, the mutual information function is submodular, and the myopic strategy is known to perform near-optimally for submodular functions (Guestrin et al., 2005; Dasgupta, 2005; Golovin and Krause, 2010). Intuitively, this means that it satisfies the property of "diminishing returns"; that is, the gain in information when adding a new labeled data point to a smaller pool of observations $Y_s^{\text{small}}$ is greater than, or equal to, the gain in information when adding the data point to a larger pool $Y_s^{\text{large}}$.

3 Laplacian eigenmap based active visualization

We present the procedures of our active learning framework for use with the Laplacian eigenmap visualization method (Belkin and Niyogi, 2003). The Laplacian eigenmap is widely used for dimensionality reduction, and it benefits from having a criterion for visualization which can be globally optimized. In this setting we can analytically calculate the objective function for selecting objects to re-locate with our active learning framework.

3.1 Laplacian eigenmap

We first outline the original Laplacian eigenmap algorithm. The Laplacian eigenmap is a nonlinear dimensionality reduction method with locality-preserving properties, based on spectral techniques. Firstly, a k-nearest neighbor graph is constructed from the observations $X$ based on the Euclidean distance. One may also use an $\epsilon$-neighborhood graph instead of a k-nearest neighbor graph. Secondly, we set the weight between objects $i$ and $j$ so that $w_{ij} = 1$ if they are connected, and $w_{ij} = 0$ otherwise. Finally, embedding locations $Y$ that minimize the following function are obtained by solving a generalized eigenvalue problem:

$$\arg\min_Y \; \operatorname{tr}(Y^\top L Y), \quad \text{s.t. } Y^\top D Y = I, \qquad (6)$$

where $D$ is a diagonal matrix with $D_{ii} = \sum_j w_{ij}$, $L = D - W$ is the Laplacian matrix of $W$, and $W$ is the $N \times N$ matrix whose $(i, j)$ element is $w_{ij}$.

3.2 Probabilistic interpretation

In order to employ our active learning framework we need a probabilistic interpretation of the Laplacian eigenmap from which we can calculate the relevant entropies and expectations. We make the Laplacian positive definite by adding a small diagonal matrix, $\Lambda = L + \alpha I$. When the noise level $\alpha$ is small, we can approximate the minimization of the objective function for the Laplacian eigenmap, $\operatorname{tr}(Y^\top L Y)$, by maximizing the likelihood of the following Gaussian distribution:

$$p(Y) = \mathcal{N}(0, \Lambda^{-1}), \qquad (7)$$

where $\mathcal{N}(\mu, \Lambda^{-1})$ represents a Gaussian with mean $\mu$ and precision, or inverse covariance, $\Lambda$.
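To make the construction in Sections 3.1 and 3.2 concrete, here is a minimal NumPy/SciPy sketch; it is our illustration rather than the authors' code, and the function name, the binary kNN weights, and the default values of `n_neighbors`, `dim` and `alpha` are assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmap(X, n_neighbors=3, dim=2, alpha=1e-3):
    """Laplacian eigenmap embedding (eq. 6) plus the precision matrix
    of its Gaussian interpretation (eq. 7), Lambda = L + alpha * I."""
    N = X.shape[0]
    # k-nearest-neighbor graph with binary weights w_ij, symmetrized.
    dist = cdist(X, X)
    W = np.zeros((N, N))
    nn = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(N):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))   # degree matrix, D_ii = sum_j w_ij
    L = D - W                    # graph Laplacian
    # Generalized eigenproblem L v = lambda D v; drop the trivial
    # constant eigenvector (eigenvalue 0) and keep the next `dim`.
    _, vecs = eigh(L, D)
    Y = vecs[:, 1:dim + 1]
    Lam = L + alpha * np.eye(N)  # positive-definite precision (Sec. 3.2)
    return Y, Lam
```

The returned `Lam` is the positive-definite precision that the active learning procedure of Section 3.3 operates on.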
The relation between the graph Laplacian and Gaussian Markov random fields is further discussed in (Zhu et al., 2003b).

3.3 Active visualization

We present the procedures of our active learning framework based on objective function (4) with the probabilistic interpretation of the Laplacian eigenmap. Without loss of generality we sort the location vector $Y$ into labeled then unlabeled data. We then partition the precision matrix $\Lambda$ into four parts corresponding to the labeled and unlabeled data as follows:

$$\Lambda = \begin{pmatrix} \Lambda_{ss} & \Lambda_{su} \\ \Lambda_{us} & \Lambda_{uu} \end{pmatrix}. \qquad (8)$$

By using the fact that

$$p(y_i \mid Y_s) = \mathcal{N}\!\left( -\big(\Lambda_{uu}^{-1} \Lambda_{us} Y_s\big)_i,\; \big(\Lambda_{uu}^{-1}\big)_{ii} \right), \qquad (9)$$

and that the entropy of a Gaussian with dimensionality $K$ is

$$H[\mathcal{N}(\mu, \Lambda^{-1})] = -\tfrac{1}{2} \log |\Lambda| + \tfrac{K}{2} \left( \log(2\pi) + 1 \right), \qquad (10)$$

the first term of (4) is obtained as

$$H[p(y_i \mid Y_s)] = \tfrac{K}{2} \log \big(\Lambda_{uu}^{-1}\big)_{ii} + \tfrac{K}{2} \left( \log(2\pi) + 1 \right). \qquad (11)$$

Similarly, the second term is obtained as

$$\mathbb{E}_{p(Y_{u\setminus i} \mid Y_s)} H[p(y_i \mid Y_{u\setminus i}, Y_s)] = -\tfrac{K}{2} \log \Lambda_{ii} + \tfrac{K}{2} \left( \log(2\pi) + 1 \right). \qquad (12)$$

Therefore, (4) based on the Laplacian eigenmap becomes:

$$\arg\max_i \; \log \big(\Lambda_{uu}^{-1}\big)_{ii} + \log \Lambda_{ii}. \qquad (13)$$

Since we can calculate analytically the entropy, the conditional distribution, and the marginal distribution of a Gaussian, we can calculate the objective function for active learning with the Laplacian eigenmap analytically. We note that locally linear embedding (LLE) (Roweis and Saul, 2000) can also be interpreted as a Gaussian model (Verbeek and Vlassis, 2006). Therefore, we may use exactly the same expressions as for the Laplacian eigenmap if we were to use LLE for visualization. When we use visualization methods that are not modeled by a Gaussian, we can use similar procedures by exploiting the Laplace approximation.

After the selected object is re-located by the user we need to re-calculate the visualization given the new labeled data point, and show it to the user before selecting the next point to label. For labeled objects, we use the ground truth locations given by the user. For the unlabeled objects, we estimate their locations as follows:

$$\hat{Y}_u = -\Lambda_{uu}^{-1} \Lambda_{us} Y_s, \qquad (14)$$

because the distribution of the unlabeled data conditioned on the labeled data is given by

$$p(Y_u \mid Y_s) = \mathcal{N}(\hat{Y}_u, \Lambda_{uu}^{-1}), \qquad (15)$$

from the probabilistic interpretation of the Laplacian eigenmap. The estimated locations $\hat{Y}_u$ can be seen as a semi-supervised Laplacian eigenmap visualization result, where we have label information for some objects.

3.4 Learning Hyperparameters

We can estimate hyperparameters, such as the number of neighbors and the noise level $\alpha$, by maximizing the likelihood $p(Y_s) = \mathcal{N}(0, (\Lambda^{-1})_{ss})$ given the labeled data (Verbeek and Vlassis, 2006). When the hyperparameters are fixed, inspection of (13) reveals that the optimal object does not depend on the locations of the supervised data in visualization space, just on which objects have been selected. This is not a general property of our active framework (1). A consequence of this is that we can pre-compute the optimal (myopic) set of objects to be presented to the user, before the user has re-located any objects. However, when we update the hyperparameters the mapping changes, the precision matrices $\Lambda$ become implicit functions of the supervised data, and so we must wait for the user to move each object before computing the optimal new object to present.

4 Related Work

Let us first consider our objective in its reformulated form (4). Suppose we were to consider only the first term; the objective would become

$$\arg\max_i \; H[p(y_i \mid Y_s)], \qquad (16)$$

that is, we would select the object whose predictive distribution has the highest entropy, or uncertainty. This corresponds to one of the most ubiquitous strategies in active learning, uncertainty sampling (Lewis and Gale, 1994; Sebastiani and Wynn, 2000; Settles, 2009), which selects the object for which one is least certain how to label. When the uncertainty measure used is Shannon's entropy, this corresponds exactly to (16). This strategy is used in (Verbeek and Vlassis, 2006) in the context of locally linear embedding for semi-supervised regression. In the context of visualization this strategy considers only the uncertainty in the object to be selected. However, our strategy (1) considers the uncertainty of all of the unlabeled objects; the second term in (4) favors objects that assist in determining the location of other unlabeled objects.
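Under the Gaussian interpretation above, criterion (13), its uncertainty-sampling counterpart (16), and the semi-supervised update (14) reduce to a few lines of linear algebra. The following is a sketch under that interpretation; the function names and the ordering convention for $Y_s$ are our illustrative choices, not part of the paper.

```python
import numpy as np

def selection_scores(Lam, labeled):
    """Scores for the proposed criterion (13) and, for comparison,
    uncertainty sampling (16), for every unlabeled object."""
    N = Lam.shape[0]
    unlabeled = np.array([i for i in range(N) if i not in labeled])
    Lam_uu_inv = np.linalg.inv(Lam[np.ix_(unlabeled, unlabeled)])
    marg_var = np.diag(Lam_uu_inv)          # (Lambda_uu^{-1})_{ii}
    proposed = np.log(marg_var) + np.log(np.diag(Lam)[unlabeled])  # eq. (13)
    uncertainty = np.log(marg_var)          # eq. (16), up to constants
    return unlabeled, proposed, uncertainty

def update_embedding(Lam, labeled, Y_s):
    """Semi-supervised estimate (14): Y_u = -Lambda_uu^{-1} Lambda_us Y_s.
    Y_s holds the user-placed K-dimensional locations, one row per
    labeled object, in the same order as `labeled`."""
    N = Lam.shape[0]
    unlabeled = [i for i in range(N) if i not in labeled]
    Lam_uu = Lam[np.ix_(unlabeled, unlabeled)]
    Lam_us = Lam[np.ix_(unlabeled, labeled)]
    return -np.linalg.solve(Lam_uu, Lam_us @ Y_s)
```

With `Lam` from the eigenmap sketch above, `unlabeled[np.argmax(proposed)]` is the next object to show the user, whereas `unlabeled[np.argmax(uncertainty)]` would be the uncertainty-sampling choice.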
We demonstrate experimentally the advantage of our framework over uncertainty sampling in Section 5.

Now let us consider the initial formulation of our objective (1). It may seem sensible to minimize the absolute value of the entropy of the unseen data, that is, to consider only the second term in (1),

$$\arg\min_i \; \mathbb{E}_{p(y_i \mid Y_s)} H[p(Y_{u\setminus i} \mid y_i, Y_s)], \qquad (17)$$

rather than the expected decrease in predictive entropy. However, this criterion turns out to be equivalent to uncertainty sampling (16) because

$$\mathbb{E}_{p(y_i \mid Y_s)} H[p(Y_{u\setminus i} \mid y_i, Y_s)] = H[p(Y_u \mid Y_s)] - H[p(y_i \mid Y_s)], \qquad (18)$$

and the first term on the right-hand side, $H[p(Y_u \mid Y_s)]$, does not depend on $i$.

Many information theoretic algorithms for active learning were proposed in the context of supervised learning, where the objective function is equal to the change in entropy of the model parameters after receiving the label (Lindley, 1956; MacKay, 1992; Guestrin et al., 2005; Houlsby et al., 2011). The criterion in supervised learning is given by

$$\arg\max_x \; H[p(\theta \mid D)] - \mathbb{E}_{p(t \mid x, D)} H[p(\theta \mid t, x, D)], \qquad (19)$$

where $\theta$ is a set of parameters, $D$ is a training data set, $x$ is an input variable to be labeled, and $t$ is its target variable. If we were to interpret the unknown locations in visualization space $Y_{u\setminus i}$ as our parameters of interest $\theta$, the point to be labeled $y_i$ as the target variable $t$, and the labeled points $Y_s$ as the training data $D$, then this classical information theoretic approach for supervised learning (19) becomes equivalent to our objective function (1).

Finally, an alternative approach to active learning is to use decision theory, in which one selects objects that reduce the expected loss at test-time (Roy and McCallum, 2001; Zhu et al., 2003a), that is, in a Bayesian framework, to minimize the Bayes posterior risk. In the context of active visualization, the decision task at hand is to select the locations of the unlabeled objects. If we were to choose the log-loss on the probability of placing $Y_u$ at a particular location as our loss function, then the optimal Bayesian decision at test-time (visualization-time) is to place the objects at the MAP estimate of their locations, as we do in our framework (14). This corresponds to a Bayes risk equal to the expected entropy over the unlabeled data. If one seeks to maximize the decrease in Bayes risk then we arrive again at our objective (1). It is interesting that in our context of active visualization (1) has both an information- and a decision-theoretic interpretation, whereas in general these approaches result in different algorithms.

5 Experiments

5.1 Setting

We evaluated our active learning framework on one synthetic data set, and five real data sets: Wine, Iris, Vowel, Glass and Mnist, which are obtained from the LIBSVM multi-class data sets (Chang and Lin, 2011). The synthetic data set, Synth, was generated as follows: 1) for a ground truth visualization we located objects in a two-dimensional grid and added small Gaussian noise as shown in Figure 2(a), and 2) we generated observation feature vectors with a Gaussian process latent variable model (Lawrence, 2004), using the ground truth as the latent variables. For the five real data sets we generated the ground truth visualization by using class information. In all of the real data sets, each object has a class label; we located objects around a circle, ordered according to their class, and added Gaussian noise. The set up is depicted in Figure 2(b) and (c), where the color of each node represents its class. We summarize the statistics of the data sets used for evaluation in Table 1. We compared our active visualization framework with uncertainty sampling for active visualization, as described in (Verbeek and Vlassis, 2006), and a random sampling baseline method. We used the Laplacian eigenmap as the visualization method.

5.2 Results

The performance metric used was the average mean squared error between the estimated and true locations. To obtain statistically meaningful results, an average was taken over 1000 experimental runs with each data set, each using a different ground truth visualization. The noise parameter was set to $\alpha = 10^{-3}$.
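As an illustration of the circular ground-truth construction for the real data sets described in Section 5.1, a minimal sketch follows; the radius and noise scale are illustrative assumptions of ours, as the paper does not state them.

```python
import numpy as np

def circle_ground_truth(labels, radius=1.0, noise_std=0.1, seed=0):
    """Place objects evenly around a circle, ordered by class label,
    then add Gaussian noise (Sec. 5.1). Returns an (N, 2) array."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")   # group objects by class
    theta = np.empty(len(labels))
    theta[order] = np.linspace(0.0, 2.0 * np.pi, len(labels), endpoint=False)
    Y = radius * np.column_stack([np.cos(theta), np.sin(theta)])
    return Y + noise_std * rng.standard_normal(Y.shape)
```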
We selected the optimal number of neighbors $k$ from the set $\{2, \ldots, 20\}$ using maximum likelihood as described in Section 3.4; $k$ was updated after every batch of five labeled objects was obtained.

Figure 1 shows the results. For all of the methods, as the number of labeled data points increases, the error decreases. However, in most cases, our method decreases the error faster than uncertainty and random sampling. This indicates the importance of considering the relationship between the point to be labeled and the remaining unlabeled points, which is represented by the second term in (4) and is not considered in uncertainty sampling. Table 2 shows the statistical significance of the results when the number of neighbors $k$ is updated based on the maximum likelihood estimation (a), and when it is fixed at $k = 3$ (b). In both cases, the proposed method achieved the lowest error for all data sets. And except for the Iris and Vowel data sets when $k = 3$, the proposed method was significantly better than uncertainty and random sampling.

Table 1: The statistics of the data sets used for evaluation.

                               Synth   Wine   Iris   Vowel   Glass   Mnist
  number of objects N            400    178    150     528     214    1000
  observed dimensionality D      100     13      4      10       9     784
  number of classes C              -      3      3      11       7      10

[Figure 1 plots: average mean squared error versus the number of labeled objects (5 to 30), one panel per data set: (a) Synth, (b) Wine, (c) Iris, (d) Vowel, (e) Glass, (f) Mnist; the numeric axis residue from the original figure is omitted here.]

Figure 1: Average mean squared error between the estimated locations and the true locations for different numbers of labeled objects achieved by the proposed method, uncertainty sampling, and random sampling. We also show error bars depicting the standard deviation of the proposed method, but omit them from the other methods for visual clarity (see Table 2 for statistical significance).

Table 2: Average mean squared error given ten labeled objects when (a) the number of neighbors $k$ is updated based on the maximum likelihood estimation, and (b) it is fixed at $k = 3$. Values in bold typeface are statistically better (at the 5% level) than those in normal typeface as indicated by a paired t-test.

  (a) number of neighbors is updated
                          Synth   Wine   Iris   Vowel   Glass   Mnist
  Proposed method         0.395  0.649  0.464   1.833   1.77    1.585
  Uncertainty sampling    0.585  0.84   0.53    1.906   1.769   2.073
  Random sampling         0.788  0.888  0.704   1.864   2.050   1.946

  (b) number of neighbors is fixed at k = 3
                          Synth   Wine   Iris   Vowel   Glass   Mnist
  Proposed method         0.38   0.648  0.455   1.80    1.770   1.594
  Uncertainty sampling    0.597  0.834  0.456   1.80    2.66    2.073
  Random sampling         0.89   0.898  0.698   1.876   2.044   1.951
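The evaluation protocol above can be pictured as a short, self-contained loop that alternates selection by criterion (13) with the semi-supervised update (14), scoring against a known ground truth. This is a sketch under assumed settings (the query budget and `alpha` are our choices); inverting $\Lambda_{uu}$ afresh at every step is cubic in $N$, and an interactive system would maintain it incrementally instead.

```python
import numpy as np

def simulate_session(L, Y_true, n_queries=20, alpha=1e-3):
    """Greedy active visualization loop: pick the object maximizing
    criterion (13), reveal its ground-truth location, re-estimate the
    unlabeled locations via (14), and record the mean squared error."""
    N = L.shape[0]
    Lam = L + alpha * np.eye(N)
    labeled, errors = [], []
    for _ in range(n_queries):
        u = [i for i in range(N) if i not in labeled]
        marg_var = np.diag(np.linalg.inv(Lam[np.ix_(u, u)]))
        score = np.log(marg_var) + np.log(np.diag(Lam)[u])   # eq. (13)
        labeled.append(u[int(np.argmax(score))])
        u = [i for i in range(N) if i not in labeled]
        # Oracle feedback: the user places the object at its true location.
        Y_u = -np.linalg.solve(Lam[np.ix_(u, u)],
                               Lam[np.ix_(u, labeled)] @ Y_true[labeled])  # eq. (14)
        errors.append(float(np.mean((Y_u - Y_true[u]) ** 2)))
    return errors
```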

Figure 3 shows the visualization attained using the Laplacian eigenmap in an unsupervised setting with the Synth, Wine and Mnist data sets. We used three neighbors for constructing the neighbor graph. The goal is to obtain a visualization that is similar to the ground truth (Figure 2) by labeling as few objects as possible. Without any labeled objects, the locations differ greatly from the ground truth, as shown in Figure 3. Figure 4 shows the visualization when 20 objects are labeled by random sampling (top), uncertainty sampling (middle) and our active learning framework (bottom).

[Figure 2 panels: (a) Synth, (b) Wine, (c) Mnist.]

Figure 2: Ground truth, or the user's desired visualization. In the Synth data set (a), the color similarity of each node relates to closeness in the ground truth visualization. In the Wine (b) and Mnist (c) data sets, the color of each node represents the class information.

[Figure 3 panels: (a) Synth, (b) Wine, (c) Mnist.]

Figure 3: Visualization results by the unsupervised Laplacian eigenmap on the Synth (a), Wine (b) and Mnist (c) data sets. The color of each node is the same as in Figure 2.

The random sampling method sometimes selects objects located close together in visualization space, or similar objects, which is not effective because their locations can be inferred easily from the locations of those similar objects. Uncertainty sampling tends to select objects that are located at the edges of the set of objects, as shown in Figure 4 (a), middle. This is because the entropy of objects that are located far from other objects is high (Ramakrishnan et al., 2005; Guestrin et al., 2005). On the other hand, our method selects a diverse set of objects by maximizing the decrease of the uncertainty for the unlabeled data, and we can obtain visualizations that are more similar to the ground truth with fewer labels than uncertainty and random sampling.

6 Conclusion

We have proposed an active learning framework for data visualization based on an information theoretic criterion whereby the object that reduces the uncertainty of the unlabeled data is selected. We have confirmed experimentally that our framework can obtain the user's desired visualization with fewer labeled objects than existing active visualization methods. Although our results have been encouraging, our framework can be further improved upon in a number of ways. Firstly, we plan to use other visualization methods with our framework, such as the Gaussian process latent variable model (Lawrence, 2004) and stochastic neighbor embedding (Hinton and Roweis, 2002). Secondly, we would like to extend our framework to incorporate other types of supervised information. In the current framework, a user re-locates objects to indicate their desired locations. However, the user might want to provide information about the desired visualization by selecting two objects that should be located close together, or far apart.

[Figure 4 panels, rows: random sampling (top), uncertainty sampling (middle), proposed framework (bottom); columns: (a) Synth, (b) Wine, (c) Mnist.]

Figure 4: Visualization results with 20 labeled objects selected by random sampling (top), uncertainty sampling (middle) and the proposed method (bottom) on the Synth (a), Wine (b) and Mnist (c) data sets. The markers show the selected locations.

References

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011.

D. Cohn, Z. Ghahramani, and M. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.

T. Cover, J. Thomas, J. Proakis, M. Salehi, and R. Morelos-Zaragoza. Elements of Information Theory. 2nd edition: John Wiley & Sons Inc, 1991.

S. Dasgupta. Analysis of a greedy active learning strategy. Advances in Neural Information Processing Systems, 17:337–344, 2005.

A. Endert, C. Han, D. Maiti, L. House, S. Leman, and C. North. Observation-level interaction with statistical models for visual analytics. In Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on, pages 121–130. IEEE, 2011.

D. Golovin and A. Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. In Proceedings of the International Conference on Learning Theory (COLT), 2010.

C. Guestrin, A. Krause, and A. Singh. Near-optimal sensor placements in Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 265–272, 2005.

G. Hinton and S. Roweis. Stochastic neighbor embedding. Advances in Neural Information Processing Systems, 15:833–840, 2002.

N. Houlsby, F. Huszár, Z. Ghahramani, and M. Lengyel. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.

S. Johansson and J. Johansson. Interactive dimensionality reduction through user-defined combinations of quality metrics. Visualization and Computer Graphics, IEEE Transactions on, 15(6):993–1000, 2009.

D. Keim et al. Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1):1–8, 2002.

N. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. Advances in Neural Information Processing Systems, 16:329–336, 2004.

D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–12. Springer-Verlag New York, Inc., 1994.

D. Lindley. On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, pages 986–1005, 1956.

D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.

A. McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In Proceedings of ICML-98, 15th International Conference on Machine Learning, pages 350–358, 1998.

F. V. Paulovich, D. Eler, J. Poco, C. P. Botha, R. Minghim, and L. Nonato. Piecewise Laplacian-based projection for interactive data exploration and organization. Computer Graphics Forum, 30:1091–1100, 2011.

N. Ramakrishnan, C. Bailey-Kellogg, S. Tadepalli, and V. Pandey. Gaussian processes for active data mining of spatial aggregates. In Proceedings of the Fifth SIAM International Conference on Data Mining, volume 119, page 427. Society for Industrial Mathematics, 2005.

S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

N. Roy and A. McCallum. Toward optimal active learning through Monte Carlo estimation of error reduction. In International Conference on Machine Learning, pages 441–448, 2001.

P. Sebastiani and H. Wynn. Maximum entropy sampling and optimal Bayesian experimental design. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1):145–157, 2000.

B. Settles. Active learning literature survey. Technical report, University of Wisconsin, Madison, 2009.

J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

S. Tong and D. Koller. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002.

W. Torgerson. Theory and Methods of Scaling. Wiley, 1958.

J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski. Information retrieval perspective to nonlinear dimensionality reduction for data visualization. The Journal of Machine Learning Research, 11:451–490, 2010.

J. Verbeek and N. Vlassis. Gaussian fields for semi-supervised regression and correspondence learning. Pattern Recognition, 39(10):1864–1875, 2006.

G. Wills. NicheWorks: interactive visualization of very large graphs. Journal of Computational and Graphical Statistics, 8(2):190–212, 1999.

X. Zhu, J. Lafferty, and Z. Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003 Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pages 58–65, 2003a.

X. Zhu, J. Lafferty, and Z. Ghahramani. Semi-supervised learning: From Gaussian fields to Gaussian processes. Technical report, Carnegie Mellon University, 2003b.