Learning from Multiple Outlooks




Maayan Harel
Department of Electrical Engineering, Technion, Haifa, Israel
maayanga@tx.technion.ac.il

Shie Mannor
Department of Electrical Engineering, Technion, Haifa, Israel
shie@ee.technion.ac.il

Abstract

We propose a novel problem formulation of learning a single task when the data are provided in different feature spaces. Each such space is called an outlook, and is assumed to contain both labeled and unlabeled data. The objective is to take advantage of the data from all the outlooks to better classify each of the outlooks. We devise an algorithm that computes optimal affine mappings from different outlooks to a target outlook by matching moments of the empirical distributions. We further derive a probabilistic interpretation of the resulting algorithm and a sample complexity bound indicating how many samples are needed to adequately find the mapping. We report the results of extensive experiments on activity recognition tasks that show the value of the proposed approach in boosting performance.

1. Introduction

It is often the case that a learning task relates to multiple representations, to which we refer as outlooks. Samples belonging to different outlooks may have varying feature representations and distinct distributions. Furthermore, the outlooks are not related through corresponding instances, but just by the common task. Multiple outlooks may be found in many real-life problems. For example, in activity recognition, data from different users, representing the outlooks, are collected from different sensors. Note that each outlook may have a totally different feature representation, while the recognition task is common to all outlooks. The ability to learn from these different representations is formulated by multiple outlook learning. A different example of multiple outlook learning is classification of document corpora written in different languages.

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).
In this case, each language represents a different outlook. In these situations, the transformations between the outlooks are unknown and feature or sample correspondence is not available. Consequently, it is rather difficult to learn the task at hand while exploiting the information in different representations. The goal of multiple outlook learning is to use the information in all available outlooks to improve the learning performance of the task. We propose to approach this learning problem in a two-step procedure. First, we map the empirical distributions of the different outlooks one to another. After the outlooks' distributions are matched, a generic classification algorithm can be applied using the available examples from all the outlooks. This approach allows us to transfer an outlook of which we have little information to another where we have more information. That is, mapping the data to the same space effectively enlarges our sample size and may also give us a better representation of the problem. We show that a classifier learned in the resulting space may outperform each single classifier. In general, matching multiple distributions, without feature alignment or assuming a parametric model, is a difficult task. Therefore, we propose to match the empirical moments of the distributions as an approximation. We present an algorithm for finding one such mapping. The algorithm's objective is to find the optimal affine transformations of the outlooks' spaces, while maintaining isometry within classes. From a geometric point of view, our algorithm is based on matching the centers and the main directions of the outlooks' sample distributions. One virtue of the algorithm is its simple closed-form solution.

2. Related work

Learning from multiple outlooks is related to other setups such as domain adaptation, multiple view learning and manifold alignment. The main challenge in these setups, as in ours, is that the training and test data are drawn from different distributions. Domain adaptation tries to resolve a common scenario when some changes have been made to the test distribution, while the labeling function of the domains remains more or less the same. Some authors portray this situation by assuming a single hypothesis may classify both domains well (Blitzer et al., 2007), while others assume the target's posterior probability is equal for the domains (Shimodaira, 2000; Huang et al., 2007). The latter assumption is also referred to as the covariate shift problem. Algorithms for domain adaptation may be roughly divided into three categories. One approach is to reweigh the training instances so they better resemble the test distribution (Shimodaira, 2000; Huang et al., 2007). Such algorithms are derived from the covariate shift assumption, which is in some sense one of the outlook mapping goals. A different approach is to combine the classifiers learnt in each domain (Mansour et al., 2009). Last, some works suggest changing the feature representation of the domains. This may be carried out by choosing a subset of features (Satpal & Sarawagi, 2007), a combination of features (Daumé III, 2007), or by finding some structural correspondence between features in different domains (Blitzer et al., 2006). All the described approaches entail an initial common feature representation for the domains. Thus domain adaptation is a special case of the multiple outlook problem, for the case of outlooks with a common feature space. In Section 6 we show that our approach can also be applied to this problem. Multiple outlook learning is also closely related to the multi-view setup (Rüping & Scheffer, 2005). In this setup, each view contains the same set of samples represented by different features.
Clearly, any multiple view data is also some instance of multiple outlook data, with the added requirement that each sample has observations from multiple outlooks. One common approach is to map a pattern matrix of each view to a consensus pattern by matching corresponding instances (Long et al., 2008; Hou et al., 2010). Note that in the multiple outlook framework each outlook contains a unique set of samples, thus sample-to-sample correspondence is impossible. Amini et al. (2009) consider the case when correspondence is missing for some instances, but assume the existence of mapping functions between the views. Multi-view learning is sometimes referred to as manifold alignment. In manifold alignment we look for a transformation of two data sets with sample pairwise correspondence that minimizes the distance between them, in an unsupervised (Wang & Mahadevan, 2008) or a semi-supervised (Ham et al., 2005) manner. Wang & Mahadevan (2009) present manifold alignment without pairwise correspondence. To our knowledge, this is the only work on manifold alignment that does not assume a pairwise matching of the samples. The algorithm presented in that work is not originally suited for classification, as ours is.

3. Mapping Two Outlooks

3.1. Problem Setting

The learner is given two outlooks belonging to separate input spaces X_1 and X_2 of dimension d_1 and d_2 respectively, with a common target Y = {1,...,c}. We assume that all example pairs of a given outlook j = 1,2 are independently drawn from an unknown distribution D_j, which is unique to each outlook. Denote by X_i^(1) and X_i^(2) the data matrices of class i of outlook 1 and 2, respectively. We use superscripts to denote the outlook's index, and subscripts to denote the classification class.

3.2. Multiple Outlook MAPping algorithm

In this section we present our main Multiple Outlook MAPping algorithm (MOMAP) for matching the representations of two outlooks. Throughout the derivations outlook 2 is mapped to outlook 1, which is sometimes referred to as the final outlook.
Our goal is to map an outlook where we have ample labeled data to an outlook where little labeled information is available. As a preliminary step to the mapping algorithm, scaling is applied. The scaling is applied to each of the outlooks separately, and aims to normalize the features of all outlooks to the same range. Note that this stage may be done using unlabeled data when available. Next, we use the labeled samples to match the two outlooks. The goal of this stage is to map the scaled representations by rotation and translation. Specifically, the mapping is performed by translating the means of each class to zero, rotating the classes to fit each other well, and then translating the means of the mapped outlook to the final outlook. Let {μ̂_i^(1), μ̂_i^(2)}_{i=1}^c be the set of empirical means of the outlooks. We translate the empirical means of each

class of both outlooks to zero:

X̂_i^(j) = X_i^(j) − μ̂_i^(j),   i = 1,...,c,  j = 1,2.   (1)

Next, we turn to matching the main directions of the classes by rotation. Note that a rotation matrix may be defined in many manners. We search for mappings in the set of all orthonormal matrices (rotation and reflection). Our choice of mapping by rotation is motivated by its isometry property, which allows us to maintain the relative distance between the samples. We construct utilization matrices for each of the outlooks as follows. Define D_i^(j) as the utilization matrix of outlook j and class i. D_i^(1) and D_i^(2) are concatenated matrices constructed from the h ≤ min(d_1, d_2) principal directions of the corresponding outlook and class, that is, the h eigenvectors of the empirical covariance matrices Σ̂_i^(1), Σ̂_i^(2) corresponding to the h largest eigenvalues. Using the utilization matrices we find the rotation matching the outlooks by solving the following optimization problem:

{R_i*} = argmin_{R_i} Σ_{i=1}^c ||R_i D_i^(2) − D_i^(1)||_F^2
subject to: R_i^T R_i = I,  i = 1,...,c,   (2)

where ||·||_F is the Frobenius norm. To gain some intuition on Problem (2) we disassemble a term in the sum of the objective function:

argmin_R ||R D_i^(2) − D_i^(1)||_F^2 = argmax_R Σ_{l=1}^h v_l^(1)T R v_l^(2),

where v_l^(j) (l = 1,...,h) are the principal directions of the i-th class of outlook j. We obtain that Problem (2) is equivalent to maximization of the sum of inner products between the principal directions of outlook 1 and the rotated principal directions of outlook 2, which in turn implies minimization of the first h principal angles between the classes of both outlooks. Although Problem (2) is not convex, it can be solved in closed form. For the solutions constructed in this stage we borrow techniques from the literature of Procrustes analysis (Gower & Dijksterhuis, 2004). Problem (2) is equivalent to

argmax_{R_i} Σ_{i=1}^c tr(R_i D_i^(2) D_i^(1)T)
subject to: R_i^T R_i = I,  i = 1,...,c.   (3)

Problem (3) is separable, thus each component in the sum may be optimized separately. In the following derivations we drop the subscript i for brevity.
Algorithm 1 Matching two outlooks
  Input: empirical moments μ̂_i^(j), ∀i, j.
  for i = 1 to c do
    X̂_i^(j) = X_i^(j) − μ̂_i^(j),  j = 1,2.
    X̃_i^(2) = MatchByRotation(X̂_i^(1), X̂_i^(2)).
    X_i,Mapped^(2) = X̃_i^(2) + μ̂_i^(1).
  end for
  Output: X_Mapped^(2).

Algorithm 2 MatchByRotation
  Input: matrices X̂^(1), X̂^(2).
  Construct matrices D^(1), D^(2).
  Compute the SVD factorization D^(2) D^(1)T = U S V^T.
  R = V U^T.
  Output: X̃^(2) = X̂^(2) R^T.

Let U S V^T be the singular value decomposition (SVD) of D^(2) D^(1)T. Define Z = V^T R U. Then,

tr(R D^(2) D^(1)T) = tr(R U S V^T) = tr(Z S) = Σ_{k=1}^d z_kk σ_k ≤ Σ_{k=1}^d σ_k,

where σ_k is the k-th singular value of D^(2) D^(1)T. The upper bound is attained for R = V U^T, since in that case Z = I (Algorithm 2). After the rotation, we translate the classes to match the original means of the final outlook. The above derivation gives rise to an algorithm that matches two given outlooks. The algorithm is described in Algorithm 1.

Remark 1. The outlooks need not have the same dimension. In this case, the orthonormal constraint cannot be obtained, as R is no longer a square matrix. However, this problem can be easily solved. Suppose that D^(1) and D^(2) have different numbers of rows. Then, simply add rows of zeros to the smaller dimensional configuration until the dimensions are equalized. In this manner, we embed the smaller configuration in the space of the larger one.

Remark 2. Algorithm 1 does not rely on any corresponding instances in both outlooks. However, when available, such instances may aid the mapping accuracy and can be easily incorporated into the algorithm. It is possible to do so by adding columns of the corresponding instances to the utilization matrices.
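To make the closed-form solution concrete, here is a minimal numpy sketch of the rotation step for one class. The function names and the default h are mine, not the authors'; rows are samples, as in the data matrices above.

```python
import numpy as np

def procrustes_rotation(D1, D2):
    """Closed-form solution of min_R ||R D2 - D1||_F s.t. R^T R = I:
    with the SVD D2 D1^T = U S V^T, the maximizer of tr(R D2 D1^T)
    is R = V U^T, exactly as in Algorithm 2."""
    U, _, Vt = np.linalg.svd(D2 @ D1.T)
    return Vt.T @ U.T

def principal_directions(X, h):
    """Top-h eigenvectors (as columns) of the empirical covariance of X
    (rows are samples); these form the utilization matrix of a class."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    return vecs[:, np.argsort(vals)[::-1][:h]]

def map_class(X1, X2, h=3):
    """One iteration of Algorithm 1: center both class matrices, rotate
    outlook 2 onto outlook 1, then translate to outlook 1's class mean."""
    R = procrustes_rotation(principal_directions(X1, h),
                            principal_directions(X2, h))
    return (X2 - X2.mean(axis=0)) @ R.T + X1.mean(axis=0)
```

Since samples are stored as rows, the rotation is applied as X̂^(2) R^T, matching the output line of Algorithm 2.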

4. Extension to Multiple Outlooks

We present an extension of Algorithm 1 to the case of multiple outlooks. The multiple outlook scenario allows us to use the information available in all the outlooks to allow better learning of each one. To do so, we transform all the outlooks one to another. As for two outlooks, we begin by translating the means of each class of all the outlooks to zero. In the rotation step, the optimal rotations are found by solving

min_{R_i^(j)} Σ_{i=1}^c Σ_{k<j} ||R_i^(k) D_i^(k) − R_i^(j) D_i^(j)||_F^2
subject to: R_i^(j)T R_i^(j) = I,  ∀i, j.   (4)

Observe that Algorithm 2 produces an optimal solution with zero error, as there is always a perfect rotation between two sets of h orthogonal vectors. Therefore, one optimal solution of (4), which attains an objective value of zero, is to rotate all outlooks to a chosen final outlook. Namely, for m outlooks, m − 1 rotation matrices are computed for each class. Finally, shift the means of the rotated outlooks to those of the final outlook. If we want to switch the choice of final outlook, all we need to do is apply the inverse mapping of the relevant outlook to all mapped outlooks. For example, to switch from outlook s to k one needs to apply the following transformation:

X_i^(k) = (R_i^(k))^{-1} (X_i^(s) − μ̂_i^(s)) + μ̂_i^(k).

5. Analysis

In this section we give a probabilistic robust interpretation of the rotation process, and prove a sample complexity bound on the convergence of the estimated rotation matrix.

5.1. Probabilistic Interpretation

In this section we discuss the effect of adding random noise to the utility matrices on the optimal rotation between two outlooks (Problem (2)). We do not assume knowledge of the probability distribution of the noise. Instead, we use its bounded total value for some chosen confidence level. We show that the solution to the noised problem is bounded by the sum of the solution to the original problem and a constant value that depends on the noise. Notably, the noise only has an additive effect on the bound.
Let Δ be the additive random uncertainty to the utility matrix D^(2) for some class. Suppose that this uncertainty follows an unknown joint distribution P. This uncertainty may be portrayed by a chance-constrained extension of Problem (2)¹:

min_{R^T R = I, τ} τ   (5)
subject to: Pr_P(||R(D^(2) + Δ) − D^(1)||_F ≤ τ) ≥ 1 − η,

where η ∈ [0,1] is the desired confidence level. Optimization of the chance-constrained problem is natural, as it obtains, with high probability, the optimal rotation. However, despite their intuitive probabilistic form, chance-constrained problems are generally intractable (Shapiro et al., 2009), thus we approximate Problem (5) as follows. We define ρ = inf_α {α : Pr_P(||Δ||_F ≤ α) ≥ 1 − η} and obtain that, with probability at least 1 − η,

||R(D^(2) + Δ) − D^(1)||_F ≤ max_{||Δ||_F ≤ ρ} ||R(D^(2) + Δ) − D^(1)||_F.

Therefore, Problem (5) is upper bounded by the following minimax problem:

min_{R^T R = I} max_{||Δ||_F ≤ ρ} ||R(D^(2) + Δ) − D^(1)||_F.   (6)

This is the robust version of the original rotation problem, with the uncertainty set U = {Δ : ||Δ||_F ≤ ρ}². Next, we construct the robust counterpart of (6).

Theorem 1. Problem (6) is equivalent to

min_{R^T R = I} (||R D^(2) − D^(1)||_F + ρ).

The proof is provided in Harel & Mannor (2011). The theorem shows that Problem (2) is robust to a perturbation of a total bounded value. That is, for a bounded noise, the only difference between the solution to the original problem and its robust version (Problem (6)) is an additive constant ρ. From a probabilistic point of view, the solution of this problem also provides a bound on the chance-constrained problem in (5).

5.2. Sample complexity bounds

We next provide a bound for the sample complexity of the rotation step of the algorithm.

¹ Since Problem (2) is separable, the extension is done to each class separately. We drop the subscript i, representing the class, from the following derivations for brevity.
² The original rotation problem was actually the square of the Frobenius error. However, the two problems are equivalent since taking the square does not change the solution.
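The additive slack ρ in Theorem 1 follows from the triangle inequality together with the isometry of R (||RΔ||_F = ||Δ||_F for orthonormal R). A quick numerical sanity check of this bound; the matrices D^(1), D^(2), R and the value of ρ below are arbitrary random instances, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, rho = 6, 3, 0.5
D1 = rng.normal(size=(d, h))
D2 = rng.normal(size=(d, h))
R = np.linalg.qr(rng.normal(size=(d, d)))[0]   # an arbitrary orthonormal R

# Worst observed perturbed objective over random Δ with ||Δ||_F = ρ.
worst = 0.0
for _ in range(200):
    Delta = rng.normal(size=(d, h))
    Delta *= rho / np.linalg.norm(Delta)
    worst = max(worst, np.linalg.norm(R @ (D2 + Delta) - D1))

# Theorem 1: the robust objective never exceeds the nominal one plus ρ.
bound = np.linalg.norm(R @ D2 - D1) + rho
```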

Assumption 1. (Gaussian Mixture) Each outlook is generated by a unique mixture of c Gaussian distributions, where c is the number of classes. The samples of each outlook are realizations of x ~ Σ_{i=1}^c w_i f_i(x), where f_i(x) ~ N(μ_i, Σ_i) and Σ_{i=1}^c w_i = 1. We further assume that ||E[xx^T]|| ≤ 1 for each component.

Theorem 2. Suppose that Assumption 1 holds. For each outlook, let δ, ε, ε_i ∈ (0,1) (i = 1,...,c) and suppose that the number of samples for each class satisfies:

n_i ≥ C (d h² / ε_i²) log²(32 d h² / ε_i²) log²(4hd / δ).

Then

P(||R̂ − R|| ≤ ε) ≥ 1 − δ,

where R̂ is the estimated rotation matrix found by Algorithm 2, d is the dimension and C is a constant. The proof of the theorem is provided in Harel & Mannor (2011). Note that the sample complexity of the mapping algorithm is dominated by the rotation stage. In practice, the number of chosen principal directions h is usually small. Also note that the bound on the norm of the second moment in Assumption 1 is achieved by the scaling stage.

6. Experiments

In this section we demonstrate our framework on activity recognition data, in which different users represent different outlooks. In this application, the multiple outlooks setup allows for valuable flexibility in real-life recordings. For example, some users may use a simple sensor configuration for recordings, while others use a complex sensor board of multiple sensors. Also, this setup may resolve problems of varying sampling rates when using different hardware and workloads. In our experiments we test two setups: a domain adaptation setup and a multiple outlook setup. For the domain adaptation setup a common feature representation is used, while for the multiple outlook setup a unique feature space is used for each user.

6.1. Data set description and feature extraction

The data set used for the experiments was collected by Subramanya et al. (2006) using a customized wearable sensor system. The system includes a 3-axis accelerometer, phototransistors for measuring light, barometric pressure sensors, and GPS data.
The data consist of recordings from 6 participants who were asked to perform a variety of activities and record the labels. We used the following labels: walking, running, going upstairs, going downstairs and lingering. After removing data with obvious annotation errors the data consist of about 50 hours of recording, divided approximately evenly among the 6 users. For each user the activities are roughly divided into 40% walking, 40-50% lingering, 2-5% running, 2-3% going upstairs, and 2-3% going downstairs. See (Subramanya et al., 2006) for further details on the sensor system and the recordings. From the raw data we extracted windowed samples as follows. From the accelerometer data we used the x-axis measurements sampled at 512Hz, which we decimated to 32Hz. The barometric pressure, sampled at 7.1Hz, was smoothed and interpolated to 32Hz. Next, we applied a two-second sliding window over each signal using a window of appropriate length. From each window a feature vector is extracted containing the Fourier coefficients of the accelerometer data, the mean of the gradient of the barometric pressure, and the mean values of the light signals. All together we obtained 20-35 thousand samples for each user with 37 features. As explained in Section 3.2, before mapping the outlooks, scaling should be applied to all the outlooks. For all the experiments, we scale the data to [0,1]. To reduce the sensitivity of the scaling to outliers we first collapse the extreme two percentiles of the data to the value of the extreme remaining values (also known as Winsorization). Scaling parameters are chosen on the training data and applied to the test data. This preprocessing was applied to all baseline classifiers.

6.2. Domain Adaptation Setup

As mentioned above, multiple outlook learning may also be applied for domain adaptation. We tested both standard domain adaptation of two domains, as well as multiple source domain adaptation. For the two domain problem we adopted the commonly used terminology in domain adaptation of source and target domains.
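The winsorized [0,1] scaling used in the preprocessing above might be implemented as follows. The function name and interface are my own; the two-percentile cutoff and the train-then-apply protocol follow the text.

```python
import numpy as np

def fit_winsor_scaler(X_train, pct=2.0):
    """Per-feature winsorized [0,1] scaling: clip every feature at its
    lower/upper `pct` percentiles, computed on the training data only,
    then rescale the clipped range to [0, 1]. Returns a transform that
    can be applied unchanged to the test data."""
    lo = np.percentile(X_train, pct, axis=0)
    hi = np.percentile(X_train, 100.0 - pct, axis=0)

    def transform(X):
        Xc = np.clip(X, lo, hi)                      # Winsorization step
        span = np.where(hi > lo, hi - lo, 1.0)       # guard constant features
        return (Xc - lo) / span

    return transform
```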
We applied Algorithm 1 for different fractions of target labeled data and fully labeled source data. The performance was computed by 10-fold cross-validation, each fold containing random samples from each class according to its fraction in the complete set. The only parameter of the algorithm, h, was chosen on a random split. We test the success of the mapping algorithm by classification of the target test data with a classifier trained on the mapped source data, denoted as the MOMAP classifier (no target data was used for training). This is a multi-class classification problem, with five possible labels. We use a multi-class SVM classifier with an

RBF kernel (C = 64, γ = 5)³ obtained with the LIBSVM software (Chang & Lin, 2001). The data are unevenly distributed among the five classes, therefore we use the balanced error rate (BER) as a performance measure:

BER = (1/c) Σ_{i=1}^c e_i / n_i,

where e_i and n_i are the number of errors and the number of samples in class i respectively, and c is the number of classes. We compare the MOMAP classifier to the following baselines: a target-only classifier, trained on the available labeled target data; a source-only classifier, trained on the source data (SRC); a classifier trained on all available labeled data of target and source (ALL); and the domain adaptation algorithm presented in (Daumé III, 2007) (FEDA). We also add the optimal error, obtained by training on the fully labeled target data. The results are presented in Figure 1. It can be observed that the MOMAP classifier outperforms the baseline classifiers for most fractions of target labeled data. The algorithm performs well across all sets of users; for example, for 5% labeled data it is significantly better (p-value < 0.05) than the target-only, SRC and FEDA classifiers for all sets, and significantly better than the ALL classifier for 18 out of 30 possible sets (see Table 1 in Harel & Mannor (2011)). In the next experiment we consider mixtures of m source domains with some labeled data (both training and test sets are mixtures). We use the extension to multiple outlooks presented in Section 4 to find the mappings of the sources to each outlook. We test the classification performance on each component of the mixture with a classifier trained on all the mapped sources. The final performance measure is the mean BER averaged over all the sources. As in the previous experiment, the evaluation was done by 10-fold cross-validation, with the same classifier. The baselines are similar, with the change of the target-only baseline to the mean value of multiple classifiers trained in each domain, and the ALL baseline to a classifier trained on all sources (the SRC classifier was not relevant). The experiment was performed on all 20 triplet combinations.
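The balanced error rate used throughout the experiments is simple to compute; a short sketch (the function name is my own):

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """BER = (1/c) * sum over classes of e_i / n_i: per-class error
    rates averaged uniformly, so rare classes (e.g. running at 2-5%
    of the data) weigh as much as the dominant ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    per_class_err = [np.mean(y_pred[y_true == k] != k) for k in classes]
    return float(np.mean(per_class_err))
```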
Sample results are presented n Fgure 2. These trends were consstent across users, for example, for 15% of labeled data the algorthm outperforms all other classfers for 15 of the combnatons (p-value< 0.05). In the 5 remanng combnatons, the algorthm performed sgnfcantly better than the and FEDA algorthms, and equally well as the ALL classfer (see Table 2 n Harel & Mannor (2011)). For 3 The parameters were chosen on the target classfcaton problem. Common parameters were chosen for clear performance comparson of the dfferent classfers. 5 5 5 5 SRC ALL FEDA 0 0.05 0.1 5 (a) User 5 User 3 SRC ALL FEDA 0.1 0 0.05 0.1 5 (b) User 6 User 2 Fgure 1. Doman adaptaton setup for 2 domans. 5 5 5 SRC FEDA 0.1 0 0.05 0.1 5 Fgure 2. Doman adaptaton setup for multple outlooks: users 1,2 and 5. larger portons of labeled data the algorthm also obtaned smaller error than the ALL classfer (pvalue< 0.05). The effect of the ALL classfer may be a result of some regularzaton obtaned from tranng on data from smlar yet dfferent domans. 6.3. Multple Outlook Setup We conducted three types of experments for the multple outlook setup, each wth a dfferent feature representaton. The experments setup was smlar to the prevous experments wth some adjustments to the baselnes: the SRC, ALL and FEDA baselnes were no longer relevant, as the outlooks features dffer. In the frst experment we tested the multple outlook algorthm on two outlooks for the case of dfferent sen-

Learnng from Multple Outlooks 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0.05 0.1 (a) User 3 User 2 0 0.05 0.1 (b) User 4 User 3 0 0.05 0.1 (a) User 3 User 1 0 0.05 0.1 (b) User 2 User 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0.05 0.1 (c) User 2 User 6 0 0.05 0.1 (d) User 1 User 5 0 0.05 0.1 (c) User 4 User 6 0 0.05 0.1 (d) User 3 User 4 Fgure 3. Two outlooks wth dfferent sensors. Fnal outlook: accelerometer and pressure. Mapped outlook: accelerometer, pressure and lght sensors. The mssng features n the fnal outlook are replaces by nose. sors and added nose features. For the mapped outlook we used full feature representaton (37 features). For the target outlook we used the accelerometer s and pressure features, and excluded the lght measurements. Instead of the lght features we added features wth Gaussan random nose (N(0, 1)). The experment was performed on all par combnatons. For 5% labeled data of the learned outlook, the mean of the was 4.5% (±2.7%) lower than that of the classfer. The results for four user pars are presented n Fgure 3. These results show that the mappng was successful, as tranng on the mapped data outperforms tranng on partal data n the target outlook. In Fg. 3(c) the algorthm has lower error than the classfer for some fractons; ths may be a result of the added nformaton n the lght features. In the second experment we tred to learn from two outlooks wth a dfferent number of features resultng from dfferent samplng rates. Specfcally, for the learned outlook we kept the full feature representaton as descrbed n Secton 6.1, whle for the mapped outlook we used the same type of features but wth 30Hz samplng rate nstead of 32Hz. Ths resulted n 37 features n the target outlook and 35 n the mapped one. Note that our algorthm may be easly modfed for ths scenaro; see Remark 1 n Secton 3.2. For 5% labeled data the algorthm had on aver- Fgure 4. Multple outlook learnng for two outlooks wth dfferent samplng rates. age 5.9% (±2.4%) lower than the classfer. 
Figure 4 presents the results on four user pairs. In Figs. 4(a) and 4(c) the algorithm has lower error than the baseline classifier. Observe that this is possible since the balanced error rate is presented, which treats the error in the different classes equally (namely, the baseline classifier does not outperform in the non-balanced error).

In the third experiment we constructed the feature representation of each outlook from the 33 accelerometer features, to which we added 10 features of Gaussian noise (N(0, 1)). We then randomly permuted the order of the features of each outlook. For this experiment, we used samples belonging to the walking, running and lingering classes, as we did not use the full feature set. The experiment was performed for two outlooks as well as for multiple outlooks. The results indicate the performance boost from learning from multiple outlooks, especially for the running activity. Due to space limitations we provide the results in Harel & Mannor (2011).

7. Future Work

Our proposed approach is a first step in developing the methodology for learning from multiple outlooks. This approach may be extended in many interesting directions. First, in this paper we only considered affine mappings between the outlooks, and a natural extension is to consider richer classes of transformations, such as piecewise linear mappings. Also, our approach is batch in the sense that first all the data have to be processed, and only then can the classification algorithm be used. A different extension of practical interest would be to develop an online version of the proposed approach that takes samples one by one and gradually improves the mapping. Finally, a major application domain, of independent interest, is natural language processing. Here the challenge would be to use a language where labels are abundant in order to better classify in a different language. The main obstacle here seems to be the nature of the representation: language data are often represented as sparse vectors, which may call for a different type of transformation between the outlooks.

References

Amini, M., Usunier, N., and Goutte, C. Learning from multiple partially observed views: an application to multilingual text categorization. In Advances in Neural Information Processing Systems, 2009.

Blitzer, J., McDonald, R., and Pereira, F. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120-128. Association for Computational Linguistics, 2006. ISBN 1932432736.

Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Wortman, J. Learning bounds for domain adaptation. In Advances in Neural Information Processing Systems, volume 20, pp. 129-136. Citeseer, 2007.

Chang, C. and Lin, C. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Daumé III, H. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), volume 1, pp. 256-263. Association for Computational Linguistics, 2007.

Gower, J.C. and Dijksterhuis, G.B. Procrustes Problems. Oxford University Press, USA, 2004.

Ham, J., Lee, D., and Saul, L. Semisupervised alignment of manifolds. In Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence, Z. Ghahramani and R. Cowell, Eds., volume 10, pp. 120-127. Citeseer, 2005.

Harel, M. and Mannor, S. Learning from multiple outlooks, 2011. http://arxiv.org/abs/1005.0027v1.

Hou, C., Zhang, C., Wu, Y., and Nie, F. Multiple view semi-supervised dimensionality reduction. Pattern Recognition, 43(3):720-730, 2010. ISSN 0031-3203.

Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., and Schölkopf, B. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems, volume 19, pp. 601. Citeseer, 2007.

Long, B., Yu, P.S., and Zhang, Z.M. A general model for multiple view unsupervised learning. In Proceedings of the 8th SIAM International Conference on Data Mining (SDM'08), Atlanta, Georgia, USA, 2008.

Mansour, Y., Mohri, M., and Rostamizadeh, A. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems, volume 21, pp. 1041-1048. Citeseer, 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: an approach through geometric functional analysis. Journal of the ACM (JACM), 54(4):21, 2007.

Rüping, S. and Scheffer, T. Learning with multiple views. In Proceedings of the International Conference on Machine Learning Workshop on Learning with Multiple Views, 2005.

Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. In Proceedings of Principles of Data Mining and Knowledge Discovery, pp. 224-235. Springer, 2007.

Shapiro, A., Dentcheva, D., and Ruszczyński, A. Lectures on Stochastic Programming: Modeling and Theory. Society for Industrial Mathematics, 2009. ISBN 089871687X.

Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227-244, 2000.

Stewart, G.W. and Sun, J.G. Matrix Perturbation Theory. Academic Press, 1990.

Subramanya, A., Raj, A., Bilmes, J., and Fox, D. Recognizing activities and spatial context using wearable sensors. In Proceedings of the Conference on Uncertainty in Artificial Intelligence. Citeseer, 2006.

Wang, C. and Mahadevan, S. Manifold alignment using Procrustes analysis. In Proceedings of the 25th International Conference on Machine Learning, pp. 1120-1127. ACM, 2008.

Wang, C. and Mahadevan, S. Manifold alignment without correspondence. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009.