A New Instance Weighting Method for Transfer Learning

BARIŞ KOÇER, AHMET ARSLAN
Department of Computer Engineering
Selçuk University
Alaaddin Keykubat Kampusu, Muhendislik Mimarlık Fakültesi, Selçuklu / Konya
TURKEY
bariskocer@selcuk.edu.tr, ahmetarslan@selcuk.edu.tr

Abstract: - The abstract knowledge transferred from one task to another can take the form of parameters, a feature representation, or training instances. Instance transfer is an intuitively appealing way to transfer knowledge, but the instances of the source task must be reweighted for the target task. We developed a novel instance transfer and weighting method for inductive transfer learning problems using genetic algorithms. Our results are better than those of graph transfer [11], one of the state-of-the-art instance transfer methods.

Key-Words: Transfer learning; genetic algorithms; neural networks; instance transfer.

1 Introduction

Traditional supervised machine learning techniques start from scratch with some training data and make predictions on data drawn from the same distribution as the training data. If the distribution changes, for example because the training data has become outdated, the model must be retrained with new training data. In real life, however, it may be very difficult to obtain new training data every time the distribution changes. A human draws on past experience, instead of searching for large amounts of new training data, when solving a new problem. This ability lets human intelligence adapt quickly to new situations with sparse training data. In everyday terms, transfer learning can be described as using past experience related to the current problem in order to solve that problem more quickly. In machine learning, transfer learning can be described as extracting abstract knowledge from one or more tasks and transferring it to another related task that needs extra training resources. Transfer learning techniques are thus inspired by the human ability to adapt to new situations.
The need for transfer learning in machine learning was first discussed at a workshop on learning to learn [1], and a comprehensive survey of transfer learning was prepared by Pan and Yang [2]. There are two kinds of tasks in transfer learning. The first is the source task, from which knowledge is extracted and which usually has enough training data. The second is the target task, which has insufficient labeled or unlabeled training data and therefore needs auxiliary data or knowledge.

1.1. Main Problems of Transfer Learning:
A transfer learning method should solve the problems below for efficient abstract knowledge transfer:
- Determining task relatedness: To improve performance on the target task, the source and target tasks must be related. Otherwise all attempts at knowledge transfer will fail and may even reduce the performance of the target task; this side effect is called negative transfer.
- Determining the knowledge transfer method: This is the main step of a transfer learning method, because here the transfer strategy that carries over as much useful knowledge as possible must be selected.
- Determining the amount of knowledge to transfer: Once the first and second problems are solved, the next step is to determine how much knowledge will be transferred. For example, in the instance transfer approach, if too many instances are transferred from the source task, the target task will be biased toward the source task and negative transfer will occur.
- Enabling reusability: This problem has lower priority than the first three. If the transferred knowledge is in a reusable format, a similar target task can combine it with new knowledge and may achieve better performance.

1.2. Transfer Learning Settings:
Transfer learning problems can be categorized by the absence or presence of labeled data for the target or source task.
Inductive transfer learning: When there is a little labeled data for the target task and plenty of labeled or unlabeled data for the source task, the problem is called inductive transfer learning.
Transductive transfer learning: When there is plenty of labeled data for the source task but no labeled data for the target task, the problem is categorized as transductive transfer learning.
Unsupervised transfer learning: In this setting there is no labeled data for either the source or the target task, but there is much more unlabeled data for the source task than for the target task.

1.3. Transfer Approaches:
Each transfer learning setting needs different transfer approaches. Transfer approaches can be grouped under three main headings.
Instance transfer: This approach is intuitively appealing and is based on transferring or weighting a suitable part of the labeled source data for the target task. It can be applied to both the inductive and the transductive transfer settings.
Feature representation transfer: Finding a common feature representation that reduces the differences between the source and target tasks is another transfer learning approach, and it can be applied in all transfer learning settings.
Parameter transfer: Discovering shared priors of the source and target tasks, or transferring parameters shared between them, may improve the performance of the target task. This type of approach is called parameter transfer and can only be applied in the inductive transfer learning setting.

The rest of the paper is organized as follows.
Section 2 summarizes related transfer learning work, Section 3 introduces the developed method, Section 4 describes the experimental settings and the performance comparison against graph transfer [11], Section 5 discusses the experimental results, and Section 6 concludes with a comparative analysis.

2. Related work:
Transfer learning methods have been applied to different problem domains. For example, in reinforcement learning, skill transfer [3], action schema transfer [4], and control knowledge transfer [5] are good sample studies. We focus on the instance transfer approach in the inductive transfer setting. The inductive transfer setting can be illustrated with the indoor Wi-Fi localization task, in which a model is trained for an indoor environment that is split into fixed cells. The task is simply to predict the cell coordinates of a receiver from the signal strengths of access points. Transfer learning becomes necessary because labeled data for indoor Wi-Fi localization easily goes out of date, either due to new obstacles or reflections over time [6] or because the hardware of the receiver or access points changes [7]. Thus, instead of collecting new training data again, transfer learning methods can be applied to adapt the old training data to the new situation. Transfer learning can also be used to obtain good performance with sparse labeled data in Wi-Fi localization. For example, training data may be available for only a small part of a very large indoor environment, and it is very difficult to obtain a full map of the environment [9]; when transfer learning methods are applied, the amount of labeled data needed to build the localization model is significantly reduced.

We tested our method on a text categorization task. One prior work on text categorization with transfer learning is by Dai et al. [10], which uses an expectation-maximization-based Naïve Bayes classifier; another method, proposed by Eaton et al. [11], uses a graph-based transferability measure and extracts transfer parameters from the source tasks. Although these text categorization algorithms are older works, they are the latest examples of the instance transfer approach used in text categorization. Another example for text categorization is proposed by Dai et al. [12]: a modified version of the AdaBoost algorithm that leverages old labeled data with the help of a small amount of newly labeled data to build an accurate classification model.

In this work we have developed a genetic algorithm based transfer learning method; the only previous example of genetic algorithms used in transfer learning is [13]. In that early work, the knowledge to be transferred is not evaluated for the target task and is selected randomly from the solution pool. We adapted this algorithm to the classification task and modified it for better performance.

3. Instance weighting via genetic algorithms:
We focus on the instance transfer approach for inductive transfer learning in this work because there is no instance selection or weighting algorithm that uses the power of genetic algorithms. We used a hybrid of genetic algorithms and an artificial neural network (ANN). Although we used a conventional hybrid algorithm, this work does not propose a new GA/ANN hybrid; it is an instance weighting approach that uses this hybrid algorithm.

As usual in inductive transfer learning, two different sets of data are used: the labeled data of the source tasks, DS, and the labeled data of the target task, DT. The aim of the genetic algorithm is to determine the best ANN weights together with a weight for each DS instance, which the ANN uses as that instance's learning rate. The fitness of each individual is the predictive accuracy of the ANN on DT after it has been trained for 1 epoch on DS (with the weights stored in the individual) + DT (with weight 1). A weight of 1 is used for the DT instances in order to bias the network toward the target task; we chose 1 because it is the largest weight used when drawing the random initial weights. After the 1-epoch fitness evaluation we write the updated network weights back into the individual, so we use both the exploitation ability of genetic algorithms and the exploration ability of neural networks.

Thus the DS weights and the ANN weights are both encoded in the individual. When a fitness evaluation is needed, the ANN weights are copied to their corresponding places in the ANN, and the ANN is trained on the labeled source data DS, using each instance's weight, extracted from the individual, as its learning rate. Instances with larger weights therefore influence the fitness evaluation more than instances with smaller weights. The training data of the target task, DT, is also used in the fitness evaluation, with the maximum weight value, in order to bias the network toward the target task.
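The fitness evaluation described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `decode` and `evaluate`, the sigmoid activations, the squared-error gradient, the absence of bias terms, and the chromosome layout (ANN weights first, then one weight per DS instance) are all assumptions made for illustration. The sketch follows the text in three respects: each DS instance uses its own weight as the learning rate, every DT instance uses weight 1, and the weights trained during the single epoch are written back into the individual.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def decode(individual, n_in, n_hidden, n_source):
    """Split a flat chromosome into ANN weights and DS instance weights.

    Assumed layout: input-to-hidden weights, hidden-to-output weights,
    then one weight per source instance (no bias terms).
    """
    a = n_in * n_hidden
    b = a + n_hidden
    w1 = individual[:a].reshape(n_in, n_hidden).copy()
    w2 = individual[a:b].copy()
    inst_w = individual[b:b + n_source]
    return w1, w2, inst_w


def evaluate(individual, Xs, ys, Xt, yt, n_hidden):
    """Train for one epoch on DS + DT, then score accuracy on DT.

    Returns (fitness, updated_individual); the updated individual carries
    the trained ANN weights and the unchanged instance weights.
    """
    n_in = Xs.shape[1]
    w1, w2, inst_w = decode(individual, n_in, n_hidden, len(Xs))
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    # Per-instance learning rates: evolved weights for DS, fixed 1 for DT.
    rates = np.concatenate([inst_w, np.ones(len(Xt))])
    for x, target, lr in zip(X, y, rates):
        h = sigmoid(x @ w1)
        out = sigmoid(h @ w2)
        delta_out = (out - target) * out * (1.0 - out)
        delta_h = delta_out * w2 * h * (1.0 - h)  # uses w2 before its update
        w2 -= lr * delta_out * h
        w1 -= lr * np.outer(x, delta_h)
    pred = sigmoid(sigmoid(Xt @ w1) @ w2) >= 0.5
    fitness = float(np.mean(pred == (yt >= 0.5)))
    updated = np.concatenate([w1.ravel(), w2, inst_w])
    return fitness, updated
```

Because the learning rate of a DS instance is its weight, an instance with weight 0 leaves the network unchanged, while one with weight 1 counts as much as a target instance.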
In summary, in this hybrid model the neural network is trained with DS + DT, but we use the maximum available weight of 1 for the DT training instances in order to bias the network toward the target task, and fitness values are calculated as the predictive accuracy of the trained network on the DT dataset. After a predetermined number of generations, the individual with the best fitness is trained in an artificial neural network for a predetermined number of epochs using the DS + DT dataset.

Figure 1. Flowchart of proposed instance weighting method.

4. Experimental Setting
The proposed method is tested on a text categorization task. Text categorization is the assignment of a text document to one of a set of predetermined main categories. We used the genetic algorithm and artificial neural network hybrid algorithm. The neural network has 50 input nodes, 100 hidden nodes, and 1 output node, so 5100 real-valued weights (50 × 100 + 100 × 1) are encoded in each individual for the neural network. The input nodes are binary valued, and each corresponds to a word selected from DS by WEKA's [14] string-to-word-vector component. We used the 50 most discriminative words, so there are 50 input features. Before the words were determined, all header information was deleted from all documents. Each document in DS is converted to a training instance: if the document contains a given word, the corresponding input feature is set to 1; otherwise it is set to 0. There is only one output feature, which decides whether the instance belongs to the desired subcategory or not.

We compared our results with the graph-based method [11] under almost the same settings. We chose this work because it is one of the latest works on the instance transfer approach in the inductive transfer setting. Seven per cent of the twenty newsgroups data set [8], i.e. 70 unique positive and 70 unique negative documents for each task, is used. In order to prevent interference, negative examples are selected from the first subcategories of the main categories, and the positive examples of each task are selected from the remaining subcategories, so there are 13 tasks. When a task is selected as the target task, the remaining tasks are treated as source tasks, and the aim is to transfer and weight instances from these 12 source tasks to improve the performance of the target task, which has limited labeled data. All available source task data is also encoded in the individuals of the genetic algorithm to determine the instance weights. The population size is set to 50 and the maximum generation count to 20. The crossover rate is 75% and the mutation rate is 1%. After 20 generations, a feed-forward multilayer ANN is trained for 100 epochs by backpropagation on the DS + DT data set, using the neural network weights and instance weights of the individual with the best fitness. Every trial for a given percentage of DT is repeated 10 times independently, each time selecting the desired percentage of training data randomly from the available training data, and the results in the graphs are the mean values of these independent runs. Test results for the tasks that are also included in [11] are illustrated in Figures 2 to 5.

Figure 2. Experimental results for comp.windows.x
Figure 3. Experimental results for rec.sport.baseball
Figure 4. Experimental results for sci.space
Figure 5. Experimental results for talk.politics.mideast
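The genetic search implied by these settings can be sketched as follows, assuming an `evaluate` callable that takes an individual and returns a fitness value together with the updated individual. Population size, generation count, crossover rate, and mutation rate are taken from the text. Tournament selection, one-point crossover, mutation by uniform resampling, and initial genes drawn from [0, 1] are assumptions, since the paper does not name its operators; `run_ga` and `tournament` are hypothetical names.

```python
import numpy as np

POP_SIZE = 50          # from the text
GENERATIONS = 20       # from the text
CROSSOVER_RATE = 0.75  # from the text
MUTATION_RATE = 0.01   # from the text


def tournament(pop, fits, rng, k=2):
    """Assumed selection operator: the fittest of k randomly drawn individuals."""
    idx = rng.integers(0, len(pop), size=k)
    return pop[idx[np.argmax(fits[idx])]]


def run_ga(evaluate, n_genes, rng, pop_size=POP_SIZE, generations=GENERATIONS):
    """Evolve flat chromosomes; return the best evaluated individual and its fitness."""
    pop = rng.uniform(0.0, 1.0, size=(pop_size, n_genes))
    best, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.empty(pop_size)
        for i in range(pop_size):
            # Write the weights trained during evaluation back into the individual.
            fits[i], pop[i] = evaluate(pop[i])
        top = int(np.argmax(fits))
        if fits[top] > best_fit:
            best, best_fit = pop[top].copy(), float(fits[top])
        children = []
        while len(children) < pop_size:
            a = tournament(pop, fits, rng).copy()
            b = tournament(pop, fits, rng).copy()
            if rng.random() < CROSSOVER_RATE:
                cut = rng.integers(1, n_genes)  # one-point crossover (assumed)
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for child in (a, b):
                mask = rng.random(n_genes) < MUTATION_RATE
                child[mask] = rng.uniform(0.0, 1.0, size=mask.sum())
                children.append(child)
        pop = np.array(children[:pop_size])
    return best, best_fit
```

Under these settings the chromosome would hold the 5100 ANN weights plus one weight per source instance. If all 140 documents of each of the 12 source tasks are encoded, that is 1680 instance weights and 6780 genes in total, a figure the paper itself does not state.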

5. Experimental Results:
Although we used 50 features instead of the 100 used by the graph transfer method in [11], the proposed method gives better results on the sci.space and talk.politics.mideast data sets, and better results for high percentages of training data on the comp.windows.x and rec.sport.baseball data sets. As seen in Figure 2 and Figure 3, our method performs worse than graph transfer when a low percentage of training data is used. Graph transfer chooses only some of the source tasks for instance transfer, whereas our method uses all available knowledge from the source tasks; with little training data it may therefore choose wrong weights and suffer a larger negative effect than graph transfer. This is not always the case, however: in Figures 4 and 5 the proposed method performs better even at low percentages of training data. For high percentages of training data the positive effect is larger, and our method performs better than graph transfer, as seen in Figures 2 to 5.

6. Conclusion:
In this work we have demonstrated a novel instance weighting algorithm using genetic algorithms. The proposed instance weighting method also offers solutions to the transfer learning problems described in Section 1.1: task relatedness is determined by natural selection, and only useful information is transferred to the target task through weighted instance transfer. The last advantage of the proposed method is reusability: when similar classification tasks are encountered, the same solution pool can be evaluated and used.

References
[1] IPS95_LTL/transfer.workshop.1995.html
[2] Pan, S.J., and Yang, Q. (2008). A Survey on Transfer Learning. Department of Computer Science and Engineering, Hong Kong University of Science and Technology.
[3] Konidaris, G., and Barto, A. Building portable options: skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. IJCAI Conference on Artificial Intelligence. Morgan Kaufmann Publishers, San Francisco, CA,
[4] Cohen, P. R., Chang, Y., and Morrison, C. T. Learning and transferring action schemas. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers, San Francisco, CA,
[5] Fernández, S., Aler, R., and Borrajo, D. Transferring learned control-knowledge between planners. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers, San Francisco, CA,
[6] V.W. Zheng, Q. Yang, W. Xiang, and D. Shen, "Transferring Localization Models over Time," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[7] V.W. Zheng, S.J. Pan, Q. Yang, and J.J. Pan, "Transferring Multi-Device Localization Models Using Latent Multi-Task Learning," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[8] Rennie, J.: 20 Newsgroups data set, sorted by date. Available online at (September 2003)
[9] S.J. Pan, D. Shen, Q. Yang, and J.T. Kwok, "Transferring Localization Models across Space," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[10] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Transferring Naive Bayes Classifiers for Text Classification," Proc. 22nd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[11] E. Eaton, M. desJardins, and T. Lane, "Modeling Transfer Relationships between Learning Tasks for Improved Inductive Transfer," Proc. European Conf. Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 08), pp , Sept
[12] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for Transfer Learning," Proc. 24th Int'l Conf. Machine Learning, pp , June
[13] Koçer, B., and Arslan, A. Genetic Transfer Learning. Expert Systems with Applications, vol
[14] Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann, San Francisco, CA (2000)