Error Back-propagation in Multi-valued Logic Systems

Georgios Apostolikas and Stasinos Konstantopoulos
Institute of Informatics and Telecommunications
NCSR "Demokritos", Ag. Paraskevi 153 10, Athens, Greece
{apostolikas,konstant}@iit.demokritos.gr

Abstract

Error back-propagation and its many variations have been used extensively to train neural networks. The individual layers of a multi-layer system cannot be trained directly in a supervised learning scheme, because data are usually provided only as end-to-end input-output pairs for the global system. The central idea of error back-propagation is to derive target input-output pairs for each layer in the system from the global input-output data. We propose a new method for error back-propagation in a fuzzy Description Logic reasoning system. This permits us to derive input-output data pairs in a two-layer setup for training the lower-layer classifiers. To the best of our knowledge, this is the first error back-propagation method for a logic reasoning system.

1. Introduction

Supervised learning is the predominant training methodology in artificial intelligence systems. The main concept of supervised learning is the presence of a teacher who provides the correct action (or output) for a given situation (or input). The general target of supervised learning is therefore to identify the function that correctly maps input to output. In multi-layer systems, each layer has an intermediate output, which is piped to the next layer as input. In the majority of supervised learning scenarios, these intermediate datasets are not provided, but they can be extracted by error back-propagation, i.e., by computing for each intermediate layer the difference between the current and the desired output [7]. An example is multi-layer perceptrons, the most common neural networks found in the literature [4].

Error back-propagation actually computes derivatives with respect to the system parameters (i.e., the desired direction of change) rather than the actual target output for each layer. In that sense, it can be considered part of the general framework of automatic differentiation algorithms [2]. Many variations of back-propagation have been proposed for decentralized computational models, and especially for multi-layer perceptrons [8].

Error back-propagation has been used extensively in various statistical supervised learning setups, but it has not been explored at all with systems whose modelling formalism is based on symbolic logic. In this paper, we propose a new method for error back-propagation through a fuzzy logic system. Such systems, and more particularly systems based on a family of logics called Description Logics (DLs), have been steadily gaining importance as one of the core technologies of the Semantic Web. The proposed method is implemented on a two-layer classifier, in the context of an automatic video annotation system for categorizing news broadcasts.
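
For contrast with the derivative-free method proposed later (Section 3.2), the classical derivative-based procedure can be pictured on a tiny multi-layer perceptron: the output error is propagated backwards through the chain rule and turned into parameter updates. The following sketch is a minimal NumPy illustration of that idea; the layer sizes, learning rate and data are illustrative choices, not taken from any system described in this paper.

# Minimal sketch of classical error back-propagation for a two-layer
# perceptron (sigmoid hidden layer, linear output, squared error).
# Layer sizes, learning rate and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 4))   # input (3 units) -> hidden (4 units)
W2 = rng.normal(scale=0.1, size=(4, 2))   # hidden (4 units) -> output (2 units)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, lr=0.1):
    # One derivative-based update on a single (input, target) pair.
    global W1, W2
    h = sigmoid(x @ W1)              # hidden-layer activations
    y = h @ W2                       # network output (linear output layer)
    delta_out = y - t                # dE/dy for E = 0.5 * ||y - t||^2
    # Chain rule: propagate the output error back to the hidden layer.
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * np.outer(h, delta_out)
    W1 -= lr * np.outer(x, delta_hid)
    return 0.5 * float(np.sum(delta_out ** 2))

x_in = np.array([0.2, 0.7, 0.1])
target = np.array([1.0, 0.0])
for _ in range(200):
    err = backprop_step(x_in, target)
print(err)    # the error shrinks as the parameters adapt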

The paper is organized as follows: Section 2 describes the application and the video annotation system, making references to related existing work; Section 3 then describes the proposed methodology and our implementation. Finally, Section 4 comprises some concluding remarks and an outlook on future research.

2. Problem Statement

Figure 1. The two-layer architecture.

At the core of the video annotation system is a two-layer classifier of TV news broadcasts (Figure 1). The first layer is an array of Support Vector Machines (SVMs) that map low-level video features (such as colour, texture, and position) to high-level video features representing the recognition of concrete objects in the video (e.g., a human figure, a football pitch, fire). The second layer is a conceptual model of the domain that assigns abstract features (e.g., interview, sports item, and so on) based on the concrete-object recognition features. The inference engine behind this second layer is a fuzzy DL reasoner, a multi-valued logic reasoner described in more detail in Section 3.1 below.

The fuzzy reasoner places multimedia documents in a category hierarchy based on the high-level video features and the axioms of the conceptual model. These axioms are hand-crafted by domain experts and are fixed (not trained by machine learning) in our setup. Therefore, in the two-layer classifier only the SVM array is trained with machine learning methods.

2.1. Inclusion of Human Knowledge in Computational Models

Moving away from the specific video annotation scenario, we can generalize the problem to the domains of system modelling and classification. Going back to Figure 1, what we are actually facing is the challenge of learning a highly complex input-output mapping from low-level video features (the overall system input) to categories (the overall system output). The domain has very high complexity and, thus, learning the desired mapping from scratch using machine learning methods is considered intractable. The most prominent approach to handling such challenging learning tasks is to incorporate prior human knowledge in the system.
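
To make the division of labour between the two layers concrete, the skeleton below sketches the pipeline just described: a trainable array of per-object SVMs produces high-level feature degrees in [0, 1], and a fixed, hand-crafted fuzzy reasoner maps those degrees to category membership degrees. The class and function names are illustrative placeholders, not the actual system's API.

# Illustrative skeleton of the two-layer classifier described above:
# a trainable SVM array (first layer) feeds a fixed fuzzy-DL reasoner
# (second layer).  All names are hypothetical, not the real system's API.
from typing import Callable, Dict, List

HighLevel = Dict[str, float]          # object name -> degree in [0, 1]

class TwoLayerClassifier:
    def __init__(self,
                 svm_array: Dict[str, Callable[[List[float]], float]],
                 reasoner: Callable[[HighLevel], Dict[str, float]]):
        self.svm_array = svm_array    # one trainable SVM per concrete object
        self.reasoner = reasoner      # fixed, hand-crafted conceptual model

    def high_level_features(self, low_level: List[float]) -> HighLevel:
        # First layer: each SVM estimates the degree to which "its" object is present.
        return {obj: svm(low_level) for obj, svm in self.svm_array.items()}

    def classify(self, low_level: List[float]) -> Dict[str, float]:
        # Second layer: category membership degrees from the fuzzy reasoner.
        return self.reasoner(self.high_level_features(low_level))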

We do this by employing a two-layer architecture in which the second layer integrates human expertise, while the first layer is trainable for improving and fine-tuning the overall system performance. Consequently, we have reduced the learning task to a manageable complexity, since the strong non-linearities of the mapping have been absorbed by the hand-crafted fuzzy reasoning system.

This approach lies within the more general framework of methods for injecting human knowledge into computational models. Computational models, such as the neural networks or SVMs in our case, have a high learning capacity and, given enough data and time, can approximate any non-linear function in a supervised training setup. However, this universal approximation potential comes at a cost: the model parameters cannot be interpreted by human logic and, vice versa, human experience cannot be directly mapped to the model's parameters. Although local approximators (e.g., centroid-based or basis-function computational models) can easily include human knowledge (e.g., in the form of fuzzy rules in neuro-fuzzy systems), they lack the generalization capabilities of global approximators such as multi-layer perceptrons. Moreover, local approximators are known to suffer significant performance loss as input-space dimensionality and domain complexity increase. Last but not least, the human knowledge that can be entered in those models takes a very basic form (e.g., simple if-then rules in a neuro-fuzzy system), lacking the expressive capacity and input-output mapping potential of high-level symbolic systems such as Description Logics. The fuzzy DL reasoner (actually working on a subset of first-order logic), seen from the computational theory perspective, can form highly non-linear global approximation functions, can easily cope with complex, compound and sparse mappings, has no problem scaling to multi-dimensional input, and can achieve excellent generalization over the whole input space.

2.2. User Feedback Exploitation and Error Back-Propagation

User feedback consists of the correct category of the video; the user is never burdened with checking lists of recognized objects, their relative positions, or any other internal details of the classification process. This approach makes the system more user-friendly, and the users more likely to actually provide feedback, but, on the other hand, it deprives the system of any information about the intermediate output (the high-level features) of the two-layer setup.

The essential target of our approach is to maintain the supervised-learning capability of the overall system, even after the human knowledge (the second-layer subsystem) has been incorporated in the classifier. To achieve this, we need to derive a meaningful first-layer target output from user feedback that consists only of the global (second-layer) system output. In other words, we must back-propagate the output error of the fuzzy system to its inputs. This is no trivial task, since the input-output function of the fuzzy reasoner cannot be represented analytically and, therefore, the classic approach of automatic differentiation or similar methods cannot be applied to derive the error at the input layer. The developed method is analyzed in the next section.

Following the fuzzy semantics, all inputs and outputs (including intermediate ones) are scalars in the range [0, 1]. A value closer to 1 indicates the presence of a feature, while a value closer to 0 indicates its absence.
The output layer consists of one output per category. The intermediate inputs (the high-level video features) and their corresponding fuzzy truth values are used to populate the reasoner knowledge base. The outputs are computed by the fuzzy reasoner as follows: each output is defined as a separate concept in the knowledge base and, for each video that is classified, the reasoner computes the video's fuzzy degree of membership in every output category concept. These degrees of membership are, effectively, the outputs of the second layer. The concept definitions that populate the knowledge base are provided by human domain experts and reflect high-level, conceptual knowledge about the domain. For our purposes, this hand-crafted fuzzy DL model is treated as a fixed mapping from its inputs to its outputs.
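
Purely for illustration, the exchange between the two layers can be pictured as below: the high-level feature degrees become fuzzy assertions about the video in the knowledge base, and each category concept is queried for the video's membership degree. The reasoner interface shown (new_knowledge_base, assert_concept, query_membership) is a hypothetical stand-in, not the API of the actual fuzzy DL reasoner.

# Illustration of how the second layer is used.  The reasoner interface
# (new_knowledge_base, assert_concept, query_membership) is a hypothetical
# stand-in for the actual fuzzy DL reasoner.
from typing import Dict, List

def second_layer_outputs(video_id: str,
                         high_level: Dict[str, float],
                         categories: List[str],
                         reasoner) -> Dict[str, float]:
    kb = reasoner.new_knowledge_base()
    for feature, degree in high_level.items():
        # e.g. assert that video_id is a HumanFigure detection to degree 0.8
        kb.assert_concept(video_id, feature, degree)
    # One output per category: the video's fuzzy degree of membership in the
    # corresponding category concept, as computed by the reasoner.
    return {c: kb.query_membership(video_id, c) for c in categories}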

3. Methodology and Implementation

To support the back-propagation method, a new fuzzy DL reasoner has been developed. The fuzzy reasoner satisfies the need for constructing proof trees that contain inputs as free variables. The error back-propagation algorithm then uses those proof trees to calculate admissible values for the inputs of the fuzzy system, and afterwards to compute the actual target values for the inputs.

3.1. Multi-valued DL Reasoning

Logical formalisms like Description Logics are typically interpreted with set-theoretic semantics, which define the logical connectives and operators in terms of set theory. We shall not re-iterate these formal foundations here, but refer the interested reader to handbooks of Description Logics [1, Chapter 2]. Informally, unary predicates (concepts) are interpreted as sets of individuals, binary predicates (relations) as sets of pairs of individuals, and the logical connectives as set operations; for example, concept disjunction is interpreted as set union, and so on.

Multi-valued logics, on the other hand, base their interpretations on fuzzy set-theoretic semantics. In fuzzy set theory [9], set membership does not have a binary yes-no valuation but a numerical one, denoting the degree to which an individual is a member of the set. Fuzzy interpretations are based on algebraic norms that provide multi-valued semantics for the logical connectives.

The most common approach to implementing multi-valued reasoners is to combine proof algorithms, like resolution or tableaux, with numerical methods [5, 6]. Existing reasoning algorithms and implementations, however, demand that the degrees of all assertions in the knowledge base are known numerical constants. As mentioned before, this makes them unsuitable for our purpose, as our back-propagation method (see the following section) relies on the ability to provide unbound variables as the degrees of some assertions, and to algebraically calculate these variables' values (or rather, their admissible value ranges) given the degrees of the rest of the knowledge base.

In order to satisfy this requirement, we have developed a new DL reasoner that can handle unbound-variable degrees by calculating results as systems of restrictions over these variables. We shall only briefly introduce the reasoner here, as it is discussed in detail elsewhere [3]. The structural part of the reasoning process implements the resolution proof method, which builds a proof tree by re-writing logical formulae, based on the assertions (rules and facts) in the knowledge base, into logically equivalent but more elementary formulae; each time a disjunction is encountered, the tree branches. As the proof tree gets built, the degree of each node is calculated using the algebraic norm. Occasionally, assertions with unbound fuzziness degrees (as opposed to bound numerical values) will be encountered. The admissible values for these variables get restricted to the range that would yield the required valuation for the overall expression, which builds up to a system of linear constraints that is solved using a constraint linear programming implementation. At the end of this process, one can collect at the leaves of the open branches a system of inequations. This system specifies the admissible values for the unbound degrees, so that the original formula at the root of the tree is satisfiable.
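
As a simplified, concrete illustration (not the reasoner of [3]), the snippet below uses the Łukasiewicz connectives, the norm adopted later in Section 3.2, and shows how requiring a given valuation for an expression that involves an unbound degree x reduces to a linear constraint on x. The actual reasoner carries such constraints through whole proof trees and solves the resulting system with constraint linear programming.

# Simplified illustration (not the actual reasoner): Lukasiewicz connectives
# over fuzzy degrees, and how an unbound degree x turns a required valuation
# into a linear constraint on x.
def luk_and(a, b):
    # Lukasiewicz t-norm (fuzzy conjunction)
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    # Lukasiewicz t-conorm (fuzzy disjunction)
    return min(1.0, a + b)

# Suppose a proof-tree node requires  luk_and(0.8, x) >= 0.6  where x is the
# unbound degree of some assertion.  Since luk_and(0.8, x) = max(0, x - 0.2),
# the requirement is the linear constraint  x >= 0.8.
def lower_bound_on_x(known_degree, threshold):
    # Smallest x in [0, 1] such that luk_and(known_degree, x) >= threshold.
    return min(1.0, threshold + 1.0 - known_degree)

print(lower_bound_on_x(0.8, 0.6))   # ~0.8, i.e. the constraint x >= 0.8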

3.2. Error Back-propagation through a Multi-valued DL Reasoner

Error back-propagation is traditionally based on the computation of the partial derivatives that correspond to the desired direction and magnitude of displacement for each system parameter. A simple algorithm is afterwards used to actually change the value of the parameter in the specified direction. This change is usually a small fraction of the full change needed to drive the output to the given supervised value. Many small changes eventually lead the system parameters towards values that model the physical system from which the supervised training data were drawn. That is the central idea of statistical learning.

The error back-propagation method proposed here does not depend on the computation of derivatives for the direction of change of the system parameters. Instead, it directly calculates the desired changes in the input layer (the high-level features) in order to correctly map the given input-output training pair. Note that in this subsection, since we are only interested in the fuzzy reasoner, we refer to the fuzzy reasoner input (high-level features) simply as the input and to its output as the output. The problem can be stated as follows. Given the current state of the system (the current system parameters) and a new input-output training pair that is incorrectly mapped by the current system, find the input that:

1. when fed to the current system, yields an acceptable output (defined below), and
2. is as close as possible to the input of the training pair with respect to some distance measure.

In the context of the algorithm's intended application, the hybrid classifier-reasoner system is expected to classify inputs into a number of predefined categories. For the categories the instance belongs to, the reasoner output should be a fuzzy value above a certain threshold. Accordingly, for the same input, the outputs for the rest of the categories should be below another selected threshold. It is clear that, since we deal with a fuzzy system, an exact match of the system output with the training pair output is neither required nor desired. In our platform, which represents the most common scenario, the training data output is a truth value that indicates membership in the specific category. Therefore, we define the acceptable output of point 1 above as a value that satisfies either the lower accepted threshold for category membership or the upper accepted threshold for non-membership. For point 2 above, we use the Euclidean distance.
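
In notation introduced here for convenience (none of these symbols appear in the original text), write x for a candidate reasoner input in [0, 1]^n, x-hat for the input actually produced by the first layer, f_c for the reasoner output for category c, M for the set of categories the training instance belongs to, and theta_low, theta_high for the two thresholds. The target input is then a solution of the constrained problem (in LaTeX notation):

    \min_{x \in [0,1]^n} \; \lVert x - \hat{x} \rVert_2
    \quad \text{subject to} \quad
    f_c(x) \ge \theta_{\mathrm{low}} \ \text{for all } c \in M,
    \qquad
    f_c(x) \le \theta_{\mathrm{high}} \ \text{for all } c \notin M.
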
The implementation of the reasoning algorithm described in the previous subsection permits the extraction of an unbound proof tree for each category and high-level feature. For the purpose of propagating the error back through the fuzzy reasoner, we therefore build a separate proof tree for each pair of category and high-level feature. Note that the proof tree is generally different for each training data pair, since in each case different high-level features (the non-zero inputs) populate the reasoner knowledge base.

The proof tree is unbound in the sense that it includes one input parameter as a variable that can be entered into the tree to calculate the truth value of the output. The output can be derived from the unbound proof tree by binding this variable, i.e., by providing a numerical value for the input variable.

It is clear that, on a per-category basis, an incorrect mapping by the system can mean either that the category has a high truth value (above some threshold) although the instance is not a member of the category, or vice versa. Therefore, for each category the method correspondingly sets an upper or a lower threshold. The goal is to find the fuzzy reasoner input (high-level feature) values that produce an output satisfying the thresholds and that have maximum proximity to the real high-level features (i.e., the computed output of the first layer).

The method for input error assignment works directly on the derived unbound proof tree. It poses the requirement set by the defined threshold and computes constraints on the input variables for satisfying this requirement. Each leaf node in the proof tree corresponds to a conjunction of concepts that either have to be true (positive literals) or false (negative literals) for the concept "this video is a member of this category" to be true. Of course, intermediate values in the range [0, 1] are permitted, and the fuzzy inference follows the Łukasiewicz t-norm. By taking all the leaf nodes of a proof tree, a disjunction of conjunctions is formed; this corresponds to the fuzzy truth value of a single category. Following the chosen fuzzy inference rule, the requirement set by the selected threshold is satisfied when:

1. if the threshold is a lower limit, the Łukasiewicz conjunction of all the disjunctive terms has a value equal to or higher than the threshold;
2. if the threshold is an upper limit, the Łukasiewicz conjunction of all the disjunctive terms has a value equal to or lower than the threshold.

Applying rule 1 or 2, a set of correspondingly disjunctive or conjunctive constraints is derived. These constraints are inequalities that the input variable must satisfy. The procedure described above is repeated for all the categories needed. Note, however, that only a few categories contribute to this global constraint set, since only the categories that are incorrectly activated or incorrectly deactivated produce constraints. To form the global constraint expression for one variable, we take the conjunction of all the constraints derived from these categories, since they must be satisfied simultaneously. This constitutes the method for calculating the constraints on one input variable. The procedure is repeated for all input variables, but it should be noted that not all inputs contribute to the overall constraints, since some proofs might not involve the unbound variable in question.

The goal is now to find the input vector that yields an acceptable output and, at the same time, has the smallest Euclidean distance from the original input. This is achieved by an iterative method. An initial acceptable input vector is found using a coarse search of the input space. Afterwards, the algorithm iterates through the elements of the input vector, searching for alternative input vectors with higher proximity to the original input. The procedure terminates when no further improvement is possible.
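
A rough, simplified sketch of this search is given below, in the same illustrative spirit as the earlier snippets: a coarse grid search finds some acceptable input vector, and a coordinate-wise refinement then moves it as close to the original first-layer output as the acceptability constraints allow. The is_acceptable predicate stands in for the threshold constraints derived from the proof trees; the grid resolution and the refinement rule are illustrative choices, not the actual implementation.

# Simplified sketch (hypothetical names, not the actual implementation) of the
# search described above: find an input vector that the fixed fuzzy reasoner
# maps to an acceptable output and that is as close as possible (Euclidean
# distance) to the input actually produced by the first layer.
import itertools
import numpy as np

def closest_acceptable_input(original, is_acceptable, steps=5, iters=50):
    # original: first-layer output in [0, 1]^n; is_acceptable: vector -> bool.
    original = np.asarray(original, dtype=float)
    grid = np.linspace(0.0, 1.0, steps)
    # 1. Coarse search of the input space for an acceptable vector
    #    (exponential in the input dimension; for illustration only).
    best = None
    for cand in itertools.product(grid, repeat=original.size):
        cand = np.array(cand)
        if is_acceptable(cand) and (
                best is None
                or np.linalg.norm(cand - original) < np.linalg.norm(best - original)):
            best = cand
    if best is None:
        return None          # no admissible input satisfies the constraints
    # 2. Coordinate-wise refinement: move each element towards its original
    #    value for as long as the output stays acceptable.
    for _ in range(iters):
        improved = False
        for i in range(original.size):
            trial = best.copy()
            trial[i] = 0.5 * (trial[i] + original[i])   # halve the gap
            if is_acceptable(trial) and (
                    np.linalg.norm(trial - original) < np.linalg.norm(best - original)):
                best, improved = trial, True
        if not improved:
            break
    return best
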
4. Conclusions and Future Work

In this work we have proposed a novel method for error back-propagation in a fuzzy Description Logic system. The method calculates the input that is closest to the real input of the fuzzy reasoner and that, when fed to the fuzzy system, produces the correct output.

We take this minimum-perturbation principle to define the actual target input for the fuzzy reasoner. The derived target input is then used as supervised training data for adaptive learning of the first-level classifiers.

The method developed essentially permits the inclusion of human knowledge in any computational model, simply by adding the fuzzy reasoning subsystem on top of the computational model to form a two-layer setup. Human expertise can be entered in much more expressive terms than with the classic approaches of, for example, adding if-then rules. Seen from the computational theory perspective, this increased expressiveness means the capacity to construct highly complex global approximation functions.

At its current state of development, the system does not take into account the possibility that the logic model of the second layer may be wrong, as it is taken to represent the current expertise over the domain. However, this assumption can be relaxed, and in further research we plan to investigate a method for blame assignment between the two layers. In this new approach, we plan to provide corrective feedback to the logic subsystem as well, when appropriate. Such feedback can then either be forwarded to the domain expert for further refinement of the model, or be used by a machine learning algorithm that constructs a new model.

5. Acknowledgments

The work described here was supported by DELTIO (see also http://www.atc.gr/deltio/), a project funded by the Greek General Secretariat of Research & Technology. DELTIO investigates the use of evolving ontologies to analyse multimedia content, with an application to news broadcasts.

References

[1] Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.

[2] G. F. Corliss. Automatic Differentiation of Algorithms: From Simulation to Optimization. Springer, 2002.

[3] Stasinos Konstantopoulos and Georgios Apostolikas. Fuzzy-DL reasoning over unknown fuzzy degrees. In Proc. International IFIP Workshop on Semantic Web and Web Semantics (IFIP-SWWS 07), Algarve, 29-30 Nov. 2007.

[4] James L. McClelland and David E. Rumelhart. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. The MIT Press, Cambridge, Massachusetts, 1986.

[5] Giorgos Stoilos, G. Stamou, V. Tzouvaras, J. Z. Pan, and Ian Horrocks. The fuzzy description logic f-SHIN. In Proc. of the International Workshop on Uncertainty Reasoning for the Semantic Web, 2005.

[6] Umberto Straccia. Reasoning within fuzzy Description Logics. Journal of Artificial Intelligence Research, 14:137-166, 2001.

[7] Paul John Werbos. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. Wiley, New York, 1994.

[8] X. Yu, M. O. Efe, and O. Kaynak. A general back-propagation algorithm for feed-forward neural networks learning. IEEE Transactions on Neural Networks, 13(1):251-254, 2002.

[9] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338-353, 1965.