KI-2012: Poster and Demo Track


KI-2012: Poster and Demo Track
35th German Conference on Artificial Intelligence
September 24-27, 2012, Saarbrücken, Germany

Stefan Wölfl (Ed.)

Contact address:
Dr. Stefan Wölfl
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee
Freiburg, Germany

Program Committee
Christian Becker-Asano, Christoph Beierle, Maren Bennewitz, Mehul Bhatt, Boris Brandherm, Lutz Frommberger, Birte Glimm, Björn Gottfried, Sven Hellbach, Oliver Kutz, Marco Ragni, Gabriele Röger, Jürgen Sauer, Stefan Schiffer, Lars Schmidt-Thieme, Angela Schwering, Thora Tenbrink, Ingo J. Timm, Rudolph Triebel, Stefan Wölfl

Additional Reviewers
Osman Akcatepe, Rasoul Karimi, Neelava Sengupta

Acknowledgements
Special thanks go to the members of the Program Committee for their careful reviews and helpful comments on the submitted contributions, to the Local Organization Team of KI-2012, coordinated by Boris Brandherm and Ralf Jung, for their support in organizing the Poster & Demo Session, and to Rebecca Albrecht for her help in compiling this collection of poster and demo papers.

Stefan Wölfl

Table of Contents

Invited Project Contributions ... 1

The Robot Head Flobi: A Research Platform for Cognitive Interaction Technology
  Sven Wachsmuth, Simon Schulz, Florian Lier, Frederic Siepmann, and Ingo Lütkebohle

Assistance Robotics: A Survival Guide for Real World Scenarios
  Hans-Joachim Böhme, Sven Hellbach, Frank Bahrmann, Marc Donner, Johannes Fonfara, Marian Himstedt, Mathias Klingner, Peter Poschmann, Mathias Rudolph, and Richard Schmidt

Dora: A Robot that Plans and Acts Under Uncertainty
  Moritz Göbelbecker, Marc Hanheide, Charles Gretton, Nick Hawes, Andrzej Pronobis, Alper Aydemir, Kristoffer Sjöö, and Hendrik Zender

The Transregional Collaborative Research Center SFB/TR 8 Spatial Cognition ... 17
  Thomas Barkowsky and Julia Gantenberg

Collaborative Process Assistant: Towards a Context-sensitive Business Process Support Based on E-Mails
  Julian Krumeich, Thomas Burkhart, Dirk Werth, and Peter Loos

uService: A Personalized and Situation-Aware Recommender System for User-Generated Mobile Services
  Alexandra Chapko, Andreas Emrich, Marc Gräßle, Dirk Werth, and Peter Loos

ARGUMENTUM: Towards Computer-Supported Analysis, Retrieval and Synthesis of Argumentation Structures in Humanities Using the Example of Jurisprudence
  Constantin Houy, Peter Fettke, Peter Loos, Iris Speiser, Maximilian Herberger, Alfred Gass, and Ulrich Nortmann

Facetted Search on Extracted Fusion Tables Data for Digital Cities
  Jochen Setz, Gianluca Quercini, Daniel Sonntag, and Chantal Reynaud

The ALIZ-E Project: Adaptive Strategies for Sustainable Long-Term Social Interaction
  The ALIZ-E project team

PeerEnergyCloud: Trading Renewable Energies
  Jochen Frey, Boris Brandherm, and Jörg Baus

Poster Contributions ... 47

Towards Augmenting Dialogue Strategy Management with Multimodal Sub-Symbolic Context
  Paul Baxter, Heriberto Cuayáhuitl, Rachel Wood, Ivana Kruijff-Korbayová, and Tony Belpaeme

Eager Beaver: A General Game Player
  André Doser, Florian Geißer, Philipp Lerche, and Tim Schulte

Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm
  Markus Goldstein and Andreas Dengel

A Concept of a Reliable Three-Layer Behaviour Control System for Cooperative Autonomous Robots
  Christian Rauch, Tim Köhler, Martin Schröer, Elmar Berghöfer, and Frank Kirchner

Dataset Generation for Meta-Learning
  Matthias Reif, Faisal Shafait, and Andreas Dengel

Meta²-Features: Providing Meta-Learners More Information
  Matthias Reif, Faisal Shafait, and Andreas Dengel

Organizational Social Network Analysis: Case Study in a Research Facility
  Wolfgang Schlauch, Darko Obradovic, and Andreas Dengel

Object Recognition with Multicopters
  Falk Schmidsberger and Frieder Stolzenburg

Semantically-enriched Electric Car Recharge Optimization Toolkit
  Mikhail Simonov, Antonio Attanasio, and Davide Luzio

Ubiquitous Monitoring & Service Robots for Care
  Mikhail Simonov, Marco Bazzani, and Antonella Frisiello

Citation Context Sentiment Analysis for Structured Summarization of Research Papers
  Niket Tandon and Ashish Jain

Towards Robust Spontaneous Speech Recognition with Emotional Speech Adapted Acoustic Models
  Bogdan Vlasenko, Dmytro Prylipko, and Andreas Wendemuth

Tool Support for Activity Recognition with Computational Causal Behaviour Models
  Kristina Yordanova, Frank Krüger, and Thomas Kirste

Demo Contributions ... 113

Auto Classifier: Explaining Customers a Machine-Learning Model
  Benjamin Adrian, Markus Ebbecke, and Sebastian Ebert

Please Tell Me Where I Am: A Fundament for a Semantic Labeling Approach
  Frank Bahrmann, Sven Hellbach, and Hans-Joachim Böhme

MAINSIM: MultimodAl INnercity SIMulation
  Jörg Dallmeyer and Ingo J. Timm

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output
  Christian Federmann

A Conversational System for Multi-Session Child-Robot Interaction with Several Games
  Ivana Kruijff-Korbayová, Heriberto Cuayáhuitl, Bernd Kiefer, Stefania Racioppa, Piero Cosi, Giulio Paci, Giacomo Sommavilla, Fabio Tesser, Hichem Sahli, Georgios Athanasopoulos, Weiyi Wang, Valentin Enescu, Werner Verhelst, Lola Cañamero, Aryel Beck, Antoine Hiolle, Raquel Ros Espinoza, and Yiannis Demiris

Making Virtual Pancakes: Acquiring and Analyzing Data of Everyday Manipulation Tasks through Interactive Physics-based Simulations
  Lars Kunze, Andrei Haidu, and Michael Beetz

BendIT: An Interactive Game with two Robots
  Tim Niemueller, Stefan Schiffer, Albert Helligrath, Safoura Rezapour Lakani, and Gerhard Lakemeyer

robocd: Robotic Order Cups Demo, an Interactive Domestic Service Robotics Demo
  Stefan Schiffer, Tobias Baumgartner, Daniel Beck, Bahram Maleki-Fard, Tim Niemueller, Christoph Schwering, and Gerhard Lakemeyer

Using a Discourse and Dialogue Infrastructure for Collaborative Radiology
  Daniel Sonntag and Christian Schulz

Author Index ... 160

Invited Project Contributions


The Robot Head Flobi: A Research Platform for Cognitive Interaction Technology

Sven Wachsmuth, Simon Schulz, Florian Lier, Frederic Siepmann, and Ingo Lütkebohle

Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Universitätsstr. 25, Bielefeld, Germany
{swachsmu,sschulz,flier,fsiepmann,iluetkeb}@cit-ec.uni-bielefeld.de

Abstract. Founded on a vision of a human-friendly technology that adapts to users' needs and is easy and intuitive for ordinary people to use, CITEC has established an exciting new field: Cognitive Interaction Technology. It aims to elucidate the principles and mechanisms of cognition in order to find ways of replicating them in technology and thus enable a new, deep level of service and assistance. To make progress in this highly interdisciplinary field, appropriate research platforms and infrastructure are needed. The anthropomorphic robot head Flobi combines state-of-the-art sensing functionality with an exterior that elicits a sympathetic emotional response. In order to support several lines of research and at the same time ensure the maintainability of the software and hardware components, a virtual realization of the Flobi head has been proposed that allows efficient prototyping, systematic testing, and software development in a continuous integration framework.

Keywords: Human-Robot Interaction, Demonstrator Engineering

This work has been partially funded by the German Research Foundation (EC277).
S. Wölfl (Ed.): Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), pp. 3-7, The Authors, 2012

1 Introduction

Classic AI focuses very much on modeling a rational mind, including agents that react rationally to environmental changes or human actions. Starting from these insights, many systems have been constructed that show intelligent behavior and interact intelligently with humans. However, this general approach does not care whether the behavior is realized in a text-based dialogue, a virtual character, or a physical robot; it concentrates on the modeling of the mind in the first place. The field of Cognitive Interaction Technology takes a different approach, placing the interaction that takes place in the physical world first. If we understand how this interaction is shaped, what ascriptions and expectations between interlocutors are provoked by which factors, and which processes initiate, maintain, and re-establish an interaction over time, then cognitive processes can

be modeled more easily that replicate the smooth interaction between humans for human-machine interfaces. At CITEC, groups from informatics, biology, linguistics, psychology, and sport science work together along these lines in an interdisciplinary research environment. The research is rooted in a number of research platforms and will be exemplified in the following for the anthropomorphic head Flobi.

Flobi is a robotic head that integrates powerful sensing with social expressiveness [13]. The scientific insight behind it is that the social impact of a robot cannot be divided from its function: a function like the motion of a directed sensor is interpreted socially, just as an anthropomorphic face raises expectations about possible functions and means of interaction. So far, most robot heads reported in the literature have focused either on active sensing capabilities or on social expressiveness. Examples of the first category are POP-EYE [7], Cog [5], or the Karlsruhe Humanoid Head [1]. The second category includes robots like Kismet [4], iCub [3], WE-4RII [17], or Nexi [11]. These also integrate several sensor capabilities but still have limitations: they provide fewer degrees of freedom and lower velocities, have a bigger size or holes in the exterior design, or separate the camera sensors from the eyes. Thus, any current hardware solution is a compromise between both requirements: being functional and socially expressive. This has consequences for research that experimentally deals with human-robot interaction. The Flobi robot head developed at Bielefeld University is a very good compromise in this field, providing the right basis for experimental research.

With interaction in first place, it is essential to start with a physical platform because, as discussed before, its presence and limitations already shape and influence the interaction between agents. Nevertheless, a physical platform limits the testability and accessibility for experimental studies and system development. On the one hand, new algorithms should first be tested on a simulation model for safety reasons. On the other hand, the accessibility of a simulation model increases the available testing time, also making automated offline test cycles possible, as required, e.g., in continuous integration (CI) approaches [8]. Therefore, a simulated robot model has been implemented for Flobi that is coupled with an integrated development environment [12]. As already noticed in, e.g., the USARSim approach [6], there are typically large differences between the actual robot and its corresponding simulation [14], so that a validation step needs to be integrated into the development cycle. Therefore, in our approach special care has been taken in considering the physical limitations and certain limits of the platform dynamics by using an iterative validation approach that considers the actual configuration and control algorithms. As a result, the simulated robot model and integration approach provide a strong extension of the tool chain, leading to efficient research on interaction analysis, modeling, and evaluation.

Fig. 1. The robot head Flobi showing emotional and gender variation.

2 The Anthropomorphic Robot Head Flobi

To achieve understandability, all previous social robots have used an exterior that alludes to what people already know. We anthropomorphize because it allows us to maximally approach human-like communication and increases human speculation about the robot's intentions. The robot has been designed to use a comic-like human face (Fig. 1) in order to avoid unwanted reactions as predicted by the uncanny valley hypothesis [2]. Other requirements, like that there should not be any holes in the mask, led to the development of an innovative magnetic actuation mechanism [13]. The hair part, eyebrows, lips and frontal face part are easily replaceable in order to experiment with different character features. The platform features stereo vision, stereo audio and a gyroscope for motion compensation. The cameras are actuated and support saccade speeds of at least 400°/s. Velocity control is provided by custom-designed motor control boards implementing the necessary control algorithms and real-time support. The complete robot head has 18 degrees of freedom (DoF) [13].

3 Iterative Model Validation

Fig. 2. Concept setup for the iterative validation approach [12].

While the concept of Continuous Integration (CI) is well established in common software development processes, CI practices have received only little attention in robotics so far. One reason might be the typical deviations between the physical robot and its simulation in terms of the control interfaces as well as movement

dynamics and physical limitations. A second problem is that this match must not only be established once, but maintained over the course of the robot's development: any change in calibration, model updates, manufacturing variation, firmware updates, or control algorithms affects this match. With respect to these problems, we introduce a real robot into our CI setup for iterative testing and validation of simulated robot models (Fig. 2). The robot model is based on MORSE [9]. The robot provides an instance for comparison with the simulated robot model: a motion generator component sends movement commands to a control server, which actuates the physical and virtual robot [12]. Thus, the differences between the model and the real robot are tested on every change of control software or configuration, using internal and external data capture support (such as simulation, control and video loggers). The movement profile that is tested with the robot head has been recorded from a real movement of a human head using motion capture techniques.

4 Flobi in the Research Cycle

Research using the Flobi platform is conducted on different levels. Hegel and Eyssel studied different appearances of the robot and what people associate with them [10]. For this, it was essential that people got the impression that they rate a physically existing platform. On the hardware side, Schulz et al. proposed a new mechanical construction of the robot eye [16]. Here, the robot simulation provides possibilities for prototyping experiments. On the software side, Lütkebohle et al. used Flobi in an interactive object learning task looking at multi-modal dialog [15]. They provide a complete system evaluation approach using the actual physical platform in a human-robot interaction study. Doing parallel research on the hardware, control, and system level especially requires an iterative validation approach for the simulated robot model.

5 Conclusion

Research on human-robot interaction aims at the construction of artificial systems that smoothly interact with ordinary people in a human-style fashion. Thus, it directly contributes to the core of artificial intelligence. However, the research focus is on the interaction rather than the robot mind. This requires that the physical instance of a robotic system is not the end product but the start of a new research cycle. In this paper, we have proposed how this process can be governed by appropriate research tools, starting from a robot head with a modular exterior to a robot simulation model that is tightly integrated into the software development process using a continuous integration approach. The proposed strategy enables the decoupling of interdisciplinary research without losing the grounding in a physical embodiment.
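To make the validation step from Sect. 3 concrete, one can picture it as an automated CI test that replays the same recorded movement profile on the physical and the simulated head and compares the resulting joint trajectories. The following Python fragment is a minimal sketch of that idea; the file names, tolerance, and trajectory format are illustrative assumptions, not part of the actual Flobi toolchain.

```python
import numpy as np

def validate_simulation(real_log: np.ndarray, sim_log: np.ndarray,
                        tolerance_deg: float = 2.0) -> bool:
    """Compare joint trajectories (time steps x joints, in degrees) recorded
    while replaying one movement profile on the physical robot and on the
    simulated model; reject the model if they deviate too much."""
    assert real_log.shape == sim_log.shape  # same time steps and joints
    max_error = np.max(np.abs(real_log - sim_log))  # worst per-sample deviation
    return max_error <= tolerance_deg

if __name__ == "__main__":
    # Hypothetical capture files produced by the simulation/control loggers.
    real = np.load("real_trajectory.npy")
    sim = np.load("sim_trajectory.npy")
    if not validate_simulation(real, sim):
        raise SystemExit("simulated model deviates from physical robot")
```

In a CI job, a failing comparison would flag the simulated model as out of sync with the current hardware, firmware, or calibration state, which is exactly the kind of drift the iterative validation approach is meant to catch.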

References

1. T. Asfour, K. Welke, P. Azad, A. Ude, and R. Dillmann. The Karlsruhe humanoid head. In IEEE-RAS International Conference on Humanoid Robots (Humanoids).
2. C. Bartneck, T. Kanda, H. Ishiguro, and N. Hagita. Is the uncanny valley an uncanny cliff? In Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).
3. R. Beira, M. Lopes, M. Praga, J. Santos-Victor, A. Bernardino, G. Metta, F. Becchi, and R. Saltaren. Design of the robot-cub (iCub) head. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA), June 2006.
4. C. Breazeal. Toward sociable robots. Elsevier Science B.V.
5. R. Brooks, C. Breazeal, M. Marjanovic, B. Scassellati, and M. Williamson. The Cog project: Building a humanoid robot. Computation for Metaphors, Analogy and Agents, 1562:8-13.
6. S. Carpin, M. Lewis, J. Wang, S. Balakirsky, and C. Scrapper. USARSim: a robot simulator for research and education. In Proceedings 2007 IEEE International Conference on Robotics and Automation. IEEE, Apr. 2007.
7. H. Christensen, J. Barker, Y.-C. Lu, J. Xavier, R. Caseiro, and H. Araújo. POP-EYE: Real-time, binaural sound source localisation on an audio-visual robot-head. In Conference on Natural Computing and Intelligent Robotics.
8. P. Duvall, S. Matyas, and A. Glover. Continuous Integration: Improving Software Quality and Reducing Risk. Addison-Wesley Professional, first edition.
9. G. Echeverria, N. Lassabe, A. Degroote, and S. Lemaignan. Modular OpenRobots Simulation Engine: MORSE. In Proceedings of the IEEE ICRA.
10. F. Eyssel and F. Hegel. (S)he's got the look: Gender stereotyping of robots. Journal of Applied Social Psychology, in press.
11. MIT Media Lab. MDS Head & Face. Accessed 10th September.
12. F. Lier and I. Lütkebohle. Continuous integration for iterative validation of simulated robot models. In 2012 Int. Conf. on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR), Tsukuba, Japan, November 2012.
13. I. Lütkebohle, F. Hegel, S. Schulz, M. Hackel, B. Wrede, S. Wachsmuth, and G. Sagerer. The Bielefeld anthropomorphic robot head "Flobi". In Robotics and Automation (ICRA), 2010 IEEE Int. Conf. on, May 2010.
14. S. Okamoto, K. Kurose, S. Saga, K. Ohno, and S. Tadokoro. Validation of simulated robots with realistically modeled dimensions and mass in USARSim. In 2008 IEEE International Workshop on Safety, Security and Rescue Robotics. IEEE, Oct. 2008.
15. J. Peltason, N. Riether, B. Wrede, and I. Lütkebohle. Talking with robots about objects: A system-level evaluation in HRI. In 7th ACM/IEEE Conference on Human-Robot Interaction, Boston, Massachusetts, USA.
16. S. Schulz, I. Lütkebohle, and S. Wachsmuth. An affordable, 3D-printable camera eye with two active degrees of freedom for an anthropomorphic robot. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Vila Moura, Portugal.
17. M. Zecca, S. Roccella, M. C. Carrozza, H. Miwa, K. Itoh, G. Cappiello, J. J. Cabibihan, M. Matsumoto, H. Takanobu, P. Dario, and A. Takanishi. On the development of the emotion expression humanoid robot WE-4RII with RCH-1. In Humanoid Robots, 5th IEEE-RAS International Conference on, volume 1, June 2005.

Assistance Robotics: A Survival Guide for Real World Scenarios

Hans-Joachim Böhme, Sven Hellbach, Frank Bahrmann, Marc Donner, Johannes Fonfara, Marian Himstedt, Mathias Klingner, Peter Poschmann, Mathias Rudolph, and Richard Schmidt

University of Applied Sciences Dresden, Artificial Intelligence Lab
boehme@htw-dresden.de

1 Introduction

The research goal of the Artificial Intelligence Lab at the University of Applied Sciences Dresden is the development of intelligent and interactive mobile service and assistance systems. For the application of such systems, different scenarios are imaginable, like guidance and information systems within public institutions (museums, airports, train stations, etc.), assistance or supporting systems in markets (do-it-yourself stores, shopping malls, etc.), systems to support the elderly in home environments, and systems for industrial tasks (mobile measuring and surveillance systems).

For the current research work, the robot depicted in Fig. 1 is used as a guidance system in a museum in the city of Dresden (Technische Sammlungen Dresden). The research platform is a commercially available SCITOS G5 equipped with two laser range finders, an omnidirectional camera, a depth camera, a ring of sonar sensors, and a head with two eyes that has no further sensor capabilities but is used for interaction purposes.

Fig. 1. One of our robots.

The robot is meant to serve as a companion or tutor that guides visitors through the exhibition and provides background information on the exhibits. To start the dialog with the visitors of the exhibition, the robot possesses a broad spectrum of multimodal abilities or channels, like language, gestures, a touch screen and multimedia presentations. Furthermore, the system is meant to be able to adapt the way these different channels are combined during the dialog to the specific needs and preferences of its interaction partner.

In addition to the research and development of a dialog system, our robot needs to be able to navigate the exhibition autonomously. However, our intention is not to provide help by augmenting the environment with additional sensors. Instead, the robot is equipped with a number of sensors as already mentioned. Only these sensors are used to generate an inner map of the robot's surroundings.

This work was supported in part by ESF grants as well as by an SMWK grant.
S. Wölfl (Ed.): Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), pp. 8-11, The Authors, 2012

In particular in an environment like the museum, which contains irreplaceable exhibits, reliable, collision-free navigation is essential. A further difficulty for the navigation task are the visitors, who are in fact dynamic obstacles that need to be avoided. The obstacle avoidance should happen in a socially acceptable way, so that the visitors are not frustrated or even frightened.

2 Ongoing Research Work

For our application, a lot of subsystems have to work in conjunction, in particular navigation (mapping, localization, obstacle avoidance) and interaction (people detection and tracking, speech recognition). The remainder of this paper gives a short overview of the researched and applied system components.

Navigation and Obstacle Avoidance: To be able to navigate autonomously within the exhibition, we use a navigation system based on Monte Carlo Localization using a laser range finder, supported by a 3D obstacle segmentation system. Our navigation strategy is based on Fox et al. [1], who introduced the Dynamic Window Approach (DWA) for collision avoidance, which became very popular over the last decade. Further details on our navigation system can be found in [2,3,4].

3D obstacle segmentation: The museum offers a challenging environment for the application of an autonomous mobile robot platform. Using only a laser range finder for navigation would not allow the robot to detect each and every obstacle. That is why we complement the laser detection with data from a depth camera. Our obstacle detection approach consists of two steps: the parameters of the ground plane are estimated, followed by a segmentation of the depth values into those belonging to the floor plane and those that represent obstacles. The fact that our depth camera is mounted on top of the robot leads to undesired pitch movements. To compensate for this effect, we correct the plane parameters in each incoming frame. A more elaborate description of this process can be found in [2,5].

Autonomous awareness behavior: To support the human-robot interaction, the robot should show an observable awareness behavior, e.g., move its head towards its interaction partners and give them the feeling that the robot is listening. Hence, they are encouraged to address the robot in a natural way. To do so, the robot has to perceive the persons within its vicinity in the first place, for which we need a people tracking system. We use a tracking-by-detection approach, where a number of sensor cues detect people and pass those detections to the Kalman-filter-based tracker. Details of the people tracking approach can be found in [6]. To give surrounding people the impression that the robot is aware of them, the robot turns its head so that it appears as if it were looking at them. To fortify this impression of being looked at, the robot changes the person it looks at every ten seconds. The person to look at is chosen at random, but not each person has the same probability of being chosen. The probability is computed from the person's distance to the robot, the necessary head movement, and the time since

the person was last looked at. Therefore, the robot prefers nearby persons, little head movement, and persons it has not looked at for a while or at all (see the sketch at the end of this section).

Dialog system and Wizard-of-Oz: We believe that having a spoken dialog with museum visitors is an important aspect of a tour guide robot. It attracts people and allows us to demonstrate an intelligent system. One reason why today's guide robots still lack complex dialog capabilities is that speech recognition is a major unresolved problem. Developing such a dialog system under real-world conditions is a challenging task: an unfinished system would leave the visitors unsatisfied or even frustrated. Hence, we have decided to create the illusion for the visitors that the robot is already able to talk to them in a natural way. This is achieved by replacing the speech recognition system with a human operator, following the Wizard-of-Oz method. With this idea, high-level parts of the dialog system can be evaluated, and other subsystems can be tested under real-world conditions without annoying visitors too much with a highly experimental or even malfunctioning system.

For the Wizard-of-Oz inspired experiments, a hidden operator controls the robot's dialog system. To allow the operator to react to the visitors, the images from the omnidirectional camera as well as an audio stream are transferred to the operator's laptop. To select the available phrases, different parts of the dialog were defined. For all of these parts, answers or reactions to possible questions or situations were assembled beforehand. For this task, a GUI has been developed that allows the operator to select pre-defined phrases or text fragments; it is not possible to enter arbitrary text. Furthermore, the operator can select the robot's target position and initiate the driving mode, which runs completely autonomously. While the robot navigates, its head faces in the direction of movement. This allows persons walking towards the robot to guess the robot's intention to pass; such a behavior is one of the many subtle steps towards a socially acceptable navigation. During the dialogs, the head of the robot faces the dialog partners. From our first experiments with a real audience, we derived first cornerstones for an automated dialog system. In [2] these ideas and aspects are gathered and discussed.

Automated prosody generation: As the dialogs themselves are held in natural human language, real-time speech recognition and generation are crucial. So far, external software solutions (Loquendo S.p.A.) are employed for these tasks. An evaluation of Text-To-Speech (TTS) software showed a general lack of the ability to generate speech with an entertaining and natural-sounding prosody, which is of particular interest for long, potentially boring pieces of information. Therefore, a focus of our research is the automatic labeling of text with prosodic information, applying relevance learning algorithms [7].

Speaker Localization: To capture the acoustic environment, a special microphone array consisting of four directional microphones is used. A blind source separation approach utilizing the special microphone array characteristics [8] is currently being implemented in our framework. Furthermore, the characteristics of the microphone array allow the localization of the sound source. This provides a potential further cue for our people tracker.
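The weighted random gaze-target selection described under "Autonomous awareness behavior" above lends itself to a compact implementation. The Python sketch below is purely illustrative: the concrete weighting functions and their constants are assumptions, since the paper only names the three factors (distance, head movement, time since last gaze).

```python
import math
import random

def attention_weight(distance_m, head_turn_rad, seconds_since_gaze):
    """Weight a tracked person for being looked at next: prefer nearby
    persons, small head movements, and persons not looked at for a while."""
    w_dist = 1.0 / (1.0 + distance_m)                    # closer is better
    w_turn = 1.0 / (1.0 + abs(head_turn_rad))            # less turning is better
    w_time = 1.0 - math.exp(-seconds_since_gaze / 30.0)  # neglected persons catch up
    return w_dist * w_turn * (0.1 + w_time)  # small floor keeps everyone selectable

def choose_person(tracks):
    """tracks: list of (person_id, distance_m, head_turn_rad, seconds_since_gaze);
    returns the id of the person to look at next, drawn at random by weight."""
    weights = [attention_weight(d, a, t) for (_, d, a, t) in tracks]
    return random.choices([p for (p, _, _, _) in tracks], weights=weights, k=1)[0]

# Example: a close, barely-neglected person still competes with a distant,
# long-ignored one, matching the behavior described in the text.
print(choose_person([("A", 1.0, 0.2, 5.0), ("B", 4.0, 1.0, 60.0)]))
```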

SOM-based upper body pose estimation: For our work on motion classification, we apply an enhancement of the approach presented in [9]. It relies on the data of a depth camera, from which a Self-Organizing Map (SOM) is extracted to model the human upper body. Crucial in that context is the correct assignment of the SOM neurons to a specific region of the tracked person's upper body. Various sources of error and noise lead to situations where neurons migrate from one part of the upper body to another. Without a verification of the SOM, a subsequent classification of the pose will produce incorrect results and may lead to wrong interpretations of the actual situation. Therefore, we extended the approach in [9] by reshaping the trained SOM into a skeleton model to estimate the anatomical correctness of the pose. Having generated the skeleton model, incorrect Self-Organizing Maps are rejected if the subsequent verification fails. A further goal is to eliminate this problem by integrating adaptive metrics [7]. A deeper insight into our approach can be gained from [10].

3 Summary

In this paper, a summary of the current fields of work of the AI Lab at the UAS Dresden was presented. As could be seen, for the moment, the focus of our research efforts is set on a museum tour guide robot. However, our future plans are to concentrate on the idea of assisted living for the elderly as well.

References

1. Fox, D., Burgard, W., Thrun, S.: The dynamic window approach to collision avoidance. IEEE Robot. Automat. Mag. 4(1) (March 1997)
2. Poschmann, P., Donner, M., Bahrmann, F., Rudolph, M., Fonfara, J., Hellbach, S., Böhme, H.J.: Wizard of Oz revisited: Researching on a tour guide robot while being faced with the public. In: RO-MAN (2012) In press.
3. Himstedt, M., Hellbach, S., Böhme, H.J.: Feature extraction from occupancy grid maps using non-negative matrix factorization. In: DAGM NC2 (2012) In press.
4. Bahrmann, F., Hellbach, S., Böhme, H.J.: Please tell me where I am: A fundament for a semantic labeling approach. In: Poster and Demo Track of KI-2012 (2012)
5. Donner, M., Poschmann, P., Klingner, M., Bahrmann, F., Hellbach, S., Böhme, H.J.: Obstacle detection for robust local navigation through dynamic real-world environments. In: IROS (2012) In press.
6. Poschmann, P., Hellbach, S., Böhme, H.J.: Multi-modal people tracking for an awareness behavior of an interactive tour-guide robot. In: ICIRA (2012) In press.
7. Hammer, B., Villmann, T.: Generalized relevance learning vector quantization. Neural Networks 15 (2002)
8. Gunel, B., Hachabiboglu, H., Kondoz, A.: Acoustic source separation of convolutive mixtures based on intensity vector statistics. Trans. on ASL 16(4) (2008)
9. Haker, M., Böhme, M., Martinetz, T., Barth, E.: Self-organizing maps for pose estimation with a time-of-flight camera. In: DAGM (2009)
10. Klingner, M., Hellbach, S., Kästner, M., Villmann, T., Böhme, H.J.: Modeling human movements with self-organizing maps using adaptive metrics. In: DAGM NC2, Graz (AT) (2012) In press.

Dora: A Robot that Plans and Acts Under Uncertainty

M. Göbelbecker (1), M. Hanheide (2), C. Gretton (3), N. Hawes (4), A. Pronobis (5), A. Aydemir (5), K. Sjöö (5), and H. Zender (6)

(1) Albert-Ludwigs-Universität Freiburg, Germany
(2) University of Lincoln, UK
(3) NICTA, Australia
(4) University of Birmingham, UK
(5) Royal Institute of Technology (KTH), Stockholm, Sweden
(6) DFKI GmbH, Saarbrücken, Germany

The research reported here was performed in the EU FP7 IP CogX: Cognitive Systems that Self-Understand and Self-Extend.
S. Wölfl (Ed.): Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), pp. 12-16, The Authors, 2012

Abstract. Dealing with uncertainty is one of the major challenges when constructing autonomous mobile robots. The CogX project addressed key aspects of this by developing and implementing mechanisms for self-understanding and self-extension, i.e., awareness of gaps in knowledge, and the ability to reason and act to fill those gaps. We discuss one showcase outcome of that project: the Dora robot, which can perform a variety of search tasks in unexplored environments by exploiting probabilistic knowledge representations while retaining efficiency through a fast planning system.

1 Introduction

The mission statement of the CogX project is "[...] to develop a unified theory of self-understanding and self-extension with a convincing instantiation and implementation of this theory in a robot. By self-understanding we mean that the robot has representations of gaps in its knowledge or uncertainty in its beliefs. By self-extension we mean the ability of the robot to extend its own abilities or knowledge by planning learning activities and carrying them out."

The Dora robot is one instantiation of this theory, demonstrating self-extension in a task-driven way, i.e., ultimately it is driven to satisfy goals given by a user. Dora can perform a variety of object search and exploration tasks in dynamic real-world environments. To achieve this, we have contributed to three areas of research:

Reliability in the presence of an uncertain environment requires a probabilistic representation of the world and taking these probabilities into account for decision making. Our system uses probabilistic background knowledge to perform tasks faster in the common case while still being able to perform in unlikely situations.

Flexibility of tasks is provided by using a domain-independent planner. This has a number of important benefits in our setting. First, it means Dora can be given a variety of goals, e.g., search for one or more specific objects, classify a space, explore a space, etc. Second, as dialogue and sensing capabilities are added to the robot, the planner is able to plan for those without any modification to its architecture.

To operate in an open world, the planner needs to be able to reason about possible gaps in the robot's knowledge and about how to fill those gaps.

In the following section we give a short overview of related work and then present the two central components of the system. A more comprehensive description of many aspects of Dora is provided in an earlier paper by Hanheide et al. [1].

2 Related Work

A number of autonomous robot systems have been developed that use domain-independent planning as their primary decision-making component [2, 3], some of them also operating in open worlds. We are, however, not aware of any robot system that uses probabilistic models and domain-independent planning for high-level decision making. A number of systems have been developed specifically for the task of object search [4], but they usually treat the problem as purely geometric, not leveraging semantic information or additional sources of information, such as dialogue with a human. The system that comes closest to ours [5] uses decision-theoretic planning to locate objects in large environments but requires a pre-built and annotated map.

3 The Dora System

The Dora system architecture is based on PECAS [6], consisting of various components which are grouped into subarchitectures. For this abstract, we would like to focus on two of those that are central to decision making: the Conceptual Map, which represents and updates the probabilistic knowledge of the system, and the Planner, which uses this information to decide which actions to perform. Some other subarchitectures, on which we will not expand, are the spatial subarchitecture, which is responsible for navigation and mapping, vision for object detection, and dialogue for interaction with humans. All these subarchitectures can provide state information directly to the planner, or to the conceptual map to integrate with prior knowledge.
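The kind of inference the conceptual map (described in the next subsection) supports can be pictured as a simple Bayesian update, in which an object observation shifts the probability distribution over a room's category. The prior and likelihood values in this Python sketch are invented for illustration; they are not taken from the Dora system or the mined common-sense data.

```python
# Minimal sketch: updating a room-category belief after observing an object.
priors = {"kitchen": 0.3, "office": 0.5, "hallway": 0.2}

# P(cereal box observed | room category); in Dora such numbers were
# quantified from common-sense and web-mined data, here they are made up.
likelihood_cereal = {"kitchen": 0.6, "office": 0.05, "hallway": 0.01}

def update(belief, likelihood):
    """Standard Bayes rule over a discrete set of room categories."""
    unnormalized = {c: belief[c] * likelihood[c] for c in belief}
    z = sum(unnormalized.values())
    return {c: p / z for c, p in unnormalized.items()}

belief = update(priors, likelihood_cereal)
print(belief)  # probability mass shifts strongly toward "kitchen"
```

The same table can be read in the other direction: if a room is already believed to be a kitchen, the high likelihood of cereal there is what lets the planner prioritise searching it.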

Fig. 1. A visualisation of the entities contained in the conceptual map. Sensing processes (at the bottom) transform modal sensor input into instances (shown as ellipses) and acquired relations with either other instances or known concepts (shown as boxes). Using acquired and predefined relations, the conceptual map can then infer further relations, which are then used by the planning system.

3.1 Conceptual Map

The conceptual map [7], illustrated in Figure 1, uses probabilistic common-sense knowledge to make connections between concepts, such as room categories and the existence of certain objects. For example, the knowledge that cereal boxes are likely to be found in kitchens can (a) help in classifying the category (e.g., kitchen) of a room in which cereals are seen, and (b) help the robot prioritise a search for cereal if a room is likely to be a kitchen. These relations were initially obtained from the Open Mind Indoor Common Sense database and quantified by queries to an online image search engine [8, 9].

3.2 Planning

Our system uses a domain-independent planner that combines decision-theoretic reasoning with fast classical continual planning [10]. For high-level decision making, which we call sequential sessions, it operates according to the continual

planning principle: it does not create a full plan for all contingencies but computes one serial plan and monitors its execution, replanning if the robot ends up in a state that makes the rest of the plan invalid. While the classical planner used in this step is inherently deterministic, it can take probabilities into account by making assumptions (such as "room 0 may be a kitchen"). Less likely assumptions lead to higher costs of the resulting plan, thus leading to plans that tend to rely on more likely facts.

The replanning method employed in the sequential session makes it easy to support open worlds, in which entities (such as rooms or objects) can appear and disappear from the environment. To reason about finding new objects in the planner, we add a number of virtual objects to the initial planning state, which the planner can instantiate and use later on [11]. For example, if the only room known to the robot has a low likelihood of containing a cereal box, it may decide to explore unknown space, hoping to eventually find a room where finding cereals is more likely (e.g., a kitchen).

For problems that involve uncertainty and (possibly noisy) sensing, it is usually more appropriate to model them as a partially observable Markov decision process (POMDP). POMDPs allow accurately modelling noisy sensing, such as the effect of false positive and false negative rates in object detectors. As solving large-scale POMDPs is infeasible, our planner tries to identify the parts of the sequential plan in which taking these effects into account is important, and uses a separate, decision-theoretic planner to solve these subproblems.

4 Conclusion

We presented a robot system that approaches the problem of acting and reasoning in dynamic, open and uncertain worlds by integrating two approaches: the conceptual map, which integrates uncertain observations with probabilistic conceptual knowledge, and a fast continual planning system that can exploit these representations to efficiently find plans to solve the given tasks. The resulting system can operate autonomously in unknown environments while still being able to solve tasks quickly.

References

1. Hanheide, M., Gretton, C., Dearden, R., Hawes, N., Wyatt, J., Pronobis, A., Aydemir, A., Göbelbecker, M., Zender, H.: Exploiting probabilistic knowledge under uncertain sensing for efficient robot behaviour. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011). (2011)
2. Talamadupula, K., Benton, J., Kambhampati, S., Schermerhorn, P., Scheutz, M.: Planning for human-robot teaming in open worlds. ACM Trans. Intell. Syst. Technol. 1 (December 2010) 14:1-14:24
3. Kraft, D., Başeski, E., Popović, M., Batog, A.M., Kjær-Nielsen, A., Krüger, N., Petrick, R., Geib, C., Pugeault, N., Steedman, M., Asfour, T., Dillmann, R., Kalkan,

S., Wörgötter, F., Hommel, B., Detry, R., Piater, J.: Exploration and planning in a three-level cognitive architecture. In: CogSys. (2008)
4. Shubina, K., Tsotsos, J.: Visual search for an object in a 3D environment using a mobile robot. Computer Vision and Image Understanding 114(5) (2010)
5. Kunze, L., Beetz, M., Saito, M., Azuma, H., Okada, K., Inaba, M.: Searching objects in large-scale indoor environments: A decision-theoretic approach. In: IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA (May 2012)
6. Hawes, N., Brenner, M., Sjöö, K.: Planning as an architectural control mechanism. In: HRI '09: Proceedings of the 4th ACM/IEEE International Conference on Human-Robot Interaction, New York, NY, USA, ACM (2009)
7. Pronobis, A., Mozos, O.M., Caputo, B., Jensfelt, P.: Multi-modal semantic place classification. Int. J. Robot. Res. 29(2-3) (February 2010)
8. Zhou, K., Zillich, M., Zender, H., Vincze, M.: Web mining driven object locality knowledge acquisition for efficient robot behavior. In: Proc. Int. Conf. Intelligent Robots and Systems (IROS). (to appear)
9. Zender, H.: Multi-layered conceptual spatial mapping: representing spatial knowledge for situated action and human-robot interaction. In Amirat, Y., Chibani, A., Zarri, G.P., eds.: Bridges Between the Methodological and Practical Work of the Robotics and Cognitive Systems Communities - From Sensors to Concepts. Intelligent Systems Reference Library. Springer Verlag, Berlin/Heidelberg, Germany (to appear)
10. Göbelbecker, M., Gretton, C., Dearden, R.: A switching planner for combined task and observation planning. In: Twenty-Fifth Conference on Artificial Intelligence (AAAI-11). (August 2011)
11. Aydemir, A., Göbelbecker, M., Pronobis, A., Sjöö, K., Jensfelt, P.: Plan-based object search and exploration using semantic spatial knowledge in the real world. In: Proc. of the European Conference on Mobile Robotics (ECMR 2011), Örebro, Sweden (September 2011)

The Transregional Collaborative Research Center SFB/TR 8 Spatial Cognition

Thomas Barkowsky and Julia Gantenberg

University of Bremen, SFB/TR 8 Spatial Cognition, Germany
{barkowsky,gantenberg}@sfbtr8.uni-bremen.de

Abstract. The SFB/TR 8 pursues interdisciplinary research in Spatial Cognition. In twenty projects, computer scientists, psychologists, and linguists collaborate in basic research as well as in application-oriented development. We present a selection of projects to illustrate the variety of research issues addressed within this research center.

Keywords: Spatial Cognition, Interdisciplinary Research, Autonomous Spatial Agents, Spatial Language, Social Robotics, Design Assistance

S. Wölfl (Ed.): Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), pp. 17-20, The Authors, 2012

1 Overview

The Transregional Collaborative Research Center SFB/TR 8 "Spatial Cognition: Reasoning, Action, Interaction" is concerned with basic research related to intelligent spatial information processing based on cognitive principles as well as with spatial task assistance for variable environments. The overall goal of the SFB/TR 8 is the integration of competence for reasoning about space, for acting in space intelligently, and for interacting in spatial environments.

The SFB/TR 8 is located at the Universities of Bremen and Freiburg. It involves approximately 70 scientists from various disciplines like informatics, cognitive science, psychology, and linguistics. The SFB/TR 8 started in 2003 and is funded by the German Research Foundation (DFG) for an overall duration of twelve years.

The conceptual basis of the SFB/TR 8 is the hypothesis that cognitive agents, i.e., humans, animals, robots, or computer programs, apprehend their spatial environments through (1) mental or computational operations (e.g., association and reasoning); (2) perception and action in space; and (3) communication in or about space and other forms of interaction. In all cases, spatial structures are interpreted and computationally transformed into new structures; the new structures reflect insights about spatial situations. They form the basis for further reasoning processes, for actions in the spatial environment or on external representations (for instance diagrams or maps), and for the interaction with other agents.

The projects of the SFB/TR 8 investigate cognitive agents in spatial environments. Several projects address the question of how cognitive agents can assist one another in solving spatial tasks such as reasoning about space, map comprehension, navigation,

and understanding and evaluating actions in space. The SFB/TR 8 also studies how to communicate about space using language and map-like representations to enable this assistance. The research is concerned with mental processes and structures underlying behavior in large-scale space, environmental space, vista space, and tabletop space environments. Solving spatial tasks in these environments requires adequate representation structures and processing capabilities as well as locomotion, navigation, or the movement of physical or mental objects.

The projects in the SFB/TR 8 are structured into three research areas: the research area Reasoning is concerned with investigations in mental and computational spatial processing, the research area Action is concerned with spatial tasks for autonomous robots and bio-inspired agents, and the research area Interaction deals with ontologies and representations of spatial descriptions. The three research areas are strongly interconnected. Currently, 16 DFG-reviewed research projects, including a Junior Research Group and an integrated Research Training Group, as well as three strategic projects established by the SFB/TR 8 board, are pursued in the SFB/TR 8.

The SFB/TR 8 also forms the nucleus of the International Quality Network (IQN) on Spatial Cognition, which connects researchers from more than 30 universities. The IQN researchers are frequent visitors at the University of Bremen and enhance the scientific exchange in the area of spatial cognition.

In the following, we present four of the research projects carried out in the SFB/TR 8 to illustrate the variety of research issues addressed within the research center.

2 Project ActionSpace: Bio-Inspired Self-Localization

Determining one's position within a spatial environment is a basic competence of biological as well as artificial agents. The project ActionSpace in the SFB/TR 8 develops a bio-inspired architecture for self-localization, which is based on basic neurobiological research and on specific behavioral experiments designed and conducted within the project.

By using physically impossible virtual worlds implemented in VR environments, researchers in this project found that human subjects have no problem in dealing with global spatial inconsistencies regarding the geometry and even the general topological layout of a spatial environment. This supports their hypothesis that the primary representation underlying human self-localization is not an integrated image-like one, but rather of a more local, sensorimotor nature. Thus, in the architecture developed in the project, bottom-up sensory features and motor data are combined in a sensorimotor representation of the spatial environment. This sensorimotor representation is integrated with a knowledge-based top-down mechanism, which controls the active exploratory perception of a scene according to the principle of maximum information gain. The architecture can operate on different sensorimotor levels, for instance movements of the agent in space and the control of eye movements. Currently, the architecture is implemented in two systems: a mobile robot and a virtual agent capable of navigating in ordinary as well as physically impossible virtual environments.
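The principle of maximum information gain mentioned above can be illustrated with a small sketch: given a belief over candidate locations, the agent picks the exploratory action whose expected observation reduces the entropy of that belief the most. Everything below, from the toy observation space to the generic sensor model interface, is an illustration of the principle rather than the project's actual architecture.

```python
import math

def entropy(belief):
    """Shannon entropy of a belief given as {location: probability}."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def expected_entropy_after(action, belief, sensor_model):
    """sensor_model(action, obs, loc) -> P(obs | action executed at location loc).
    Returns the expected posterior entropy if 'action' were executed."""
    observations = {"landmark", "no_landmark"}  # toy observation space
    exp_h = 0.0
    for obs in observations:
        p_obs = sum(sensor_model(action, obs, loc) * p for loc, p in belief.items())
        if p_obs == 0:
            continue
        posterior = {loc: sensor_model(action, obs, loc) * p / p_obs
                     for loc, p in belief.items()}
        exp_h += p_obs * entropy(posterior)
    return exp_h

def best_action(actions, belief, sensor_model):
    # Maximum information gain = minimum expected posterior entropy,
    # since the current entropy is the same for all candidate actions.
    return min(actions, key=lambda a: expected_entropy_after(a, belief, sensor_model))
```

The same selection rule applies at every sensorimotor level the section names, whether the "action" is a body movement of the agent or merely a shift of gaze.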

3 Project NavTalk: Spatial Inference in Navigation and Language

The project NavTalk addresses the ways in which humans make inferences about spatial relationships during complex navigation tasks. For this purpose, wayfinding behavior and linguistic data are investigated both in indoor and in outdoor environments.

We know that adult humans navigating in buildings and street networks do not perceive their surroundings with a blank mind; rather, previous experience leads to systematic expectations both about the structure of certain types of environment and about the options for navigating in them. Such mental presuppositions supplement the information that wayfinders actually receive via perceiving the real world, and via maps, signs, and linguistic descriptions. As a consequence, inference processes (which may be partly subconscious as well as probabilistic) support the wayfinder's task of navigating in partially unknown environments; the incomplete spatial information is extended by standard expectations together with spatial reasoning. Together, these processes add up to the development of a (partial) cognitive map, guiding the wayfinder's subsequent navigation decisions. This procedure may be encouraged or discouraged by particular environmental features or types of information provided to the wayfinder.

The aim in this project is to gain a better understanding of these spatial inference processes by investigating human behavior and linguistic representations given different types of tasks, environments, and verbal and visual route information. Naturalistic navigation experiments address inferences made between floors in multi-level buildings with respect to local route choices and detour planning, pragmatic solutions found across different tasks, inference and reasoning processes in inconsistent (virtual) environments or those distorted by disaster, and the impact of varying types of input in the form of language-based route descriptions and maps. The behavioral results of these navigation experiments are analyzed in tandem with language data collected via think-aloud protocols and retrospective reports. In addition to a basic content analysis, the structural features of the language data are also investigated; these are expected to be systematically related to the underlying conceptual phenomena. The systematic combination of linguistic and behavioral analysis will lead to a better understanding of the cognitive processes involved in wayfinding tasks with incomplete knowledge.

4 Project SocialSpace: Making Robots Socially Compatible

Next-generation robots will work closely with humans as companions at home, as caretakers for the elderly, in the form of intelligent cars, or as helpers in the service industry. Their tasks require advanced social and cognitive skills to effectively interact and cooperate with humans. It seems plausible that the key to long-term acceptance and utility of robots is their ability to perceive, understand, learn, and reproduce human social behavior. Thus, the project SocialSpace investigates how

robots can learn to become socially more compatible, for instance by learning human social behavior through observation and imitation. Another aspect investigated in the project addresses the question of how robots may be able to recognize the intentions and goals of humans in order to blend naturally into the activities they are expected to support. A core question, then, is what the cognitive processes are and how they can be implemented to enable robots to develop socially sensible behavior. As an overall goal, SocialSpace aims to enable the creation of sustainable and efficient human-robot relationships.

5 Project DesignSpace: Assistive Intelligence for Spatial Design

The project DesignSpace aims to develop computational techniques and tools to be used as a basis for providing assistive design intelligence within a conventional spatial and architectural design workflow. Such a workflow typically involves an iterative refinement cycle consisting of modeling, evaluation, and re-modeling phases. Intelligent capabilities in spatial design are essential to reduce design errors and failures by iterative design validation and verification, and also to ensure that the functional requirements of a design are met when the design is deployed in reality. The DesignSpace project is especially interested in the relationship between structural form and function and its formal interpretation within the assistance system. As an example, user-centered design analyses during the master-planning stage should be one of the most crucial considerations in the spatial design of large-scale public environments such as airports, museums, train stations, exhibition halls, or hospitals (i.e., all places with clearly definable functional purposes).

In the research on computational design analysis in the project DesignSpace, a range of analytical aids is developed that aim to relieve the designer of the cognitive stress involved during the early master-planning stage. In the context of wayfinding analyses for circulation planning, the assistance system developed in DesignSpace (1) derives the logical structure of topological connectedness, (2) generates all possible topological and geometric routes, (3) derives affordance-based routes aimed at predicting the motion patterns of special interest groups, (4) performs hypothetical what-if scenarios by providing comparative analyses, and (5) visualizes not only the explicitly existing physical space, but also the implicitly existing affordance spaces, physical and non-physical artifacts, and so on.

The long-term goal in this project is to develop a methodology that is able to detect requirement inconsistencies in the preliminary CAD design and communicate them back to the designer. This also includes the ability to perform diagnosis, and the derivation of alternative recommendations that do not violate the explicit and implied requirement constraints of the designer or architect. Initial studies have investigated this for the specific case where the new generation of smart environments and building-automation systems is being designed.
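Step (2) of the wayfinding analysis above, generating all possible topological routes, can be pictured as simple-path enumeration on a room-connectivity graph extracted from the floor plan. The graph and the Python code below are a toy illustration, not output of or code from the DesignSpace tools.

```python
def all_topological_routes(graph, start, goal, path=None):
    """Enumerate all simple paths (routes that do not revisit a space)
    through a room-connectivity graph given as {space: [adjacent spaces]}."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    routes = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # topological routes never revisit a space
            routes.extend(all_topological_routes(graph, nxt, goal, path))
    return routes

# Toy floor plan: an entrance hall connects via two corridors to a gate.
floor_plan = {
    "entrance": ["corridor_a", "corridor_b"],
    "corridor_a": ["entrance", "security", "gate"],
    "corridor_b": ["entrance", "gate"],
    "security": ["corridor_a"],
    "gate": ["corridor_a", "corridor_b"],
}
print(all_topological_routes(floor_plan, "entrance", "gate"))
# -> two routes; geometric and affordance-based variants would then be
#    derived from such topological routes.
```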

Collaborative Process Assistant: Towards a Context-sensitive Business Process Support Based on E-Mails

Julian Krumeich, Thomas Burkhart, Dirk Werth, and Peter Loos

Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg, Saarbruecken, Germany
{Firstname.Lastname}@iwi.dfki.de

Abstract. In many companies, a majority of business processes take place via e-mail communication. Large enterprises have the possibility to operate enterprise systems for a successful business process management. However, these systems are not appropriate for small and medium-sized enterprises (SMEs), which are the most common enterprise type in Europe. Thus, the European research project Commius addresses the special needs of SMEs and the characteristics of e-mail communication, namely high flexibility and unstructuredness. Additionally, it copes with the trade-off between process guidance and flexibility. In this paper, COPA, a prototypical implementation of the Commius concept, will be presented.

Keywords: E-Mail, Workflow, Business Process, Flexibility

S. Wölfl (Ed.): Poster and Demo Track of the 35th German Conference on Artificial Intelligence (KI-2012), The Authors, 2012

1 Introduction

E-mail as a means of communication has become an integral part of our daily business activities, without which modern business would be unthinkable. On average, employees spend 2.6 hours a day on e-mail, sending 33 and receiving 72 e-mails [1]. However, not only the time spent with e-mails as a means of communication, but also the knowledge that is bundled in an unstructured way within companies' e-mail repositories is quite difficult to manage. This becomes clear considering that 75% of a company's knowledge is saved in e-mails [2].

Large companies have the possibility to operate sophisticated enterprise systems, for instance ERP systems, which contain features for a successful business process management. Nevertheless, these solutions are not appropriate for all types of companies. Small and medium-sized enterprises (SMEs) do not have the ability to spend money on purchasing, operating and maintaining such expensive systems [3]. Currently, none of the existing software systems address the special needs of SMEs. Furthermore, e-mail-based business process solutions would have to address the special characteristics of e-mail communication, namely high flexibility and unstructuredness.

Traditional workflow engines lack the required flexibility for reacting to ad-hoc changes [4]. Their rigid underlying process model would need to foresee all possible variations, which becomes unfeasible even for simple processes. On the other hand, flexible workflow engines expect user knowledge about the procedural structures of an enterprise and do not provide enough guidance; however, introducing more procedural structure would result in a decrease of a system's flexibility [5]. Due to these problems, none of the proposed solutions has established itself on the market. The EU-FP7 research project Commius (COMMunity-based Interoperability Utility for Small and medium enterprises), in which nine European partners were involved until the end of 2011, and its prototypical realization COPA (Collaborative Process Assistant) address these problems. The proposed concept is tailored to the special needs of SMEs, manages e-mail-based business processes and copes with the personal and company-specific requirements of e-mail communication. COPA's overall target is to make e-mail workflows easier, faster and more structured. The achievement of these targets has been validated empirically [6].

2 The Collaborative Process Assistant

The usage of the system is divided into two phases, build-time and run-time. During build-time, the basic system configuration is accomplished in a first step. Afterwards, the process types to be supported are defined in a configuration tool. COPA contains several standard process templates that can be customized, or new processes can be defined. To this end, parser elements to be identified in future e-mails are specified. A set of certain parser elements constitutes a certain process step. After defining such process steps, they can be combined into process types. The desired annotation and processing treatment can be determined individually for each process step. Having defined customized process settings, the COPA system can be employed. During this run-time phase, all movements on the given e-mail accounts are observed with respect to potentially relevant e-mails. If such an e-mail is identified, it runs through the COPA enhancing treatment.

From a more technical perspective, COPA is divided into three main layers: On the level of the system layer, each received e-mail is intercepted by the system and subsequently analyzed, archived, decoded and decomposed. Each part of an e-mail is transformed into plain text and merged into a single XML document to allow other COPA components to directly access the information for further processing. In addition, the system layer provides system connectors usable to interface external as well as legacy systems. The semantic layer addresses the meaningful e-mail communication of an enterprise. As such, it underpins the interoperability between collaborating enterprises. Based on pattern-based information extraction, e-mail notifications, orders and other communication can be identified, and the relevant information is extracted. The process layer concerns process interoperability. User interactions take place mainly within this layer. The layer is subdivided into four run-time components that are described in the following subsections in more detail.
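As an illustration of the system layer's decompose-and-merge step, the following minimal sketch decodes an incoming e-mail and merges its plain-text parts into a single XML document. The XML vocabulary used here is a hypothetical stand-in, since the paper does not specify COPA's internal format.

```python
import email
import xml.etree.ElementTree as ET

def email_to_xml(raw_message: bytes) -> ET.Element:
    """Decode an e-mail and merge its parts into one XML document."""
    msg = email.message_from_bytes(raw_message)
    root = ET.Element("mail")
    ET.SubElement(root, "from").text = msg.get("From", "")
    ET.SubElement(root, "subject").text = msg.get("Subject", "")
    body = ET.SubElement(root, "body")
    for part in msg.walk():                      # decompose multipart mails
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            ET.SubElement(body, "part").text = payload.decode(charset, "replace")
    return root

# Downstream components could then access the merged document, e.g.:
# ET.tostring(email_to_xml(raw_bytes), encoding="unicode")
```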

2.1 Detection Component

The first step along the e-mail processing is the Detection Component. Here, the COPA system determines whether the incoming e-mail concerns an already running process or whether a new process instance has to be initiated. Based on the previously performed semantic analysis, the e-mail can either be assigned to an existing process, where it constitutes the next step, or be considered a starting event that triggers a new process. In the latter case, a new process instance with its specific Process ID is created from the corresponding reference model template in the Enterprise Process Repository. Furthermore, the information whether the incoming e-mail is part of an already known process or of a completely new one is displayed to the user. Future incoming e-mails concerning this particular process are henceforth assigned to this initial process instance.

2.2 Tracking Component

The process tracking functionality is responsible for monitoring all incidents occurring within a running process as well as for storing every performed process step and additionally relevant information, e.g. customer IDs or order quantities, in the respective context. Following this approach, every performed step and the conjoined information within a process instance are documented and comprehensible for further disposal, thus forming the precondition for further beneficial functionalities of COPA. As soon as the e-mail processing is completed, the displayed e-mail contains two sets of data: information about the present state of the actual process and a visualization of the preceding steps including the corresponding e-mails.

2.3 Assisting Component

As the correct business module has already been identified, the Assisting Component exploits the Enterprise Process Repository, gathering relevant process data in order to supply the user with beneficial information related to the particular process step. This data may either consist of internal information, for example customer history or article information, or be presented in the form of a gateway to external links. To achieve the best response to specific requirements, the nature and the level of detail of the information to be displayed can be adjusted using the customization tool. The personnel of a shipping department, for example, would require item IDs, ordered quantities and the customer's address, while account details would not be expedient in this context.
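Before turning to the Advising Component, the following minimal sketch illustrates the assign-or-create decision of the Detection Component (Sect. 2.1). The in-memory index and the use of the In-Reply-To header as a correlation key are illustrative assumptions; the actual system bases this decision on the prior semantic analysis.

```python
import uuid

process_index = {}   # correlation key -> process instance id

def detect(message_id, in_reply_to):
    """Assign a mail to a running process instance or start a new one."""
    if in_reply_to and in_reply_to in process_index:
        instance_id = process_index[in_reply_to]   # next step of a known process
        is_new = False
    else:
        # New instance (in the full system: created from a reference
        # model template in the Enterprise Process Repository).
        instance_id = str(uuid.uuid4())
        is_new = True
    process_index[message_id] = instance_id        # future replies correlate
    return instance_id, is_new
```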

2.4 Advising Component

Due to the knowledge gathered in prior phases, the Advising Component provides suggestions and recommendations for further proceedings in a particular process. If the incoming e-mail has, for example, been identified as a confirmation of payment of an order transaction, COPA would recommend triggering the shipment and offer reasonable options like forwarding the confirmation to the shipping department. The gathered information is then forwarded to the system interoperability layer, which incorporates the auxiliary information into the generated output e-mail. This provides the user with a set of descriptive hyperlinks representing the different possibilities to proceed within the process's workflow and allows accessing the ascertained information. Embedding the information directly into an e-mail allows COPA to be applicable also in collaboration scenarios in which only one partner deploys the system, owing to the wide adoption of the e-mail standard. A second functionality of the Advising Component is to provide advice in actually executing the next process step once the user has chosen one of the provided actions. This more interactive part is not directly invoked while processing an incoming e-mail, but later via the embedded hyperlinks, which redirect the respective user to an integrated web interface where he or she is provided with more specific information on the further proceedings.

3 Conclusion

In this paper, we have presented Commius as a prototype to support enterprise interoperability based on e-mail technologies which fits the special needs of SMEs. It enables SMEs to use and maintain the system without major financial efforts and changes in their technological landscape. The most innovative aspects of Commius include the extensive capabilities for flexible adaptation to changing environmental needs while still providing process guidance.

Acknowledgments. This work has been partially supported by the EU STREP project Commius (FP7).

4 References

1. The Radicati Group, Inc.: E-Mail Business User Survey (accessed July 2012)
2. Messaging Architects: Policy-Based E-Mail Security and Data Leak Prevention (accessed July 2012)
3. European Commission: The European e-Business Report (2006/07 edition)
4. Günther, C.W., Reichert, M., van der Aalst, W.M.P.: Supporting Flexible Processes with Adaptive Workflow and Case Handling. In: Proceedings of the 17th IEEE International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE '08), Rome, Italy (2008)
5. Burkhart, T., Loos, P.: Flexible Business Processes - Evaluation of Current Approaches. In: Proceedings of the Multikonferenz Wirtschaftsinformatik (MKWI 2010), Göttingen, Germany (2010)
6. Burkhart, T., Krumeich, J., Werth, D., Loos, P.: Flexible Support System for E-Mail-Based Processes: An Empirical Evaluation. International Journal of E-Business Development (IJED) 2(3)

uservice: A Personalized and Situation-Aware Recommender System for User-Generated Mobile Services

Alexandra Chapko, Andreas Emrich, Marc Gräßle, Dirk Werth, and Peter Loos

Institute for Information Systems at the German Research Center for Artificial Intelligence, Saarbrücken, Germany {alexandra.chapko,andreas.emrich,marc.graessle,dirk.werth,peter.loos}@dfki.de

Abstract. The following short paper describes the mechanisms of the recommender system contained in the prototype of the project uservice [1]. uservice is a European project which ended in April. The German consortium was funded by the Federal Ministry for Education and Research of Germany (project funding reference number 01IS09020D). A service infrastructure for the mobile super prosumer was developed which enables users to create their own mobile services and provide them to other users. The German Research Center for Artificial Intelligence developed the personalized and situation-aware recommender system for mobile user-generated services.

Keywords: Recommender Systems, Semantic Search, Ontology Representation

1 Introduction

Nowadays, mobile devices enable users to generate their own content and provide it centrally via a server. Soon, this will evolve further, and a mobile device will provide not only content but also user-created services, and become a server itself. As a result, there will be many small services for many small target groups. The European project uservice examined the turn of mobile users into service super prosumers, i.e. producers, providers and consumers of services, and examined which mechanisms are necessary to enable a user to consume and provide such user-generated mobile services. This implies millions or perhaps billions of potential sources with valuable services for billions of potential consumers. Due to the particular challenges of the mobile environment, where device resources, interaction possibilities and user attention are much more restricted compared to the fixed web environment, intelligent, context-aware discovery mechanisms are necessary.

2 uservice Demonstration Description

2.1 Search and recommendation mechanisms for service consumption

User-generated mobile services may be valid not only for a specific location but also for a specific context. This means, given a specific situation, that services are possibly relevant only for hours or maybe even minutes. A user may create a service in order to quickly disseminate information. The type of information to be distributed could be as simple as a text message describing a current traffic situation, or a photo illustrating a natural disaster. Therefore, they are called micro-services in the following.

A simple and shallow domain ontology was defined that serves the purpose of providing an infrastructure for semantic reasoning. The micro-service description (e.g. tags and full-text description) is matched to ontology concepts, and an ontological representation, i.e. a vector of concept URIs, is saved in the semantic search index. Through semantic reasoning, not only concepts which match the micro-service description are added to the ontological representation, but also concepts which are generalizations or related in other ways. Relevance weights for each concept in the ontological representation therefore indicate how well it describes the micro-service. Other information such as validity (e.g. date and/or time), availability (e.g. GPS information, network connection) and access rights (e.g. group IDs) is also saved in the index. This information enables a fast filtering for relevant micro-services and a reduction from potentially millions of micro-services to a few thousand.

Unlike traditional search engines, relevant services are then not identified solely by comparing how well a user query matches the micro-service description. Instead, the search functionality utilizes a phased approach of multiple threshold chains. Thresholds in this context describe the minimally necessary relevance of micro-services for the particular user in a particular situation. Each dimension consists of one or more sub-thresholds, e.g. the distance of the current user location to the location addressed in a location-based service. Only if all thresholds are passed does the service qualify for display, given an overall score calculation. This overall score is a weighted sum of five plug-in functions which evaluate aspects such as context, usage, rating and fit with the user query. The first plug-in function is the BooleanKeywordRanker, which captures the number of matches between the service description and query tags, generalizations of query tags, and tags in the user profile. To normalize the value, this number is divided by the number of tags contained in the service description. The UsageRanker calculates the ratio of the number of times this service was chosen by other users to the overall number of service choices within the time the service is valid. This function can be modified to include collaborative filtering aspects by considering not all users but only users with a similar profile. The RatingRanker computes the average rating of the given service as a percentage of the best possible rating value. This function can also be modified in the same way as the UsageRanker for collaborative purposes. To measure the context, the TimeRanker and LocationRanker compute the ratio between the distance of the time and location of the given service to the current time and position of the user and a threshold distance. As a possible extension, the threshold distance can be personalized.
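The following minimal sketch illustrates this phased scoring: a micro-service must pass every threshold before its overall score is computed as a weighted sum of the plug-in functions. Only the BooleanKeywordRanker is spelled out; the weights, thresholds and data layout are invented for the example.

```python
def boolean_keyword_ranker(service, query, user):
    # Matches between service tags and query/profile tags,
    # normalized by the number of tags in the service description.
    tags = set(service["tags"])
    hits = tags & (set(query["tags"]) | set(user["profile_tags"]))
    return len(hits) / max(len(tags), 1)

def overall_score(service, query, user, rankers, thresholds, weights):
    # Threshold chain: any failed dimension disqualifies the service.
    for name, minimum in thresholds.items():
        if rankers[name](service, query, user) < minimum:
            return None
    # Weighted sum over all plug-in functions.
    return sum(w * rankers[name](service, query, user)
               for name, w in weights.items())

rankers = {"keyword": boolean_keyword_ranker}      # usage, rating, time,
weights = {"keyword": 1.0}                         # location analogous
thresholds = {"keyword": 0.2}

service = {"tags": ["traffic", "jam", "highway"]}
print(overall_score(service, {"tags": ["traffic"]},
                    {"profile_tags": ["commuting"]},
                    rankers, thresholds, weights))  # ~0.33
```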

2.2 Search and recommendation mechanisms for service creation

To create a service, a user can utilize a service template, which is an editable sample of a service. The search for and recommendation of service templates is similar to the search for and recommendation of micro-services. Based on a description of the template, an ontological representation of the service template is created and saved in the semantic search index. Furthermore, a ranking function evaluates how suitable the service template is for a specific user, based on categories such as the amount of usage by other users with the same user profile, user ratings, or the fit with the user's intention.

Another way to create a micro-service is to combine building blocks. A building block is, for instance, a header, a text input field, a map, etc., which can be customized by the user. Ontological representations of building blocks are saved in the semantic search index, and a ranking function evaluates how suitable a building block is, based on categories such as the match with the user query. In addition, collaborative filtering is applied: depending on the combination of chosen building blocks, the list of building blocks displayed to a user is updated to show the building blocks which were historically used most often together with the given combination.

3 Semantic Knowledge Representation to a User

In our approach, a user can browse the tree structure contained in an ontology. The approach is similar to tag clouds but personalized for each user, i.e. the higher the relevance of a concept for an individual user, the bigger its representation. To hide the complexity from the user, only one level or a fragment of the tree is shown. The navigation is kept rather simple and supports different interaction modes: A click on a parent concept causes a shift of focus to the concept which is the superclass of the concept clicked on; in other words, a click on the concept at the top results in an upward movement in the tree structure. A click on one of the concepts at the bottom causes a shift of focus to the selected concept, and new concepts which have not yet been displayed are loaded from the ontology; in other words, a click on a concept at the bottom results in a downward movement in the tree structure of the ontology.

As mentioned above, the representation of individual concepts is adapted to the user's personal interests, which are defined in the user profile and were rated by the user with relevance weights. These concepts are therefore displayed bigger than concepts which are not of interest to the user. For concepts with no relevance weight, a recursive algorithm assigns weights according to the sum of the weights of their subclasses (see the sketch below). Figure 1 shows that a user specified weights for the concepts football, tennis, clubbing and music. Since there was no weight for sports, the sum of the weights of the child concepts football and tennis is used. As a result, the concept sports is represented biggest, the concept music second biggest and the concept clubbing third biggest. Overall, the rating of a concept is based on the relevance weights of its subclasses.
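A minimal sketch of this recursive weighting, reproducing the example of Figure 1 under the assumption of a simple parent-to-children mapping:

```python
subclasses = {"sports": ["football", "tennis"],
              "music": ["bass"],
              "clubbing": []}
user_weights = {"football": 0.3, "tennis": 0.2,
                "clubbing": 0.4, "music": 0.4, "bass": 0.4}

def weight(concept):
    # Explicit user weights win; otherwise sum the subclass weights.
    if concept in user_weights:
        return user_weights[concept]
    return sum(weight(child) for child in subclasses.get(concept, []))

print(weight("sports"))   # 0.5 = 0.3 + 0.2, so sports is displayed biggest
```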

[Figure 1. Weighting of areas of interest: user-assigned relevance weights for football, tennis, clubbing, music and bass; the weight of sports is derived from its subclasses.]

4 Outlook & Conclusion

Research results were presented at several international conferences and published in international journals. For instance, mechanisms for personalized recommendations were shown at the Pacific Asia Conference on Information Systems in 2011 (title: Personalized and Situation-Aware Recommendations for Runners) [2], at the 20th European Conference on Information Systems (title: Personalized and context-aware recommendations of running routes) [3], and at the Americas Conference on Information Systems (title: uservice - Enabling user-driven fitness services on-the-go) [4]. Further research is planned on the combination of learning mechanisms with recommendation mechanisms in order to enhance personalization over time [5].

The recently started project MENTORbike, funded by the Federal Ministry of Education and Research of Germany (project funding reference number 01IS11034D), builds on and further develops the results of uservice [6]. MENTORbike researches and develops an adaptive, intelligent, mobile assistance system which combines an e-bike, a wireless body area network, body sensors, a smartphone and a service server into an innovative training device for the prevention and rehabilitation of heart diseases. MENTORbike will, for instance, adapt the power support of the e-bike automatically based on ECG recordings from the body sensors. To this end, the recommendation mechanisms developed in uservice will be further developed and extended.

References

1. Chapko, A., Gräßle, M., Emrich, A., Werth, D., Rust, C., Tacken, J., Flake, S., Laum, N., Lerche, C., Weber, A.: User-Generated Mobile Services for Health and Fitness. In: Golatowski, F., Bello, L.L., Ditze, M., Niedermeier, C. (eds.): IEEE International Workshop on Service Oriented Architectures in Converging Networked Environments - Workshop Proceedings, at the IEEE International Conference on Advanced Information Networking and Applications Workshops (WAINA-11), September 5-9, France (2011)
2. Emrich, A., Chapko, A., Gräßle, M., Werth, D., Flake, F., Tacken, J., Rust, C.: Personalized and Situation-Aware Recommendations for Runners. In: PACIS 2011 Proceedings, Paper 570. Pacific Asia Conference on Information Systems (PACIS-2011), July 7-11, Brisbane, Australia (2011)
3. Chapko, A., Knoch, S., Leonhardt, F., Emrich, A., Gräßle, M., Ganev, L., Werth, D., Loos, P.: Personalized and context-aware recommendations of running routes. 20th European Conference on Information Systems (ECIS 2012), June 10-13, Barcelona, Spain (2012)
4. Chapko, A., Emrich, A., Gräßle, M., Feldmann, T., Werth, D., Tacken, J., Flake, S., Rust, C.: uservice - Enabling user-driven fitness services on-the-go. In: AMCIS 2011 Proceedings - All Submissions, Paper 391. Americas Conference on Information Systems (AMCIS-11), August 4-7, Detroit, Michigan, United States. AIS Electronic Library (2011)
5. Knoch, S., Chapko, A., Emrich, A., Werth, D., Loos, P.: In: 2nd International Workshop on Recommender Systems meet Databases (RSmeetDB'12) at the 23rd International Conference on Database and Expert Systems Applications (DEXA 2012)
6. Chapko, A., Werth, D., Feodoroff, B., Schmitt, A., Walter, H., Stützinger, V., Schlicker, M., Koch, M.: MENTORbike - Das intelligente Pedelec. In: Technik für ein selbstbestimmtes Leben - 5. Deutscher AAL-Kongress, Berlin (2012)

ARGUMENTUM: Towards Computer-Supported Analysis, Retrieval and Synthesis of Argumentation Structures in Humanities Using the Example of Jurisprudence

Constantin Houy 1, Peter Fettke 1, Peter Loos 1, Iris Speiser 2, Maximilian Herberger 2, Alfred Gass 3, and Ulrich Nortmann 4

1 Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI) and Saarland University, Campus, Building D, Saarbrücken, Germany {Constantin.Houy, Peter.Fettke, Peter.Loos}@iwi.dfki.de
2 Institute for Law and Informatics (IFRI), Saarland University, Campus, Building D, Saarbrücken, Germany {i.speiser, m.herberger}@mx.uni-saarland.de
3 European Academy of ejustice (EEAR), Torstr. 43a, Merzig, Germany gass@eear.eu
4 Institute of Philosophy, Chair of Theoretical Philosophy, Saarland University, Campus, Building C, Saarbrücken, Germany u.nortmann@mx.uni-saarland.de

Abstract. Argumentation represents a fundamental intellectual activity and is, furthermore, one of the central tasks in the context of every scientific discipline. Developing new arguments and analyzing existing argumentation structures is of special importance for the field of humanities and, thus, for jurisprudence. The analysis of existing and the synthesis of new argumentation structures comprise sophisticated intellectual processes which are, nevertheless, bound by the natural limitations of the human information processing capacity. Against the background of the improving electronic availability of a growing corpus of jurisdiction, approaches and techniques from the field of artificial intelligence offer a considerable potential for an automated analysis, retrieval and synthesis of argumentation structures. The project ARGUMENTUM aims at exploring the potential and limitations of computer-supported methods for the analysis, retrieval and synthesis of argumentation structures using the example of law.

Keywords: Information Retrieval, Argumentation Analysis, Computer-Supported Argumentation, Argumentation Mining, eHumanities

1 Motivation

Argumentation is a fundamental intellectual activity and, moreover, a central task in the context of every scientific discipline. In this context, justifications as well as refutations of statements are developed by humans in order to convince other people of the truth or falsity of these statements. Developing new and analyzing existing argumentation structures is of tremendous importance for every scientific discipline, especially for jurisprudence as one representative of the humanities. A central task of jurisprudence lies in the analysis of court decisions, which represent aggregated and formalized argumentation structures. Argumentation structures are methodically well-investigated and generally accessible for intellectual analysis. They are characterized by the fact that certain theses are attacked or defended step by step by means of supporting or refuting arguments. In a basic justification structure, one thesis is supported by one or several justifications. In this regard, a justification is a set of sentences which are presented to justify the thesis. For jurisprudence and for legal practice it is crucial to identify those justifications which support a thesis.

However, basic justification structures can be embedded into more complex structures, as in the following example: given a legal norm N1, the fact F1 and the legal consequence C1, where the fact consists of two conjunctively linked characteristics CH1 and CH2. The interpretation of CH1 is controversial. Let us assume there are two opposite interpretations of CH1, viz. I1(CH1) and I2(CH1). The thesis T1-PRO(I1(CH1)) is submitted in support of the first interpretation I1(CH1). Opposed to I1(CH1), the thesis T1-CONTRA(I1(CH1)) is submitted. Authority A1 argues for the thesis T1-PRO(I1(CH1)), hence A1-PRO(T1-PRO(I1(CH1))). Authority A2 argues for the thesis T1-CONTRA(I1(CH1)), hence A2-PRO(T1-CONTRA(I1(CH1))). Figure 1 gives a graphical representation of the described justification structure.

[Fig. 1. Exemplary justification structure: norm N1 with fact F1 (characteristics CH1 and CH2) and consequence C1; interpretations I1 and I2 of CH1; theses T1-PRO and T1-CONTRA; supporting authorities A1 and A2.]

Typical questions in jurisprudence which are highly relevant for daily work are, e.g.: which authority supports the thesis saying that the characteristic CH1 should be interpreted in the sense of I1? The concrete answer would be A1. Jurists in every possible role need such information to be able to prepare their argumentation. However, the analysis of argumentation structures is a complex intellectual process which is bound by the natural limitations of the human information processing capacity. This means that the preparation of argumentation structures is only based on those legal cases a person is familiar with, and it commonly requires a considerable amount of time.
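The structure lends itself to a simple machine-readable encoding. The following minimal sketch models the pro/contra support relations and answers the question above; the tuple-based representation is an illustrative assumption, not the project's formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thesis:
    claim: str     # e.g. "I1(CH1)", interpretation I1 of characteristic CH1
    stance: str    # "PRO" or "CONTRA"

# Support relations from the example: A1 backs T1-PRO, A2 backs T1-CONTRA.
support = [("A1", Thesis("I1(CH1)", "PRO")),
           ("A2", Thesis("I1(CH1)", "CONTRA"))]

def authorities_for(claim, stance):
    """Which authorities support the thesis that CH1 be read as I1?"""
    return [a for a, t in support if t == Thesis(claim, stance)]

print(authorities_for("I1(CH1)", "PRO"))   # ['A1']
```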

Against the background of the improving electronic availability of an ever-growing corpus of jurisdiction, it is remarkable that, so far, no comprehensive support for detailed information retrieval in legal argumentation structures has been established, nor, in preparation for this, adequate approaches for their analysis and synthesis. The major legal databases in Germany do not support an argumentation-structure-oriented information retrieval, but only simple keyword searches. Some of the elements in the justification structure represented in Fig. 1 are accessible, e.g. F1 or CH1. However, all the other elements are not retrievable, and the considerable potential of computer support using approaches from the field of artificial intelligence is not realized in practice. Nevertheless, realizing this potential can support the identification of significant new knowledge for jurisprudence. In the following section, the project ARGUMENTUM, which investigates these aspects, is presented.

2 The ARGUMENTUM Project

The project ARGUMENTUM, which has been funded by the German Federal Ministry of Education and Research (BMBF) in the context of the eHumanities initiative since June 2012, deals with the investigation of the potential and the boundaries of computer-supported analysis, retrieval and synthesis of argumentation structures from different perspectives, using the example of jurisprudence.1 The project aims at exploring the potential and opportunities of methods and techniques from computer science and artificial intelligence for new and innovative applications supporting research methods in the humanities, especially argumentation. The identified potential shall, furthermore, be realized by means of an innovative software prototype which will be developed during the project. Innovative methods for computer-supported analysis, retrieval and synthesis of argumentation structures based on large corpora of documented court decisions can support jurisprudence in several ways:

1. The possibility of an electronic search in existing argumentation structures can significantly accelerate the daily work of scientists and practicing jurists, because relevant issues could be investigated independently of the availability and the organisation of printed sources.
2. The analysis of argumentation structures in larger repositories can also support the identification of interesting and significant patterns of argumentation in the domain of jurisprudence. Based on these patterns, an information system could recommend successful patterns of argumentation to a jurist who is interested in a similar problem or issue.

In addition to the disclosure and presentation of such argumentation structures by the planned software prototype, ARGUMENTUM also aims at checking whether the developed insights and findings concerning the analysis, retrieval and synthesis of argumentation structures can be transferred and fruitfully applied in other fields of the humanities besides jurisprudence.

1 The ARGUMENTUM consortium consists of the Institute for Law and Informatics (IFRI) and the Chair of Theoretical Philosophy, both Saarland University, as well as the Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI) and the European Academy of ejustice (EEAR).

The prospect of developing new and combining existing approaches for a partly automated analysis, retrieval and synthesis of argumentation structures is based on several preparatory studies conducted on the jurisdiction of the German Federal Constitutional Court (Bundesverfassungsgericht), which has been available on the internet since 1998.

3 Related work

In the context of computer-supported argumentation [1] and especially argumentation mining [2], a certain amount of interesting work related to ARGUMENTUM exists which shall be further developed and integrated, e.g. first approaches for the analysis and retrieval of argumentation structures in legal dossiers [3], or the retrieval of certain elements of argumentation structures based on linguistic patterns depending on certain domains of interest or text types, e.g. in [4] for scientific articles.

4 Conclusion and outlook

This contribution gives an overview of the recently started research project ARGUMENTUM, which aims at exploring the potential and limitations of computer-supported methods for the analysis, retrieval and synthesis of argumentation structures. Based on the improving electronic availability of growing corpora of jurisprudence, approaches from the field of artificial intelligence offer a considerable potential for supporting these tasks. In the upcoming project phases, the project team will develop a software prototype supporting the analysis, retrieval and synthesis of argumentation structures and, furthermore, explore the potential and the boundaries of the used approaches and techniques.

Acknowledgements: The research described in this paper is supported by a grant from the German Federal Ministry of Education and Research (BMBF), project name: ARGUMENTUM - Analyse und Synthese von Argumentationsstrukturen durch rechnergestützte Methoden am Bsp. der Rechtswissenschaft, support code 01UG1237C.

References

1. Scheuer, O., Loll, F., Pinkwart, N., McLaren, B.M.: Computer-Supported Argumentation: A Review of the State of the Art. International Journal of Computer-Supported Collaborative Learning 5 (2010)
2. Mochales, R., Moens, M.F.: Argumentation mining. Artificial Intelligence and Law 19, 1-22 (2011)
3. Sombekke, J., van Engers, T., Prakken, H.: Argumentation structures in legal dossiers. In: 11th International Conference on Artificial Intelligence and Law, Stanford, CA (2007)
4. Angrosh, M.A.: Modelling Argumentation Structures in Scientific Discourse through Context Identification: Towards Intelligent Information Retrieval Systems. Bulletin of IEEE Technical Committee on Digital Libraries 6 (2010)

Facetted Search on Extracted Fusion Tables Data for Digital Cities

Jochen Setz 2, Gianluca Quercini 1, Daniel Sonntag 2, and Chantal Reynaud 1

1 Laboratoire de Recherche en Informatique, Université Paris Sud XI, Orsay, France
2 German Research Center for Artificial Intelligence (DFKI)

Abstract. Digital cities of the future should provide digital information about points-of-interest (POIs) for virtually any user context. Starting from several Google Fusion Tables about city POIs, we extracted useful POI data and transferred it to RDF to be accessible by SPARQL requests. In this initial application context, we concentrated on museum and restaurant resources as the result of a precision-oriented information extraction step. With the current application system we are able to retrieve, filter, and order digital city POI data in multiple ways. With the help of facets, a user can browse the museums and restaurants; he or she can also filter the relevant objects according to available metadata criteria such as city, country, and POI categories. Different views allow us to visualize the objects of interest as tables, thumbnails, or POIs on an interactive map. In addition, if complementary information from Dbpedia about museum and restaurant records and the cities they are located in is available, this information can be retrieved and displayed at query time.

1 Introduction

Digital cities of the future should feature a democratic city space through a citizen-centric model. This is the vision of the EIT action line Digital Cities. Citizen participation could take different forms, e.g., the execution of necessary actions to improve the city's performance and sustainability, or, as in the direction we pursue, the collection and usage of data to be broadcast or to be used to analyze and sense the status and the dynamics of the city as a place where people live and spend their spare time. As part of the EIT ICT Labs KIC activity DataBridges, Data Integration for Digital Cities, we are developing a framework that enables the enrichment of data related to points-of-interest (POIs) in cities (e.g., restaurants, museums, or theatres) and supports the applications which aim at using the data to provide specific and dynamic city services (e.g., city tour recommender systems). We are confronted with two major challenges on which we will focus in this demo paper:

- Collecting as much data as possible about digital city POIs.
- Presenting the data so that information is easily retrieved.

2 Which facetted browsing functionality do we provide?

Starting from several Google Fusion Tables, the data is transferred to RDF to be accessible by SPARQL requests. We first concentrated on museum and restaurant resources from digital cities. With the current demonstrator, we are able to filter and order digital city POIs in multiple ways, rather than in a single, pre-determined, taxonomic order. With the help of automatically generated facets, a user can browse the museums and restaurants in an elegant way, but also filter the relevant objects according to the available metadata criteria: city, country, and/or POI categories. Different views allow us to visualize the objects of interest as tables, as thumbnails, and as points-of-interest on an interactive map. In addition, if complementary information from Dbpedia about museum and restaurant records and the cities they are located in is available, this information can be retrieved and displayed at query time.

Figure 1 shows digital city results for restaurants and museums in Australia. The facets in the upper part allow us to filter the city, country, and category resources (here restaurant types) as indicated. Different types of resources are aggregated into the result lenses shown in the lower part of Figure 1, which displays one museum and five restaurant results.

Fig. 1: Facetted search in digital city POIs

Our web application allows us to switch between different views of a single filter being applied, namely aggregated result view, map view, compact view, and full view.
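Behind each facet combination stands a SPARQL request against the extracted RDF data. The following minimal sketch shows what such a request could look like for the Australia/restaurant filter; the endpoint URL and the property names are hypothetical, since the actual vocabulary stems from the Fusion Tables extraction.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
PREFIX poi: <http://example.org/poi#>
SELECT ?name ?city WHERE {
  ?s a poi:Restaurant ;
     poi:name    ?name ;
     poi:city    ?city ;
     poi:country "Australia" .
} ORDER BY ?city
"""

endpoint = SPARQLWrapper("http://localhost:8890/sparql")  # e.g. a Virtuoso server
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["name"]["value"], "-", row["city"]["value"])
```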

Figure 2 shows a result map of 26 filtered restaurant resources. The Google map indicates Australian restaurants and museums as landmarks.

Fig. 2: Digital city restaurant results indicated on a map

3 Which features have we implemented?

We make use of multiple Linked Data sources; Dbpedia contents are extracted according to semantic city links of the filtered resources. Photos related to Dbpedia cities are provided by the Linked Data resource flickr wrappr. The system's browser page is dynamically generated and updated according to the retrieval sets of the combined Linked Data queries. In addition to interactive boxes, links, and tables, the GUI uses Javascript widgets to visualize individual museum and restaurant retrieval results (works best in Chrome, Safari, etc.). External data is provided by a slideshow which enables us to jump directly to the data of interest, here Dbpedia comments or flickr photos. The slideshows can be enabled by selecting the links provided in the aggregated view. Figure 3 shows the selection of a point-of-interest (restaurant) on the map; location-based details of the resource, which are obtained from Dbpedia, can be highlighted.

Fig. 3: Map details of one of the 26 filtered digital city results (restaurants)

4 How does the system work technically?

Google Fusion Tables (GFT) are data tables which are consolidated into a Web application that hosts a vast collection of tables contributed by people over the Internet [1]. We developed a tool that automatically converts these tables to RDF data. The problem of understanding the semantics of tables has already been addressed by numerous research groups [2,3,4]. In particular, the approaches described in [2,4], which rely on probabilistic models, show promising results. Basing our extraction process for the Google tables on the implementation described in [4], we implemented a graphical user interface from scratch using the open-source knowledge management tool Exhibit. After the facets and lenses have been specified, several Google Fusion Tables are converted to RDF and loaded onto our DFKI Virtuoso server. The interactive GUI then triggers several SPARQL queries (at query time) and provides additional Dbpedia information about the cities of filtered restaurants and museums in multiple languages (according to established Dbpedia links). Additionally, we use the web service of flickr wrappr at query time in order to retrieve RDF links to relevant photos of Dbpedia resources. Figure 4 shows the slideshow which we built from the tables. A click on a city location where restaurants or museums are located triggers an ad-hoc query to Dbpedia to fetch more information about the city. The slideshow contains comments, external links, and photos from flickr.
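The core of the table-to-RDF step can be pictured as mapping each table row to a resource and each column to a property. The following minimal sketch uses rdflib; the namespace and the row-numbering scheme are illustrative assumptions, and the real tool additionally infers column semantics following [4].

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

POI = Namespace("http://example.org/poi#")

def table_to_rdf(rows, row_type):
    g = Graph()
    for i, row in enumerate(rows):        # row: dict of column name -> value
        subject = URIRef(f"http://example.org/poi/{row_type.lower()}/{i}")
        g.add((subject, RDF.type, POI[row_type]))
        for column, value in row.items():
            g.add((subject, POI[column], Literal(value)))
    return g

g = table_to_rdf([{"name": "Bennelong", "city": "Sydney",
                   "country": "Australia"}], "Restaurant")
print(g.serialize(format="turtle"))
```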

Fig. 4: Slideshow built from Fusion Tables, Dbpedia data and flickr photos

5 Conclusion

Starting from several Google Fusion Tables about city POIs, we extracted useful POI data and transferred it to RDF to be accessible by SPARQL requests in the context of an interactive facetted browsing application. In particular, different views allow us to visualize the objects of interest as tables, thumbnails, or POIs on an interactive map or slideshow, which also takes dynamic data from other Linked Data sources, Dbpedia and flickr, into account. The facetted browsing tool would highly benefit from further Google Fusion Tables contents being automatically extracted and used in the context of a digital city search application.

References

1. H. Gonzalez, A. Y. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapley, W. Shen, and J. Goldberg-Kidon. Google Fusion Tables: Web-centered Data Management and Collaboration. In Proceedings of the 2010 International Conference on Management of Data (SIGMOD '10), New York, NY, USA. ACM (2010)
2. G. Limaye, S. Sarawagi, and S. Chakrabarti. Annotating and Searching Web Tables Using Entities, Types and Relationships. Proc. VLDB Endow., 3 (2010)
3. V. Mulwad, T. Finin, Z. Syed, and A. Joshi. Using Linked Data to Interpret Tables. In First International Workshop on Consuming Linked Data (COLD2010)
4. P. Venetis, A. Halevy, J. Madhavan, M. Paşca, W. Shen, F. Wu, G. Miao, and C. Wu. Recovering Semantics of Tables on the Web. Proc. VLDB Endow., 4 (2011)

The ALIZ-E Project: Adaptive Strategies for Sustainable Long-Term Social Interaction

The ALIZ-E project team

Partners
- University of Plymouth, U.K.
- German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
- Vrije Universiteit Brussel, Belgium
- Netherlands Organization for Applied Scientific Research (TNO), Soesterberg, The Netherlands
- Imperial College, London, U.K.
- University of Hertfordshire, U.K.
- Fondazione Centro San Raffaele del Monte Tabor, Milan, Italy
- National Research Council (CNR), Padova, Italy
- Gostai, Paris, France

Overview

The aliz-e project1 aims to contribute to the development of integrated cognitive systems capable of naturally interacting with young users in real-world situations, with the specific goal of supporting children engaged in a residential diabetes management course. Fundamental to making human-robot interaction natural and integrated into the fabric of our lives is that the robot can establish itself cognitively in the long term. Only if interaction provides a sense of continuity over longer periods of time can it provide the resonance necessary for a constructive relationship between human and robot. It is commonly acknowledged that learning, adaptation, emotion, and multi-modal dyadic and group interactions will be necessary to achieve this goal, but the field has not yet been presented with conclusive design paradigms, algorithms and test results showing how a robot can enter and successfully maintain an interaction spreading beyond the current single-episode interaction frame and stretching over several days.

The project goal is to extend the science and technology behind long-term human-robot interaction. To achieve this, we address three related issues in developing interactive robots capable of sustaining medium- to long-term autonomous operation in real-world indoor environments. aliz-e addresses how long-term experience can be acquired, so the robot can tailor its behavior based on historical user interactions. To achieve contingent behavior, it is important to have the system deal robustly with inevitable differences in quality in perceiving and understanding a user and her environment. To this end, aliz-e develops new methods for adaptively controlling how a cognitive system invokes and balances a hybrid ensemble of processing methods for perception, action and interaction.

1 aliz-e is supported by the European 7th Framework Programme (FP7-ICT).

Fig. 1. Children with a robot engaging in quiz, imitation, and dance (left to right).

Finally, aliz-e addresses how a system can engage in an intersubjective interaction, using the potential anthropomorphization of robots by the user. The long-term aim is to implement believable, long-term, social child-robot interaction. The partners combine expertise in cognitive robotics, situated dialogue processing, human-robot interaction, cognitive user modeling, machine learning, audio/visual processing and robot control. The theory and practice of aliz-e will impact theoretical cognitive systems research (e.g., cognitive memory, long-term affective interaction) and the implementation of embodied cognitive systems (e.g., adaptive deployment of processing and behavior for robust interaction, speech processing for young users). To demonstrate and evaluate the scientific approach, aliz-e develops and instantiates these methods in a succession of integrated systems that are tested through experiments with young users in the wild. Specifically, aliz-e is testing robots with children (target age 8-11) who have metabolic disorders, such as diabetes or obesity. We aim for the robots to support the children's well-being and facilitate therapeutic activities in a hospital setting. Several series of experiments have already been carried out with children of the target user group at various sites, including two hospitals in Italy and The Netherlands [6, 1, 5]. The children engaged in various activities with a (partially) WOZ-simulated system implemented using the Nao robot [2, 4, 3], including (Figure 1):

- quiz: the child and the robot ask each other series of multiple-choice quiz questions from various domains, and the robot provides evaluation feedback;
- imitation: either the child or the robot presents a sequence of simple arm poses that the other tries to memorize and imitate;
- dance: the robot explores various dance moves with the child and then teaches the child a dance sequence according to its abilities.

The R&D and evaluation in aliz-e provide useful results emerging from the cycle of testing at the hospital and in other settings. The challenges of CRI in the wild are significant, with both technical and pragmatic issues to be faced. Children are not mini-adults, and this fact is very much apparent in the context of CRI. Children bring an imaginative investment to encounters with robot agents that is hugely valuable in the exploration of how we can develop technologies and systems for social interaction.

Conversely, they have often had exposure to highly sophisticated toys with complex (if rigid) behavior patterns; thus the interest of a child is easily lost when the limits of a robot's responsiveness are discovered. We have therefore come to understand that sometimes less complex but more robust and flexible behaviors are those which produce the best results with users.

The central feature of aliz-e is the coupling of innovative technology with embedding in a real-world application domain. Thus the project involves the development of new solutions to a number of significant issues in social HRI, some of which have a particular focus on child users, e.g. novel technologies for parsing child speech and aspects of user modeling as applied to children. Other technologies are applicable across the range of social HRI applications, such as associative memory-driven coordination of behavior. These methods and solutions are being developed in parallel and continuously integrated for testing with child users, both in a hospital setting and elsewhere. This rolling program of integrated testing provides invaluable feedback on the real-time performance of the system. At the same time, each interaction session enables us to study the particular characteristics of the target user group, learning more about what a robot needs to do in order to establish and maintain a social bond with a child.

References

1. Blanson Henkemans, O., Hoondert, V., Groot, F., Looije, R., Alpay, L., Neerincx, M.A.: "I just have diabetes": Children's need for diabetes self-management support and how a social robot can accommodate. Patient Intelligence, accepted (2012)
2. Kruijff-Korbayová, I., Athanasopoulos, G., Beck, A., Cosi, P., Cuayáhuitl, H., Dekens, T., Enescu, V., Hiolle, A., Kiefer, B., Sahli, H., Schröder, M., Sommavilla, G., Tesser, F., Verhelst, W.: An event-based conversational system for the Nao robot. In: IWSDS, Granada, Spain (2011)
3. Kruijff-Korbayová, I., Cuayáhuitl, H., Kiefer, B., Racioppa, S., Cosi, P., Paci, G., Sommavilla, G., Tesser, F., Sahli, H., Athanasopoulos, G., Wang, W., Enescu, V., Verhelst, W., Cañamero, L., Beck, A., Hiolle, A., Ros Espinoza, R., Demiris, Y.: A conversational system for multi-session child-robot interaction with several games. In: 35th German Conference on Artificial Intelligence, demonstration (2012)
4. Kruijff-Korbayová, I., Cuayáhuitl, H., Kiefer, B., Schröder, M., Cosi, P., Paci, G., Sommavilla, G., Tesser, F., Sahli, H., Athanasopoulos, G., Wang, W., Enescu, V., Verhelst, W.: Spoken language processing in a conversational system for child-robot interaction. In: Workshop on Child-Computer Interaction (2012)
5. Nalin, M., Baroni, I., Kruijff-Korbayová, I., Cañamero, L., Lewis, M., Beck, A., Cuayáhuitl, H., Sanna, A.: Children's adaptation in multi-session interaction with a humanoid robot. In: Proceedings of the Ro-Man Conference, Paris, France (2012)
6. Ros, R., Nalin, M., Wood, R., Baxter, P., Looije, R., Demiris, Y., Giusti, A., Pozzi, C.: Child-robot interaction in the wild: Advice to the aspiring experimenter. In: ICMI (2011)

PeerEnergyCloud: Trading Renewable Energies

Jochen Frey, Boris Brandherm, and Jörg Baus

German Research Center for Artificial Intelligence GmbH, Stuhlsatzenhausweg 3, Saarbrücken, Germany

Abstract. An increasing number of private households are becoming producers of renewable energy. From an economic perspective, it is beneficial to utilize this energy both locally and promptly. This requires the ability to deal with local excess production at short notice using, for example, an electronic trading platform. In this paper we describe a future energy scenario demonstrating the techniques necessary for implementing a civil marketplace for trading renewable energies. We especially focus on the learning of individual activity and load profiles and the prediction of energy consumption and production.

Keywords: Smart Grid, Renewable Energies, Cloud Computing, Activity Recognition, Multiagent Systems

1 Introduction

End-users have traditionally been consumers of electrical energy, but nowadays more and more private households are becoming producers of electrical energy from sources like solar and wind power. Unfortunately, solar and wind are not constant and reliable sources of power: wind power fluctuates from moment to moment and solar power is generated only in the daytime, to name just two reasons for their unreliability. From an economic perspective, it is beneficial to utilize this energy both locally and immediately, and hence to preserve fossil energy resources, bearing in mind that their decreasing availability increases their prices. This calls for the much-needed future electrical grid: an interconnected network for delivering solar and wind-based electricity from suppliers to consumers. One approach would be to let other power plants compensate for this variability and these unpredictable power fluctuations, which may only work if more energy is demanded than produced. Peaks in either direction may cause a shutdown of the energy grid. To avoid this, either producers have to be shut down or additional consumers have to be switched on automatically to achieve a so-called load balance. In Germany, things are even more complicated due to the shutdown of several old nuclear power plants, which has already decreased the elasticity of the power grid. This elasticity will decrease further when, as planned, more and more nuclear power plants are shut down and no countermeasures are taken.

One possibility to safeguard the quality of the power flows, to reduce the current amplitudes, and additionally to enable a significant level of penetration and effective use of renewable energy sources amid growing energy demands, would be the ability to deal with local excess production at short notice using, for example, an electronic trading platform. The objective of PeerEnergyCloud is to research and develop cloud-based technologies for such a trading platform [1]. The partner consortium consists of the German Research Center for Artificial Intelligence, the Karlsruhe Institute of Technology, AGT Germany, Seeburger AG and Stadtwerke Saarlouis.

2 Load Balancing & Energy Trading

As a concrete application, a micro grid in the city of Saarlouis (Germany) will be considered which consists of about 500 residential units and several photovoltaic systems. The residential units (smart homes) are connected to the local energy provider (Stadtwerke Saarlouis) via a dedicated secured fiber-optic cable which allows for processing of data in real time, e.g. for forecasting purposes. The integration of local sensors and actuators in the smart homes will also be done via this fiber-optic cable. In this micro grid, the provision of power will no longer be controlled centrally by the current energy consumption; instead, consumers and local producers trade their renewable energies locally on a virtual marketplace. To facilitate this trading, the conception and development of innovative acquisition and forecasting procedures are needed.

Figure 1 depicts in brief the system's components and the data flow from a smart home to the marketplace for trading renewable energies. The first step is the acquisition of data coming from different sources like calendars, weather stations and other sensors. To do inference with this data, we have to cope with the problems of uncertainty, noisy sensors, and sensor fusion. Dynamic Bayesian networks [2] are a computational framework for the representation of, and inference over, uncertain knowledge that takes into account previous states as well as new data. To serve as input to the sensor nodes of a dynamic Bayesian network, selected data is preprocessed into sub-symbolic form. All data serves as input for the learning and adaptation of fine-grained profiles such as user profiles or load profiles. After a first learning phase, dynamic Bayesian networks are constructed from these profiles. In a second phase, the dynamic Bayesian networks are adapted from time to time (given there is enough data). A smart combination of the results of the dynamic Bayesian networks with the profiles then results in the long-term and short-term forecast of a user's energy demand and provision.

[Fig. 1. Data flow in the smart micro grid from a smart home to the marketplace for trading renewable energies: sensor data acquisition and pre-processing (calendar, weather, raw sensor data), learning and adaptation of fine-grained user and load profiles, inference in dynamic Bayesian networks, forecasting of energy demand and provision, and agent-based offerings and requests on the marketplace.]

The long-term and short-term forecasts serve (regularly updated) as input for the user's personal agent, which then deals on the user's behalf with the personal agents of other smart homes on the civil marketplace for trading renewable energies.

A Standard Load Profile (SLP) is a representative load profile which helps to predict and balance the load profile of an energy consumer without measuring and recording the current power consumption. In general, most households will not show such a consumption pattern. The quality of a forecast (for a group) based on standard load profiles depends on the size of the group. A standard load profile is, however, not suitable to predict the power consumption of an individual, which would be necessary to enable a user to trade renewable energies. The standard load profile and the actual energy consumption don't have much in common: the curve of the load balance shows many big peaks in both directions. Since big peaks in either direction may cause a shutdown of the energy grid, they have to be avoided. Hence, this method is unsuitable for energy trading at the civil marketplace.
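A toy calculation makes the problem tangible: the load balance is consumption minus local production, and any interval whose imbalance exceeds a tolerable limit endangers grid stability. All numbers are invented for illustration.

```python
consumption = [0.8, 3.5, 0.4, 2.9]   # kW, household demand per interval
production  = [0.0, 0.5, 2.5, 0.3]   # kW, photovoltaic output per interval
PEAK_LIMIT  = 2.0                    # kW, assumed tolerable imbalance

residual = [c - p for c, p in zip(consumption, production)]
peaks = [r for r in residual if abs(r) > PEAK_LIMIT]
print(residual)   # [0.8, 3.0, -2.1, 2.6]
print(peaks)      # three intervals exceed the limit, in both directions
```

A coarse 15-minute standard load profile averages such short-lived intervals away, which is precisely why it cannot support trading at the level of an individual household.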

Fig. 2. The standard load profile and the actual energy consumption do not have much in common: the load balance curve shows many peaks in both directions. When the individual fine-grained load profile and the photovoltaic forecast are combined, the resulting curve represents the absolute power consumption or provision of the household, and the resulting load balance shows only small peaks in both directions.

Second, compared to the standard load profile, which consists of measurement points every 15 minutes, the individual fine-grained load profile consists of measurement points every second, which supports the identification of peak consumptions. Additionally, measurement points will be semantically enriched with available sensor data. Based on such an individual fine-grained load profile and some current sensor data (e.g. the weather forecast), a long-term forecast for the next day will be computed during the night. The short-term forecast will adapt this long-term forecast for the following two hours based on current context information. Context information is any available sensor data that helps to foresee a user's activities and therefore his energy demands. For the photovoltaic system, a long-term forecast will likewise be computed during the night, considering its specific parameters and a weather forecast. The short-term forecast will adapt this long-term forecast for the following two hours based on local conditions, e.g. a single cloud covering one part of the photovoltaic system.
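To illustrate why the one-second resolution matters, the following sketch (with invented numbers: a 2 kW, one-minute peak on top of a 300 W base load) shows how a peak that dominates the fine-grained profile vanishes when the same trace is averaged into 15-minute intervals:

```python
import numpy as np

def downsample(load_watts, seconds_per_bin):
    """Average a 1 Hz load trace into coarser measurement intervals."""
    n = len(load_watts) // seconds_per_bin * seconds_per_bin
    return load_watts[:n].reshape(-1, seconds_per_bin).mean(axis=1)

# one hour of 1 Hz measurements with a short 2 kW peak (e.g. a kettle)
rng = np.random.default_rng(0)
trace = rng.normal(300.0, 20.0, 3600)
trace[1800:1860] += 2000.0

ifp = trace                        # individual fine-grained profile, 1 s
slp_like = downsample(trace, 900)  # 15-minute resolution, 4 points/hour

print(ifp.max())        # about 2300 W: the peak is clearly visible
print(slp_like.max())   # about 430 W: the peak is averaged away
```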

The resulting individual fine-grained load profile and the photovoltaic forecast are combined, and the resulting curve represents the absolute power consumption or provision of the household. This data serves as input for the household's personal agent, which will then deal on the household's behalf with the personal agents of other households at the civil market place for trading renewable energies. The resulting load balance curve shows only small peaks in both directions: this method is suitable for trading renewable energies on the civil market place.

3 Outlook

In addition to load profiling, a complementary next step is to realize a mechanism for recognizing activities of daily living (ADL) [3] based on the existing PeerEnergyCloud infrastructure. A relevant factor when dealing with terms like energy awareness and resource efficiency is to understand the motivation behind the actions of people living in smart houses. To support, influence or even change their behavior, we must be able to find out why and under which conditions they are performing those actions. For instance, the problem with monthly energy bills is that people only see the total amount of their energy consumption. Typically there is no logical connection to single daily activities like cooking, heating or watching television.

Acknowledgment

This research was funded in part by the German Federal Ministry of Economics and Technology under grant number 01MD11002 (Project PeerEnergyCloud). The responsibility for this publication lies with the authors.

References

1. Brandherm, B., Baus, J., Frey, J.: Peer Energy Cloud – Civil Marketplace for Trading Renewable Energies. In: Proceedings of the 8th International Conference on Intelligent Environments (IE), June 26-29, Guanajuato, México (2012)
2. Brandherm, B., Jameson, A.: An extension of the differential approach for Bayesian network inference to dynamic Bayesian networks. International Journal of Intelligent Systems 19(8) (2004)
3. Rashidi, P., Cook, D.J., Holder, L.B., Edgecombe, M.S.: Discovering Activities to Recognize and Track in a Smart Environment. IEEE Transactions on Knowledge and Data Engineering 23 (2011)

Poster Contributions


Towards Augmenting Dialogue Strategy Management with Multimodal Sub-Symbolic Context

Paul Baxter 1, Heriberto Cuayáhuitl 2, Rachel Wood 1, Ivana Kruijff-Korbayová 2, and Tony Belpaeme 1

1 Centre for Robotics and Neural Systems, Plymouth University, U.K. {paul.baxter,rachel.wood,tony.belpaeme}@plymouth.ac.uk
2 Language Technology Lab, DFKI, Saarbrücken, Germany {heriberto.cuayahuitl,ivana.kruijff}@dfki.de

Abstract. A synthetic agent requires the coordinated use of multiple sensory and effector modalities in order to achieve social human-robot interaction (HRI). While systems exist in which multiple modalities are concatenated in this way, the coordination of information across modalities to identify relevant context information remains problematic. A system-wide information formalism is typically used to address the issue, which requires a re-encoding of all information into the system ontology. We propose a general approach to this information coordination issue, focussing particularly on a potential application to a dialogue strategy learning and selection system embedded within a wider architecture for social HRI. Rather than making use of a common system ontology, we emphasise a sub-symbolic, association-driven architecture which has the capacity to influence the internal processing of all individual system modalities, without requiring the explicit processing or interpretation of modality-specific information.

Keywords: Context in Cognitive Architecture, Dialogue Strategy Management, Distributed Memory, Multimodal Coordination, Social HRI

1 Introduction

In the application of robotic agents to societal issues, their capacity for extended social interaction with people is a central feature of behaviour that has not yet been resolved [9]. It has been proposed that key to achieving such functionality is the coordination of multiple interaction modalities (such as verbal, behavioural, emotional, etc.) and the adaptation of the agent's behaviour to that of the human interactant [13]. Among the various potential interaction modalities, linguistic interaction is clearly of central importance.

This work is funded by the EU FP7 ALIZ-E project.

However, while the joint optimisation of dialogue strategies with other modalities shows improvements over the optimisation of dialogue strategies for individual modalities [4], the combinatorial expansion in the number of possible dialogue context configurations, if encoded symbolically (e.g. [8]), could prove problematic in a real-time social HRI application domain. This variation on the frame problem is particularly apparent in those systems where a global symbolic representation scheme (ontology) is used, where an explicit encoding of what information should be taken into account must be provided. The aim of this contribution is to provide an outline approach (roadmap) to augment existing solutions to this problem by taking a perspective inspired by aspects of biological cognitive systems, emphasising sub-symbolic associative strategies. In so doing, we propose a system that may provide functions complementary to those mechanisms used in an existing dialogue strategy management implementation [8].

2 An Association-driven Approach to Providing Internal Context

Taking inspiration from neuropsychological theories of memory and cognitive processing in biological agents, a framework for the soft coordination of multiple modalities within a cognitive architecture has been devised that is based on the Hebbian-like associative learning of relationships between information in different modalities [15]. Given that learning of this type may be characterised in a statistical manner, e.g. principal component analysis [12], such a system can extract the statistical relationships between the constituent modalities of a cognitive architecture over the course of its operation. If the resulting structure is then treated as the substrate for activation dynamics, the mechanism of priming is instantiated, in which the activity in one modality can influence the processing in another, thereby linking prior experience with ongoing behaviour generation across modalities [2, 15]. A computational system that embodies these characteristics has been implemented, informed by Interactive Activation and Competition models (and their adaptive extensions, e.g. [3]). Applied to a classification problem, it has been shown that this system both completes the task and does so in a manner consistent with human behavioural data [6]. In combination, these mechanisms (Hebbian association and activation spreading) enable the embedding of a real-time sub-symbolic statistical relationship component within what may be a largely symbolic cognitive architecture. Indeed, this general approach has formed the inspiration for a number of contemporary cognitive robotics implementations, notably [11], which is used to model a range of human cognitive competencies.

In this perspective, context in a cognitive architecture can be derived on the basis of the associative substrate and the activation dynamics operating over it: it is an encoding of the activity within the system, resulting from external input and internal processing. From the point of view of a given single cognitive modality (e.g. face recognition or dialogue strategy management), the activation received from other cognitive modalities constitutes an internal context signal, since it is explicitly based on the associations that have been formed, and thus on prior experience. In this way, a symbolic representation of context is not required, thus circumventing the combinatorial explosion in symbolic representation schemes.
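A minimal sketch of such an associative substrate (illustrative only; the implemented Interactive Activation and Competition-based system is more elaborate, and the class and method names here are our own):

```python
import numpy as np

class AssociativeSubstrate:
    """Hebbian-like associations between the activation vectors of two
    modalities, with priming realised via activation spreading."""

    def __init__(self, n_a, n_b, rate=0.05):
        self.W = np.zeros((n_a, n_b))  # association weights A <-> B
        self.rate = rate

    def learn(self, act_a, act_b):
        # strengthen links between co-active units (Hebbian outer product)
        self.W += self.rate * np.outer(act_a, act_b)

    def prime(self, act_a):
        # activity in modality A spreads to modality B: a sub-symbolic
        # context signal, no symbolic interpretation of A's content needed
        return self.W.T @ act_a
```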

Furthermore, due to the generalisation (and compact representation) properties inherent to distributed associative networks (e.g. [10]), the proposed mechanism has the capacity to similarly generalise over prior experience.

3 A Proposed Application to Augment Dialogue Strategy Management

One computational approach to dialogue strategy learning and language generation that has recently shown particular promise is Hierarchical Reinforcement Learning (HRL) [7, 8]. In this approach, a hierarchy of learning agents optimises a set of policies, rather than a unitary dialogue system policy. Each layer of the hierarchy therefore consists of multiple RL agents optimised to maximise reward for a subset of the state space (which includes context information). Recent developments have relaxed the notion of strict hierarchical organisation by positing a role for informal structure to enhance flexibility [5].

On this basis, we propose that associations can be formed between the currently active dialogue management agent(s) and other cognitive modalities (e.g. face recognition, emotion interpretation, etc.; see figure 1). Over experience, this process would encode statistical relationships between the dialogue agents and the state of processing in other cognitive modalities. These multimodal associations subsequently act as the substrate for priming effects between the different cognitive modalities and dialogue strategy management, by modulating the transition functions used in the HRL structure. For example, a dialogue agent that has strong positively weighted associations with a set of states in other modalities can be differentially supported in terms of dialogue strategy processing over other dialogue agents should these states re-occur: i.e. the influence of a global context signal. Thus, though the associative mechanism remains constant, the quantitative effect on action selection changes as a function of experience, and may take effect over relatively short time periods given the real-time learning rate of the type of associative system in question, e.g. [11]. This scenario is particularly consistent with the loosening of the requirement for strict hierarchy in the HRL schema, as transitions between dialogue strategies in different hierarchy branches can be informed to a greater extent by the information provided by the proposed association mechanism (figure 1).

A number of points can be raised concerning this outline proposal. Firstly, the proposed association mechanism assumes that there is a functioning cognitive architecture upon which the associative mechanisms can operate: the emphasis of the current approach is on augmenting existing processing rather than replacing it. Secondly, there is no explicit semantic/symbolic information transferred between modalities through the proposed system. While this results in the need for an existing architectural mechanism to transfer specific semantic information (e.g. "the ball I see is red rather than green..."), the association and activation spreading mechanisms enable multimodal information to be taken into account in an architecturally simple manner, without requiring translation using a system ontology with the resulting frame-problem issues.

Fig. 1. Schematic of the interaction between a loose hierarchical dialogue strategy management system (left, adapted from [5]) and cognitive modalities (right, such as visual processing, etc.), through the sub-symbolic associative learning mechanism (centre, see section 2): encoded statistical relationships can subsequently influence dialogue strategy state transitions (not all possible influences depicted).

Indeed, the combination of sub-symbolic and RL approaches has previously been considered promising, e.g. [14]. Furthermore, it enables a clear perspective on how the agent can improve its behaviour through interaction experience rather than off-line training (enabling the findings of developmental learning theory to be leveraged [1]), since further associative specification can be progressively gained. In summary, there are two primary advantages to this type of augmented dialogue strategy management: (1) opportunities for fast learning over the course of interaction with real users; and (2) global integration of modalities for coordinating multimodal conversational behaviours.

4 Perspectives

A roadmap for complementing the functionality of an existing dialogue strategy management implementation has been outlined: making use of statistical multimodal associations learned through experience (on the basis of a Hebbian-like learning mechanism) in order to augment the dialogue strategy management process. In this way, there is also the potential for coordination among other modality-specific systems within a cognitive architecture. Even though the flow of explicit information through such a system is sparse (consisting of scalar activation levels only), it has the capacity to improve performance not through the explicit transfer of information per se, but by virtue of the statistical properties of the associations that have been experienced, which provide a global context. While this paper has only presented a roadmap of an ongoing programme of research, we believe that this principle of sub-symbolic context signals for augmenting modality-specific processing is generalizable to other cognitive architecture modalities, from perception to action, in the service of achieving social human-robot interaction with autonomous systems.

References

1. Asada, M., MacDorman, K.F., Ishiguro, H., Kuniyoshi, Y.: Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robotics and Autonomous Systems 37 (2001)
2. Baxter, P., Wood, R., Morse, A., Belpaeme, T.: Memory-Centred Architectures: Perspectives on Human-level Cognition. In: Proc. of the Advances in Cognitive Systems track at the AAAI Fall Symposium 2011, Arlington, USA (2011)
3. Burton, A.M.: Learning new faces in an interactive activation and competition model. Visual Cognition 1(2) (1994)
4. Cuayáhuitl, H., Kruijff-Korbayová, I.: Towards Learning Human-Robot Dialogue Policies Combining Speech and Visual Beliefs. In: Proc. of the Workshop on Paralinguistic Information and its Integration in Spoken Dialogue Systems, Granada, Spain (2011)
5. Cuayáhuitl, H., Kruijff-Korbayová, I.: An Interactive Humanoid Robot Exhibiting Flexible Sub-Dialogues. In: Proc. of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demo Session, Montreal, Canada (in press)
6. de Greeff, J., Baxter, P., Wood, R., Belpaeme, T.: From Penguins to Parakeets: a Developmental Approach to Modelling Conceptual Prototypes. In: Proc. of the PG Conference on Robotics and Development of Cognition at ICANN 2012, Lausanne, Switzerland (in press)
7. Cuayáhuitl, H., Dethlefs, N.: Optimizing Situated Dialogue Management in Unknown Environments. In: Proc. of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy (2011)
8. Dethlefs, N., Cuayáhuitl, H.: Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation. In: Proc. of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, USA (2011)
9. Fong, T., Nourbakhsh, I., Dautenhahn, K.: A survey of socially interactive robots. Robotics and Autonomous Systems 42(3-4) (2003)
10. McGarry, K., Wermter, S., MacIntyre, J.: Hybrid Neural Systems: From Simple Coupling to Fully Integrated Neural Networks. Neural Computing Surveys 2 (1999)
11. Morse, A., de Greeff, J., Belpaeme, T., Cangelosi, A.: Epigenetic Robotics Architecture (ERA). IEEE Transactions on Autonomous Mental Development 2(4) (2010)
12. Oja, E.: A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15 (1982)
13. Ros, R., Nalin, M., Wood, R., Baxter, P., Looije, R., Demiris, Y., Belpaeme, T., Giusti, A., Pozzi, C.: Child-Robot Interaction in the Wild: Advice to the Aspiring Experimenter. In: Proc. of ACM ICMI, Valencia, Spain (2011)
14. Santiago, R., Lendaris, G.G.: Reinforcement Learning and the Frame Problem. In: Proc. of IEEE IJCNN, vol. 5 (2005)
15. Wood, R., Baxter, P., Belpaeme, T.: A Review of long-term memory in natural and synthetic systems. Adaptive Behaviour 20(2) (2012)

Eager Beaver – A General Game Player

André Doser, Florian Geißer, Philipp Lerche, and Tim Schulte

Foundations of Artificial Intelligence, University of Freiburg {dosera,geisserf,lerchep,schultet}@tf.uni-freiburg.de

Abstract. General game playing is the research field concerned with one AI being able to play multiple different kinds of games. We present Eager Beaver, a general game player based on Propositional Networks with dynamic code generation and an enhanced Upper Confidence Bounds applied to Trees (UCT) algorithm. We ran an evaluation study against Centurio [7], another UCT player, and show the results of various UCT extensions used in this comparison.

1 Introduction

Since the beginning of research in Artificial Intelligence, solving games has been an important area of interest. One complex, well-known game is chess, which has a vast number of possible states. In 1997 the chess computer Deep Blue was the first AI to defeat the reigning chess world champion Kasparov. However, while Deep Blue is nearly a perfect chess player, it is not able to play a simple game of Tic-tac-toe. Playing multiple different kinds of games with one AI is the focus of the research field of general game playing (GGP). To provide a uniform set of game rules, Love et al. [6] introduced the so-called Game Description Language (GDL). With these rules a player is able to play any finite, discrete and deterministic¹ multi-player game of complete information without any game-specific algorithm. After receiving the game rules, each player has a limited time for preparation once per game and a limited time to choose its move each round. Since 2005, a yearly general game playing competition [4] takes place to promote the research area of general game playing and to contribute new ideas. In the first years, the traditional approach was to use a minimax-based game tree search combined with automatically learned heuristic evaluation functions. Nowadays, most players use the UCT algorithm [5]. The efficiency of the algorithms depends on the ability to quickly provide basic state manipulation operations like the computation of the initial state, legal moves and next states. In contrast to the widely used Prolog-based reasoner, we use a different, performance-capable approach: Propositional Networks [2] with dynamic code generation.

¹ In 2010 Thielscher [9] proposed GDL-II, which allows game rules with incomplete and imperfect information.

This paper is organized as follows: in the next section we introduce the general mechanics of a general game player. Afterwards, we describe the theoretical background of the main data structures and algorithms used by Eager Beaver. We conclude with implementation details and some benchmarks.

2 Data Structures and Algorithms

The game rules, available in GDL syntax, are parsed and forwarded to the Propositional Network or any other reasoner that provides state manipulation operations. The state machine is an interface that allows the algorithm to receive the same responses to its queries, e.g. legal moves, independently of the reasoner used. See Figure 1 for an overview of Eager Beaver's layout.

Fig. 1. Layout of Eager Beaver: GDL rules feed the PropNet or a standard reasoner, which backs the state machine used by the algorithms (random player, UCT).

2.1 Propositional Networks

In order to compute the best possible move to play in a given state, every algorithm has to rely on an efficient way to calculate the upcoming legal moves for each player, their goal values (points) and the consequent states. The algorithms walk through hundreds of thousands of different game situations, therefore it is important that these basic operations are done as fast as possible. Propositional Networks were developed by the Stanford Logic Group and are an intuitive representation of GDL rules as a bipartite graph. As a consequence, properties of the graph structure can be exploited. A Propositional Network (PN) consists of propositions, which represent parts of the game state and can be either true or false. These propositions are connected either to boolean gates, logically connecting multiple propositions, or to transitions, allowing the transition from one state into another. There are three different types of propositions²: base propositions describe the state of a game, input propositions correspond to game moves, and view propositions, determined by the GDL rules, can be used to represent legal moves or goal values.

² In this year's upcoming competition, GDL rules are enhanced by base and input relations, thus making it easier to compute the different propositions.

Figure 2 shows the PN of the Two Button game. In this single-player game the player is able to press two different buttons A and B. If a button is pressed it remains pressed for the rest of the game. The goal is to press both buttons.

The two base propositions (pressed A) and (pressed B) describe the whole state of this game: either no button is pressed, A is pressed, B is pressed, or both buttons are pressed. If the player presses button A, the value of the input proposition (does player (press A)) becomes true. Its value is propagated through the network, so the OR, VIEW and TRANSITION node values become true. Remember that transitions allow the transition from one state into another: since the input of the base proposition (pressed A) is true, our new state consists of (pressed A). Note that the VIEW propositions exist only for a correct PN representation, because boolean gates have to be connected to propositions and the input of transitions has to be a proposition.

Fig. 2. A PN representation of the Two Button game [2]: the input propositions (does player (press A)) and (does player (press B)) feed, via OR gates, view propositions and transitions, the base propositions (pressed A) and (pressed B), whose conjunction yields (both pressed).

To provide an efficient data structure for the aforementioned computations, Eager Beaver uses PNs extended by dynamic code generation, thus gaining a remarkable performance increase over the standard reasoner. For a PN we generate code in such a way that redundant logic evaluations are omitted. For each proposition a method is generated that propagates its truth value by calling other methods. Furthermore, boolean gates are represented as a byte array, so we can save and load their values efficiently. By calling only the methods of relevant propositions we avoid computing the same truth values multiple times. When we want to perform a basic operation, such as calculating the current goal values, we call the corresponding method in the generated class. This method runs very fast, since all it does is compare and set boolean values. We want to mention here that games with tens of thousands of propositions can be a problem for code generation with Java as the underlying language, because Java imposes an upper bound on the number of methods and characters per class. One solution is to split the proposition updates across multiple classes, e.g. one class for all base proposition updates, one for all view proposition updates, and so on.
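To give a feel for what the generated propagation code computes, here is a hand-written Python analogue for the Two Button network (a sketch only; the actual generator emits one Java method per proposition, and these function names are ours):

```python
# State: the two base propositions of the Two Button game.
def next_state(pressed_a, pressed_b, does_press_a, does_press_b):
    # OR gate -> view proposition -> transition -> base proposition:
    # a button stays pressed once it has been pressed
    return (pressed_a or does_press_a,
            pressed_b or does_press_b)

def goal_reached(pressed_a, pressed_b):
    # AND gate feeding the (both pressed) view proposition
    return pressed_a and pressed_b

state = (False, False)
state = next_state(*state, does_press_a=True, does_press_b=False)
state = next_state(*state, does_press_a=False, does_press_b=True)
print(goal_reached(*state))  # True: both buttons have been pressed
```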

2.2 UCT Algorithm

In recent years, the Upper Confidence Bounds applied to Trees (UCT) algorithm has prevailed among the most successful players. The idea for this technique arose from the UCB1 algorithm proposed by Auer et al. [1] and is, in contrast to former concepts, based on a Monte-Carlo approach. After receiving a request to select a move a in a given game state, the algorithm gradually builds up a tree whose spread is biased by the UCB1 formula

\[ a^{*} = \operatorname*{argmax}_{a \in A(s)} \left\{ Q(s,a) + C \sqrt{\frac{\ln N(s)}{N(s,a)}} \right\}, \]

where Q(s,a) denotes the average score for action a in state s, N(s) the accumulated visits of the parent state s, N(s,a) the accumulated visits of the successor s' of s reached through a, and C the constant weighting the UCT bonus. Since it is generally impossible to represent the whole state space of a game, the UCT algorithm resorts to random playouts, namely Monte-Carlo simulations. The algorithm consists of four fundamental phases which are executed consecutively until the timeout is reached. In the Selection-Phase the tree is recursively traversed using the UCB1 formula, which tries to find a balance between exploiting the most promising moves and exploring less encouraging ones. After finding a game state which has not been inspected before, the Expansion-Phase starts and its successor states are computed and added to the game tree. In the Simulation-Phase, one successor is selected and random simulations are run, thus reaching a terminal state and receiving scores for each player. These scores are propagated through the tree in the Backpropagation-Phase. After the timeout is reached, the most promising move, i.e. the move with the most visits, is selected from the game tree. Besides the plain UCT algorithm, Eager Beaver uses two major extensions: MAST and RAVE [3]. Furthermore, the implementation allows the move selection to be steered by adapting several constants, e.g. updating the UCB1 constant every rollout depending on the current node's wins and visits.

3 Results

Eager Beaver is based on the GGP-Base framework by Sam Schreiber [8], which is written in Java 1.6 and provides several basic features, such as server communication via HTTP. Compared to the framework's default reasoner, the implementation of the Propositional Networks led to a significant increase in runtime performance. For example, in the case of Connect 4, state machine queries are computed around 25 times faster with Propositional Networks. For our benchmarks we competed against the general game player Centurio V2.1 on an Athlon-based Sun Fire X2200 M2 x64 with 2.3 GHz per core (8 cores available in total) and 32 GB RAM (of which we used only 64 MB). Each player received one core. We let Eager Beaver play with three different configurations (plain UCT, UCT with MAST, and UCT with RAVE) against Centurio using default settings with 15 seconds lap time and 45 seconds preparation time. It seems that for the three chosen games, the increased computational cost of the UCT extensions did not pay off compared to the higher number of plain UCT simulations.
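The Selection-Phase described above amounts to evaluating the UCB1 formula for each child of the current node, for example as in the following sketch (illustrative Python rather than the player's Java; the Node structure is our own minimal stand-in):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    score_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

def ucb1_select(node, C=1.414):
    """Selection-Phase step: pick the action maximising
    Q(s,a) + C * sqrt(ln N(s) / N(s,a)); unvisited actions go first."""
    best_action, best_value = None, float("-inf")
    for action, child in node.children.items():
        if child.visits == 0:
            return action                      # explore an untried move first
        q = child.score_sum / child.visits     # average score Q(s,a)
        value = q + C * math.sqrt(math.log(node.visits) / child.visits)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```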

For the games we chose Tic-tac-toe, Connect 4 and Breakthrough Cylinder. Each game and configuration was played 1400 times. See the results in Figure 3.

Fig. 3. Three different games vs. Centurio: percentage of wins, ties and losses per game for the DEF (plain UCT), MAST and RAVE configurations.

4 Conclusion

According to the present literature, our implementation of Propositional Networks with dynamic code generation seems to be unique so far. The results emphasize that this led to a great performance increase. Together with our UCT implementation, Eager Beaver was able to defeat Centurio in the majority of the matches. The default UCT configuration was the most successful setting, with which Eager Beaver seems to outperform Centurio. Although the field of general game playing is young, there are further improvements we could include in our player, like Propositional Network factoring [2] or other UCT enhancements. We will participate in the next general game playing competition³, starting July 22 this year. Moreover, the source code of Eager Beaver is available at

³ For more information see

References

[1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3).
[2] Evan Cox, Eric Schkufza, Ryan Madsen, and Michael R. Genesereth. Factoring general games using propositional automata. In Proceedings of the IJCAI-09 Workshop on General Game Playing (GIGA'09).
[3] Hilmar Finnsson and Yngvi Björnsson. CadiaPlayer: Search-control techniques. KI, 25(1):9–16.
[4] Michael R. Genesereth, Nathaniel Love, and Barney Pell. General Game Playing: Overview of the AAAI competition. AI Magazine, 26(2):62–72.
[5] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, ECML, volume 4212 of Lecture Notes in Computer Science. Springer.
[6] Nathaniel Love, Timothy Hinrichs, David Haley, Eric Schkufza, and Michael Genesereth. General Game Playing: Game Description Language Specification. Technical report, Stanford Logic Group, March.
[7] Maximilian Möller, Marius Schneider, Martin Wegner, and Torsten Schaub. Centurio, a general game player: Parallel, Java- and ASP-based. KI, 25(1):17–24.
[8] Sam Schreiber. The general game playing base package.
[9] Michael Thielscher. GDL-II. KI, 25(1):63–66.

Histogram-based Outlier Score (HBOS): A Fast Unsupervised Anomaly Detection Algorithm

Markus Goldstein and Andreas Dengel

German Research Center for Artificial Intelligence (DFKI), Trippstadter Str. 122, Kaiserslautern, Germany {markus.goldstein,andreas.dengel}@dfki.de

Abstract. Unsupervised anomaly detection is the process of finding outliers in data sets without prior training. In this paper, a histogram-based outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features, making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets against 10 standard algorithms shows that it can detect global outliers as reliably as state-of-the-art algorithms, but that it performs poorly on local outlier problems. In our experiments HBOS is up to 5 times faster than clustering based algorithms and up to 7 times faster than nearest-neighbor based methods.

1 Introduction

Anomaly detection is the process of finding instances in a data set which are different from the majority of the data. It is used in a variety of application domains. In the network security domain it is referred to as intrusion detection: the process of finding outlying instances in network traffic or in system calls of computers that indicate compromised systems. In the forensics domain, anomaly detection is also heavily used and known as outlier detection, fraud detection, misuse detection or behavioral analysis. Applications include the detection of payment fraud by analyzing credit card transactions, the detection of business crime by analyzing financial transactional data, and the detection of data leaks from company servers in data leakage prevention (DLP) systems. Furthermore, anomaly detection has been applied in the medical domain by monitoring the vital functions of patients, and it is used for detecting failures in complex systems, for example during space shuttle launches. All of these application domains have in common that normal behavior needs to be identified and outlying instances should be detected. This leads to two basic assumptions for anomaly detection: anomalies occur only very rarely in the data, and their features differ from the normal instances significantly.

From a machine learning perspective, three different scenarios exist with respect to the availability of labels [4]: (1) Supervised anomaly detection has a labeled training and test set such that standard machine learning approaches can be applied.

(2) Semi-supervised anomaly detection uses an anomaly-free training set consisting of the normal class only; a test set then comprises normal records and anomalies, which need to be separated. The most difficult scenario is (3) unsupervised anomaly detection, where only a single data set without labels is given and the algorithm should be able to identify outliers based on their feature values only. In this paper, we introduce an unsupervised anomaly detection algorithm which estimates densities using histograms.

2 Related Work

Unsupervised Anomaly Detection: Many algorithms for unsupervised anomaly detection have been proposed, which can be grouped into three main categories [4]. In practical applications, nearest-neighbor based algorithms seem to be the most used and best performing methods today [1, 2]. In this context, outliers are determined by their distances to their nearest neighbors, whereas both global [11] and local methods exist. A very well known local algorithm is the Local Outlier Factor (LOF) [3], on which many other algorithms are based. Although some algorithms offer speed-up enhancements [5, 10], the basic run time of the nearest-neighbor search is O(n²). The second category, clustering based algorithms, can be much faster. Here, a clustering algorithm usually computes centroids, and outliers are detected by their large distance from the dense areas. CBLOF [6] and LDCOF [1] use k-means as the clustering algorithm, leading to a faster computation [2]. The third category comprises statistical methods, using both parametric and non-parametric models for anomaly detection. Parametric models, for example Gaussian Mixture Models (GMM), are usually also very compute-intensive, depending on the parameter estimation method used. Non-parametric models, such as histograms or kernel density estimators (KDE), can be used for anomaly detection, especially if a very fast computation is essential.

Histograms in Network Security: In the network security domain it is required that the results of outlier detection algorithms are available immediately. Furthermore, the data sets to be processed are very large. This is the reason why histograms are often used as density estimators for semi-supervised anomaly detection [8]. If multivariate data has to be processed, a histogram for each single feature can be computed, scored individually, and combined at the end [7]. In most of the proposed methods, a fixed bin width of the histogram is given or the bin widths are even defined manually. In this work we take up this basic idea and introduce an unsupervised anomaly detection algorithm based on histograms. Furthermore, we propose a dynamic bin-width approach to also cover very unbalanced long-tail distributions.

3 Histogram-based Outlier Score (HBOS)

Besides network security, histogram-based outlier scoring might also be of interest for several other anomaly detection scenarios. Although it is only a combination of univariate methods, unable to model dependencies between features, its fast computation is attractive for large data sets.

The presented HBOS algorithm allows histogram-based anomaly detection to be applied in a general way; it is also available as open source as part of the anomaly detection extension of RapidMiner [9].

For each single feature (dimension), a univariate histogram is constructed first. If the feature comprises categorical data, simple counting of the values of each category is performed and the relative frequency (height of the histogram) is computed. For numerical features, two different methods can be used: (1) static bin-width histograms or (2) dynamic bin-width histograms. The first is the standard histogram building technique, using k equal-width bins over the value range. The frequency (relative amount) of samples falling into each bin is used as an estimate of the density (height of the bins). The dynamic bin width is determined as follows: the values are sorted first and then a fixed number of N/k successive values is grouped into a single bin, where N is the total number of instances and k the number of bins. Since the area of a bin in a histogram represents the number of observations, it is the same for all bins in our case. Because the width of a bin is defined by its first and last value and the area is the same for all bins, the height of each individual bin can be computed. This means that bins covering a larger interval of the value range have less height and thus represent a lower density. However, there is one exception: under certain circumstances, more than N/k data instances might have exactly the same value, for example if the feature is an integer and a long-tail distribution has to be estimated. In this case, our algorithm must allow more than N/k values in the same bin. Of course, the area of these larger bins grows accordingly.

The reason why both methods are offered in HBOS is that feature values in real world data follow very different distributions. Especially when value ranges contain large gaps (intervals without data instances), the fixed bin width approach estimates the density poorly (a few bins may contain most of the data). Since anomaly detection tasks usually involve such gaps in the value ranges, due to the fact that outliers are far away from normal data, we recommend using the dynamic width mode, especially if the distributions are unknown or long-tailed. Besides this, the number of bins k needs to be set. An often used rule of thumb is setting k to the square root of the number of instances N.

Now, for each dimension, an individual histogram has been computed (regardless of whether it is categorical, fixed-width or dynamic-width), where the height of each single bin represents a density estimation. The histograms are then normalized such that the maximum height is 1.0. This ensures an equal weight of each feature in the outlier score. Finally, the HBOS of every instance p is calculated using the corresponding height of the bins in which the instance is located:

\[ HBOS(p) = \sum_{i=0}^{d} \log\left(\frac{1}{hist_i(p)}\right) \qquad (1) \]

The score is a multiplication of the inverses of the estimated densities, assuming independence of the features, similar to [7]. This could also be seen as (the inverse of) a discrete Naive Bayes probability model. Instead of multiplication, we take the sum of the logarithms, which is basically the same (log(a·b) = log(a) + log(b)), and applying a log(·) does not change the order of the scores. The reason we apply this trick is that it is less sensitive to floating point precision errors in extremely unbalanced distributions, which cause very high scores.
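A minimal sketch of this scoring with fixed-width histograms (our own illustrative Python, not the authors' RapidMiner implementation; the 1e-12 guard for empty bins is our assumption, as the paper does not specify one):

```python
import numpy as np

def hbos_fixed(X, k):
    """HBOS with fixed-width histograms for numeric features.
    X: array of shape (n_samples, n_features); returns a score per sample."""
    n, d = X.shape
    scores = np.zeros(n)
    for i in range(d):
        counts, edges = np.histogram(X[:, i], bins=k)
        heights = counts / counts.max()          # normalise max height to 1.0
        bin_idx = np.clip(np.digitize(X[:, i], edges[1:-1]), 0, k - 1)
        h = np.maximum(heights[bin_idx], 1e-12)  # guard against empty bins
        scores += np.log(1.0 / h)                # summand of Eq. (1)
    return scores
```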

4 Evaluation

For a quantitative evaluation of HBOS on real world data, we evaluated the proposed method on three UCI machine learning data sets commonly used in the anomaly detection community. These data sets, the breast cancer data set and the pen-based (global and local) data sets, were preprocessed as in [1]. The receiver operating characteristic (ROC) is generated by varying the outlier threshold, and the area under the curve (AUC) is then used for comparison. Table 1 shows the AUC results for 11 different outlier detection algorithms (HBOS, k-NN, LOF, Fast-LOF, COF, INFLO, LoOP, LOCI³, CBLOF, u-CBLOF and LDCOF) on the breast-cancer, pen-global and pen-local data sets. It can be seen that HBOS performs quite well compared to the other algorithms on the breast-cancer and pen-global data sets. On the local anomaly detection problem it fails, which is due to the fact that histograms cannot model local outliers with their density estimation.

Table 1. Comparing HBOS performance (AUC) with various algorithms using optimal parameter settings.

Fig. 1. Comparing AUCs of nearest-neighbor based algorithms with HBOS; k is the number of nearest neighbors and, for HBOS, the number of bins.

Besides comparing the outlier detection performance, the run times of the algorithms were also compared. Since the standard data sets used for evaluation are very small (e.g. only 809 instances in the pen-global data set), the experiment was repeated 10,000 times and the mean execution time was taken, using an AMD Phenom II X6 1100T CPU with one thread only. The global k-NN method took 28.5 ms on average and LOF took 28.0 ms to process the pen-global data set. In general, all nearest-neighbor methods perform very similarly, since the highest effort in these algorithms is the nearest-neighbor search (O(n²)).

³ Not computable due to too high memory requirements for this data set using LOCI.

As a clustering based algorithm, LDCOF with k-means was used. The algorithm was started once with 30 random centroids. Using 10 optimization steps, an average run time of 20.0 ms was achieved; with 100 optimization steps, which was our default setting for the performance comparison, the algorithm took 30.0 ms. We expect clustering based methods to be much faster than nearest-neighbor based algorithms on larger data sets. However, HBOS was significantly faster than both: it took 3.8 ms with dynamic bin widths and 4.1 ms using a fixed bin width. Thus, in our experiments HBOS was 7 times faster than nearest-neighbor based methods and 5 times faster than the k-means based LDCOF. On larger data sets the speed-up can be much higher: on a not publicly available data set comprising 1,000,000 instances with 15 dimensions, LOF took 23 hours and 46 minutes whereas HBOS took only 38 seconds (dynamic bin width: 46 seconds).

5 Conclusion

In this paper we present an unsupervised histogram-based outlier detection algorithm (HBOS), which models univariate feature densities using histograms with a fixed or a dynamic bin width. Afterwards, all histograms are used to compute an anomaly score for each data instance. Compared to other algorithms, HBOS works in linear time O(n) in the case of fixed bin widths, or in O(n log(n)) using dynamic bin widths. The evaluation shows that HBOS performs well on global anomaly detection problems but cannot detect local outliers. A comparison of run times also shows that HBOS is much faster than standard algorithms, especially on large data sets.

References

1. Amer, M.: Comparison of unsupervised anomaly detection techniques. Bachelor's Thesis (2011)
2. Amer, M., Goldstein, M.: Nearest-neighbor and clustering based anomaly detection algorithms for RapidMiner. In: Proc. of the 3rd RCOMM
3. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. SIGMOD Rec. 29(2) (2000)
4. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Comput. Surv. 41(3), 1–58 (2009)
5. Goldstein, M.: FastLOF: An expectation-maximization based local outlier detection algorithm. In: Proc. of the Int. Conf. on Pattern Recognition (2012)
6. He, Z., Xu, X., Deng, S.: Discovering cluster-based local outliers. Pattern Recognition Letters 24(9-10) (2003)
7. Kim, Y., Lau, W.C., et al.: PacketScore: statistics-based overload control against distributed denial-of-service attacks. In: INFOCOM, vol. 4
8. Kind, A., Stoecklin, M., Dimitropoulos, X.: Histogram-based traffic anomaly detection. IEEE Transactions on Network and Service Management 6(2)
9. Mierswa, I., Wurst, M., et al.: YALE (now: RapidMiner): Rapid prototyping for complex data mining tasks. In: Proc. of the ACM SIGKDD
10. Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast outlier detection using the local correlation integral. In: Int. Conf. on Data Engineering, p. 315 (2003)
11. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: SIGMOD '00

A Concept of a Reliable Three-Layer Behaviour Control System for Cooperative Autonomous Robots

Christian Rauch 1, Tim Köhler 2, Martin Schröer 1, Elmar Berghöfer 2, and Frank Kirchner 1,2

1 University of Bremen, Robotics Research Group
2 DFKI GmbH, Robotics Innovation Center, Robert-Hooke-Straße 5, Bremen, Germany

Abstract. Especially in robotic applications with hardly accessible and unknown environments, like deep-sea or space missions, two requirements arise: first, the robotic systems need to work reliably, and second, autonomous operation would greatly increase efficiency. While the demand for reliable system behaviour could be satisfied through predefined and tested plans and models, the unknown environment impedes both. We present a concept for a reliable behaviour control system intended to be used in the context of a lunar or planetary space mission with a loosely-coupled group of autonomous robots. To realize reliable system behaviour even in unknown environments, the proposed design follows a biologically inspired approach including adaptable prediction and self-evaluation components.

1 Introduction

In the application fields of deep-sea and space robotics, the environment of the robots is usually not known at design time. However, in both cases the reliability of the robotic systems is an important requirement, as human intervention (e.g., in case of a malfunction of the system) is either impossible or would require great effort. To use an autonomous system in such a reliability-demanding scenario, the system needs to be tested and verified at design time. Since the environment is unknown at this time, such prior verification is impossible, and therefore a remote control setup is chosen for most space robotic scenarios. Due to problems with data transmission in space and deep-sea missions (e.g., packet loss or blackout phases), the system's efficiency is greatly reduced.

In both application fields, working in groups of robots and using an autonomous control in parallel with a remote one could increase reliability. On the one hand, reliability is gained through the redundant sensor data processing by the autonomous robot group and the human operator in parallel. On the other hand, fast emergency reactions can trigger even before the triggering sensor data becomes visible to the human operator. Moreover, in cases of communication breaks, the autonomous control can take over the system control.

Based on biological and psychological findings and models, the proposed architecture is expected to be adaptive to the environment (e.g., being able to develop new behaviours) and robust when facing new situations (e.g., reacting to new circumstances). One drawback of many biologically inspired models (e.g., models implemented within a single neural net) is their cumbersome extensibility, as new functionality has to be trained and cannot easily be exchanged. Therefore, we present a modular biologically inspired architecture whose initial settings can be given in advance and whose module functionality is much more comprehensible. Additionally, it still benefits from the adaptability and flexibility of biologically inspired systems.

2 Related Work

Several robotic behaviour control architectures have been proposed, including the traditional subsumption architecture [1] and extensions of three-layered models for multi-robot cooperation [2]. However, as behaviour control in animals and humans can be both very robust and efficient, another way to achieve a reliable robot control system is to build on findings in biology or psychology. In both disciplines several specific effects have been studied, but only a few studies cover overall behaviour control. One example of an overall study (specifically of behaviour switching) is the architecture proposed by Norman and Shallice [3]. It covers switching between attended and unattended task execution. Another example is the generation of expectations of action consequences based on internal models. These predictions can be compared with the actually sensed percepts during or after action execution (e.g., in biology [4], computational studies [5], [6] (review), and in robotic experiments [7]). A non-biologically inspired, model-free execution monitoring approach has also been proposed [8], which does not require a prediction model before task execution, but learns the normal and faulty execution of behaviours through a supervised learning phase.

Exemplary robotic implementations based on the psychologically inspired behaviour control architectures of Norman and Shallice were proposed by Gurney et al. [9] and Garforth et al. [10]. Gurney et al. implemented the Norman and Shallice model with an application to autonomous vehicle control and tested their system in simulations. Garforth et al. chose the architecture proposed by Norman and Shallice, too, and tested their implementation in simple two-wheeled-robot simulations. In addition, they tried to match the single function blocks of the architecture to specific regions of the human brain.

Given that the Norman and Shallice implementation proposed by Garforth et al. is the most extensive one, we extend their architecture to fit our needs of robustness, modularity and flexibility. For robustness reasons, we introduce a monitoring of action execution in the lowest layer that is based on prediction models which can be learned in advance or at runtime. An advantage of learning prediction models over model-free monitoring approaches is the ability to use the inverse models of the prediction for planning the action execution. As Garforth et al. base their architecture on neural networks, it is difficult to manually extend the behaviours and their connections to sensor percepts, which impedes the manual design.

To emphasise the modularity and flexibility of our architecture, we introduce the concept of triggers, as proposed by Norman and Shallice, to manually connect sensor perception and behaviours. We additionally introduce a planning interface on the second layer as well as a cognitive layer on top of it.

3 Three-Layer Behaviour Control System

Figure 1 gives a simplified overview of the proposed behaviour control system structure. The lowest layer (Reactive Layer) realizes solely reactive behaviours that are activated by specific perceptions. It contains a set of Actions that are selected by the Pre-Selection through triggers. All of these triggers are activated by a trigger condition on the raw and preprocessed sensor values, which can further be processed in the Episodic Memory. Actions can be triggered in parallel and each trigger can select several actions. A weight is associated with each selected action that corresponds to the match of this action given the current perception. While executing an action, a so-called Exafference Computation compares the sensor states with a prediction based on an adaptable internal sensorimotor loop model (cf. [6]).

The middle layer (Deliberative Layer) contains a Planner, a Semantic Memory for storing general knowledge (e.g., rules applicable during plan generation) and a Working Memory for previously generated plans. While executing a plan, a Monitor compares the current behaviour to be executed according to the plan with the actions to be executed by the lowest layer. If there is a discrepancy, the Deliberative Layer modulates the action selection in the Reactive Layer through the Behaviour Generation and Modulation to bias the selection of an intended action (i.e., an action matching the current plan). The selection of Actions takes place in both layers in parallel and is then merged into a common list of candidates. Through this non-exclusive action selection, it is still possible that a reactive action is selected instead of an intended action due to a higher weight of this reactive action. This guides the robot system through potentially harmful situations (e.g., damaging the hardware or getting stuck in fine soil). This, together with the exafference monitoring, is supposed to lead to a reliable system.

When the execution of a plan is interrupted, the Creative Cognition can reason about the cause by analysing recent sensor states. While reasoning, new knowledge required for the Semantic Memory might be generated, which supports the Planner and the Behaviour Generation and Modulation in generating new plans and behaviours to solve a given situation. This reasoning can also cause the robot to explore the environment in order to fill the knowledge gap for resolving logical contradictions.

When a plan is executed several times, learning in the lowest layer can be triggered (see [10]). Consequently, the lowest layer learns an intended plan, or a part of it, together with the expected sensor consequences, as a reactive behaviour that can be triggered in the future. The more often this plan is executed in that perceived state, the stronger the connection between sensor perception and action becomes.
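The non-exclusive merging of reactive and deliberative candidates described above might look like the following toy sketch (our own simplification; the action names and weights are invented, and modelling the deliberative modulation as an additive bias is our assumption):

```python
def select_action(reactive, deliberative_bias):
    """Merge reactively triggered actions with the Deliberative Layer's
    modulation into one candidate list; the highest weight wins, so a
    strongly triggered reflex can still override the planned action.
    reactive / deliberative_bias: dicts mapping action name -> weight."""
    candidates = dict(reactive)
    for action, bias in deliberative_bias.items():
        candidates[action] = candidates.get(action, 0.0) + bias
    return max(candidates, key=candidates.get)

# a strong obstacle reflex outweighs the plan's biased action
print(select_action({"stop": 0.9, "drive_to_goal": 0.2},
                    {"drive_to_goal": 0.5}))  # -> "stop"
```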

Fig. 1. Overview of the proposed behaviour control architecture: the Reactive Layer (Pre-Selection with trigger database, Exafference Computation, Actions, Motors with contention scheduler, Homeostasis) and the Deliberative Layer (Planner, Monitor for goal tracking, Behaviour Generation and Modulation, with Semantic, Working and Episodic Memories), topped by the Creative Cognition (reasoning, solver).

This learning over time leads to strong direct connections in the Reactive Layer, which are independent of the Deliberative Layer.

For an application of the behaviour control system in a group of autonomous robots, a loosely-coupled cooperation without, or with reduced, explicit communication can be realized. For example, if a robot is recognized by another robot (via the Object Identification), a learned reactive behaviour or a plan execution can be triggered (e.g., "go to recognized robot", "unload the recognized robot"). In this group of autonomous robot systems, knowledge in the Semantic Memory can be exchanged. The Semantic Memory includes, amongst others, the world model and is independent of the sensor modality and the design of the robot. For this reason a robot can obtain needed knowledge from another, heterogeneous robot by exchanging and merging world models.

The functions of the components as well as the whole-model behaviour need to be examined separately under two different initial conditions: the Blank State and the Learned State. In the Blank State, all memories, parameters, and learnable connections are empty or randomly initialised. Additionally, as few reflexes, basic behaviours, and internal models as possible (and biologically plausible) are predefined. Starting from this state, the system is supposed to acquire all other needed behaviours and models on its own. Since learning on a real system will take a lot of time, a second state, the so-called Learned State, will be used for most of the tests and demonstrations. In the Learned State the system is supposed to have already learned several behaviours, plans, and models. Before sending a robot system on a real mission, the robot can be trained from the Blank State into the Learned State within a simulator. Equipped with this Learned State, it continues to extend its knowledge and behaviours in the actual environment.

4 Conclusions and Outlook

We propose a biologically inspired, reliable control architecture that realises purely reactive and simultaneously deliberative action selection.

advantages of such biologically inspired architectures (reliability, robustness, flexibility), it also supports the manual design. It is especially designed for missions with groups of loosely coupled robots, where reliable cooperation is in high demand. The self-evaluation through prediction on the lower layer can be used to ensure that the system avoids dangerous states. At the same time, the upper layers cover goal tracking and knowledge management, also in cooperation with other, even heterogeneous robot systems within a mission team.

A prototypical implementation is ongoing, and first simulation results of behaviour switching and of the biasing through the deliberative layer have been evaluated. The next step is to carry out these experiments on a real robotic platform in a lunar crater model. The scenarios will be designed so that the robot has to adapt to the situation by changing the weights of the actions to fulfil its mission. The whole architecture will be evaluated according to these scenarios, and comparisons to recently proposed architectures will be made.

Acknowledgment

Supported by the Federal Ministry of Economics and Technology on the basis of a decision by the German Bundestag, grant no. 50RA1113 and 50RA1114.

References

1. Brooks, R.: A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 2(1) (1986)
2. Simmons, R., Smith, T., Dias, M.B., Goldberg, D., Hershberger, D., Stentz, A., Zlot, R.: A layered architecture for coordination of mobile robots. In: Multi-Robot Systems: From Swarms to Intelligent Automata, Proceedings from the 2002 NRL Workshop on Multi-Robot Systems, Kluwer (2002)
3. Norman, D.A., Shallice, T.: Attention to action: Willed and automatic control of behaviour. In Davidson, R.J., Schwartz, G.E., Shapiro, D., eds.: Consciousness and self-regulation. Volume 4. Plenum Press (1986)
4. Holst, E., Mittelstaedt, H.: Das Reafferenzprinzip. Naturwissenschaften 37 (1950)
5. Wolpert, D., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11(7-8) (1998)
6. Miall, R., Wolpert, D.: Forward models for physiological motor control. Neural Networks 9(8) (1996)
7. Schenck, W., Hoffmann, H., Möller, R.: Grasping of extrafoveal targets: A robotic model. New Ideas in Psychology 29(3) (2011)
8. Pettersson, O., Karlsson, L., Saffiotti, A.: Model-free execution monitoring in behavior-based robotics. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 37 (2007)
9. Gurney, K., Hussain, A., Chambers, J., Abdullah, R.: Controlled and automatic processing in animals and machines with application to autonomous vehicle control. Artificial Neural Networks - ICANN 2009 (2009)
10. Garforth, J., McHale, S., Meehan, A.: Executive attention, task selection and attention-based learning in a neurally controlled simulated robot. Neurocomputing 69(16-18) (2006)

Dataset Generation for Meta-Learning

Matthias Reif, Faisal Shafait, and Andreas Dengel

German Research Center for Artificial Intelligence, Trippstadter Str. 122, Kaiserslautern, Germany

Abstract. Meta-learning tries to improve the learning process by using knowledge about already completed learning tasks. For this purpose, features of datasets, so-called meta-features, are used to represent datasets. These meta-features are used to create a model of the learning process. In order to make this model more predictive, sufficiently many training samples and, thereby, sufficiently many datasets are required. In this paper, we present a novel data generator that is able to create datasets with specified meta-features, e.g., it is possible to create datasets with a specific mean kurtosis and skewness. The publicly available data generator uses a genetic approach and is able to incorporate arbitrary meta-features.

1 Introduction

Meta-learning, or "learning to learn", uses previously gathered knowledge about a learning task in order to provide an automatic selection, recommendation, or support for a future task. One intensively investigated meta-learning field is algorithm or model selection: for a new dataset, one or more suitable algorithms are selected or recommended based on the knowledge about the suitability of algorithms on other datasets. Common approaches for making recommendations use classification [1], regression [10], or ranking [2]. A different meta-learning task supports parameter optimization of learning algorithms [9].

All these approaches use characteristics or properties of datasets as the foundation for the actual meta-learning. These properties of datasets are typically called meta-features. Different groups of meta-features have been proposed in the literature: simple meta-features [5] are directly extractable from the dataset, such as the number of features or the number of samples. Statistical meta-features [4] use statistical measures of the probability distributions, such as the kurtosis or the skewness. Information-theoretic meta-features [3] are based on the entropy, such as the joint entropy between a feature and the class label, or the mutual information. Model-based and landmarking meta-features build a model from the dataset. While landmarking [7] uses the performance achieved by this model as a meta-feature of the dataset, model-based meta-features [6] are diverse properties of this model. Typical model-based meta-features are properties of a decision tree, such as its width or depth.

The meta-features construct the feature space for the meta-learning. As for any pattern recognition method, it is problematic if this space is high-dimensional and only sparsely populated. Hence, a sufficient number of datasets is required. Since real-world datasets are rare and hard to obtain, artificially created datasets might solve the issue. In this paper, we present a novel data generator that is specially designed to support the investigation and development of meta-learning approaches. It is able to generate datasets with user-defined values of the meta-features, e.g., with a certain mean kurtosis and a certain Naive Bayes accuracy. Being able to generate datasets with specific meta-features, meta-learning can be supported in two different ways:

Sparse Feature Space: Typically, many meta-features are extracted and the high-dimensional feature space is only sparsely populated. Randomly generated datasets cannot ensure that they are sufficiently distributed over the meta-feature space. Using the presented generator, the meta-feature space can be filled in a more controlled way, and discovered empty areas can be populated.

Investigation of Meta-features: Most of the presented meta-features have not been thoroughly investigated according to their descriptive power for a certain meta-learning task. Generating datasets with specific values of meta-features allows more controlled experiments that might lead to conclusions about the usefulness of particular meta-features.

2 Design

We treat the data generation as an optimization problem. A candidate solution is a specific dataset defined by its data points. Since we want the dataset to fulfill multiple meta-feature requirements, this is a multi-objective optimization problem. Therefore, we constructed a single aggregate objective function using a weighted sum over the meta-feature vector x of size n, the vector of desired values y, and the vector of the according weights w:

f(x) = Σ_{i=1}^{n} w_i |x_i - y_i|.  (1)

This objective function measures the difference between the measured meta-features of a dataset and the desired values. For solving this minimization problem, we use a genetic algorithm that mutates datasets by shifting data points and recombines two datasets by swapping fractions of them. Since we know that the best possible value of the objective function is zero, we can stop the genetic algorithm once the objective value gets sufficiently close to zero.
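As a minimal illustration, Eq. (1) can be written directly as a weighted sum of absolute deviations. The following Python function is our sketch of the formula, not code from the released generator.

import numpy as np

def objective(x, y, w):
    """Weighted sum of absolute deviations between the measured
    meta-features x of a candidate dataset and the desired values y
    (Eq. 1); zero means every meta-feature target is met exactly."""
    x, y, w = map(np.asarray, (x, y, w))
    return float(np.sum(w * np.abs(x - y)))

# e.g., measured mean kurtosis 2.9 vs. desired 3.0 and
# measured mean skewness 0.1 vs. desired 0.0:
print(objective([2.9, 0.1], [3.0, 0.0], [1.0, 1.0]))  # ~0.2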

The presented generator is able to incorporate a variable set of arbitrary meta-features. The user can build a custom set of meta-features by simply providing the functions that compute them. However, the number of samples of each class and the number of features are not optimized but fixed in advance and used for creating the random start population of the genetic algorithm. For each feature and class, values are sampled either from a normal or a uniform distribution using random parameters (mean and variance, or minimum and maximum, respectively). Since features are typically normalized anyway, we fixed the range for the parameters of the probability distributions.

3 Implementation

We implemented the data generator in Python because it is easy to use and becoming more and more popular in scientific computing. As the implementation of the genetic algorithm, we used DEAP [8]. This framework already provides different variants for the components of a genetic algorithm, such as different selection and cross-over schemes. Additionally, parallel and even distributed computation is easily possible as well. The data generator uses Gaussian mutation and two-point cross-over.

Adding a new meta-feature is done by calling add_measure with at least two parameters: a tuple of functions and the desired value. The functions are applied successively, e.g., the tuple (kurtosis, mean) will first compute the kurtosis of all features and then calculate the mean. This avoids the definition of additional aggregation functions and enables a caching mechanism to save computation time. If, e.g., the minimum and the maximum of the kurtosis are added, the kurtosis itself is computed only once. The remaining parameters of add_measure are optional: the third parameter defines whether the measure also requires the label as input; the last parameter defines the weight of the measure for the objective function. By default, a value of 1.0 is used.
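The function-tuple mechanism of add_measure could look roughly as follows. This is a hedged reconstruction from the description above, with illustrative names; the released generator's actual code may differ.

import numpy as np
from scipy.stats import kurtosis, skew

class MeasureSet:
    """Sketch of the add_measure mechanism: tuples of functions are
    applied successively, and the first (per-feature) stage is cached,
    so that, e.g., min and max of the kurtosis share one computation."""

    def __init__(self):
        self.measures = []  # (functions, target, needs_label, weight)
        self._cache = {}

    def add_measure(self, functions, target, needs_label=False, weight=1.0):
        self.measures.append((functions, target, needs_label, weight))

    def evaluate(self, data, labels=None):
        """Aggregate objective (Eq. 1) over all registered measures."""
        self._cache.clear()
        error = 0.0
        for functions, target, needs_label, weight in self.measures:
            first, rest = functions[0], functions[1:]
            if first not in self._cache:  # cache the per-feature stage
                args = (data, labels) if needs_label else (data,)
                self._cache[first] = first(*args)
            value = self._cache[first]
            for fn in rest:  # e.g., the mean over all features
                value = fn(value)
            error += weight * abs(value - target)
        return error

ms = MeasureSet()
ms.add_measure((kurtosis, np.mean), target=0.0)
ms.add_measure((kurtosis, np.min), target=-0.5)  # kurtosis computed once
ms.add_measure((skew, np.mean), target=0.0)
print(ms.evaluate(np.random.randn(400, 2)))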

4 Examples

For illustration of the presented approach, we applied it to the generation of datasets with the following properties:

#classes: 2    mean skew: 0.0    naive bayes: 0.8
#features: 2   min skew: -0.8    nearest neighbor: 0.9
#samples: 400  max skew: 0.8

We used a weight of 2.0 for the naive bayes and nearest neighbor accuracies and 1.0 for the remaining properties. The genetic algorithm uses a population size of 100 and was stopped once the error dropped below a small threshold. Figure 1 shows the results of four different runs of the data generator. For each run, the final dataset as well as the error of the five optimized properties (#classes, #features, and #samples are fixed) over the generations are plotted. It is notable that different runs generate quite different datasets although all datasets have the same specified characteristics.

[Figure: four panels, one per run, each showing the generated dataset and the multi-criteria error (mean skew, min skew, max skew, naive bayes, nearest neighbor) plotted over the generations.]
Fig. 1. Generated datasets and multi-criteria error over the generations for the same dataset characteristics.

5 Conclusion

We presented a novel data generator for creating datasets with specific characteristics that can be used for the development and evaluation of meta-learning systems. Its Python implementation is open-source and publicly available. The current version is limited to numerical features and classification datasets.

References

1. Ali, S., Smith, K.A.: On learning algorithm selection for classification. Applied Soft Computing 6 (January 2006)
2. Brazdil, P.B., Soares, C.: Zoomed ranking: Selection of classification algorithms based on relevant performance information. In: Proc. of Principles of Data Mining and Knowledge Discovery PKDD (2000)
3. Castiello, C., Castellano, G., Fanelli, A.M.: Meta-data: Characterization of input features for meta-learning. In: Torra, V., Narukawa, Y., Miyamoto, S. (eds.) Modeling Decisions for Artificial Intelligence, Lecture Notes in Computer Science, vol. 3558 (2005)
4. Engels, R., Theusinger, C.: Using a data metric for preprocessing advice for data mining applications. In: Proc. of the European Conf. on Artificial Intelligence (1998)
5. Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification. Ellis Horwood (1994)
6. Peng, Y., Flach, P., Soares, C., Brazdil, P.: Improved dataset characterisation for meta-learning. In: Lange, S., Satoh, K., Smith, C. (eds.) Discovery Science, Lecture Notes in Computer Science, vol. 2534 (2002)
7. Pfahringer, B., Bensusan, H., Giraud-Carrier, C.: Meta-learning by landmarking various learning algorithms. In: Proc. of the 17th Int. Conf. on Machine Learning (2000)
8. Rainville, F.M.D., Fortin, F.A., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: A Python framework for evolutionary algorithms. In: EvoSoft Workshop, Companion proc. of the Genetic and Evolutionary Computation Conference (GECCO 2012) (July 2012)
9. Reif, M., Shafait, F., Dengel, A.: Meta-learning for evolutionary parameter optimization of classifiers. Machine Learning 87(3) (2012)
10. Reif, M., Shafait, F., Goldstein, M., Breuel, T., Dengel, A.: Automatic classifier selection for non-experts. Pattern Analysis and Applications (2012)

Meta²-Features: Providing Meta-Learners More Information

Matthias Reif, Faisal Shafait, and Andreas Dengel

German Research Center for Artificial Intelligence, Trippstadter Str. 122, Kaiserslautern, Germany
{matthias.reif,faisal.shafait,andreas.dengel}@dfki.de

Abstract. Meta-features are used to describe properties and characteristics of datasets and construct the feature space for meta-learning. Many of the different meta-features are defined for single variables and, therefore, are computed per feature of the dataset. Since datasets contain different numbers of features but meta-learning requires feature vectors of the same size, such measures are typically simply averaged over all columns. In this paper, we present an approach for preserving more of the information carried by such meta-features while still producing a feature vector of fixed size. An additional level of features is extracted from the meta-features.

1 Introduction

Meta-features are a well-known concept in the meta-learning domain. They are measures calculated on a dataset in order to describe its properties and characteristics. Meta-features construct the feature space in which each dataset is represented as a point. Multiple datasets as points within this feature space are used as training data for meta-learning: knowledge about these datasets (e.g., the best performing classifier) is used to infer knowledge about a new dataset, e.g., predicting the best performing classifier. Statistical pattern recognition methods are applied to create a model that is able to make the desired prediction by applying the model to the meta-features of a new dataset.

Using meta-features, various meta-learning tasks have been developed. The most prominent meta-learning problem is model or algorithm selection, which has been addressed by applying classification [1], regression [11], and ranking [4]; but parameter optimization can also be tackled by meta-learning [10]. Commonly used types of meta-features are statistical and information-theoretic measures [7, 5, 6, 12]. Two statistical meta-features that are often used are the skewness and the kurtosis. The entropy and the joint entropy are two simple examples of information-theoretic meta-features. Other types of meta-features are landmarking [9, 2] and model-based features [3, 8].

An issue of many statistical and information-theoretic meta-features is that they are defined on single features of the dataset. Computing such measures for all features leads to a different number of values for datasets with different

numbers of features. Additionally, even for datasets with the same number of features, such meta-feature vectors are not useful, because the order of the features has an influence on the vector but obviously not on the characteristics of the dataset. If the meta-features are calculated per feature and additionally per class [7, 12], this issue is aggravated further. Therefore, such meta-features are typically averaged [7, 5, 12]. This leads to a meta-feature vector with the same size and semantics for differently sized datasets, but also to a high loss of information.

Spiliopoulou et al. [13] proposed to use the minimum, maximum, and standard deviation of the number of examples per class, of the number of distinct values of the attributes, and of the number of missing values of the attributes, in addition to the average value. Using the minimum, maximum, and standard deviation in addition increases the amount of information about the dataset. In our paper, we go one step further and propose to use meta-features of meta-features in order to keep as much information as possible. This can be seen as a generalization of the meta-features used by Spiliopoulou et al.

2 Approach

The proposed approach is divided into two steps. First, the per-feature meta-features are calculated for each feature. They are collected and construct an intermediate dataset where each column is a meta-feature (e.g., skewness) and each row is a feature of the original dataset. The value of a cell is the meta-feature value of the original feature. While the number of features of this intermediate dataset equals the number of meta-features used and is, therefore, the same for each original dataset, the number of instances differs.

In the next step, meta-features of this intermediate dataset are calculated. These might be a subset of the meta-features of the previous step. For example, the entropy of the kurtosis values of the features might be computed. This step leads to a single vector of the same length also for original datasets with different numbers of features.

The two steps of the presented approach are illustrated in Figure 1: the meta-features skewness, kurtosis, entropy, and mutual information are calculated for each of the three features of the original dataset. These values construct the intermediate dataset with four columns and three rows. Afterwards, the minimum, maximum, mean, standard deviation, skewness, kurtosis, and entropy are calculated for each of the previous meta-features. This leads to 4 × 7 = 28 meta²-features. Since the mean is also calculated, the set of meta²-features contains the traditional meta-features and the measures of Spiliopoulou et al. [13] as well. Of course, other meta-features such as landmarking can be added to the vector, too.

Figure 2 shows the distribution of eight features for three artificial datasets as an illustrating example. All three datasets have a similar mean skewness of about 1.37. However, meta²-features are able to describe the difference between the datasets:

[Figure]
Fig. 1. The presented approach uses two steps: first, the meta-features of each feature construct an intermediate dataset, from which the final meta²-features are calculated.

[Figure panels: (a) mean skewness: 1.37, (b) mean skewness: 1.38, (c) mean skewness: 1.37; each panel is additionally annotated with its skewness-of-skewness and kurtosis-of-skewness values.]
Fig. 2. The distribution of eight features for three artificial datasets: while the mean skewness is almost the same, the meta²-features show significant differences.

Both the skewness of the skewness values and the kurtosis of the skewness values show a significant difference. Since the approach leads to an increased number of meta-features while the usefulness of each single meta-feature is not proven, an automatic feature selection method should be applied in order to select the most useful ones. It was previously shown that automatic feature selection can improve the performance of meta-learning [11, 14].
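To make the two-step construction concrete, the following sketch computes a small set of meta²-features with NumPy/SciPy. It is our illustration of the idea; the meta-feature sets used in the paper are larger, and all names here are assumptions.

import numpy as np
from scipy.stats import skew, kurtosis

# Step 1: per-feature meta-features form an intermediate dataset
# (rows = features of the original dataset, columns = meta-features).
PER_FEATURE = {"skewness": skew, "kurtosis": kurtosis}

# Step 2: aggregation functions applied per column of that dataset.
AGGREGATORS = {"min": np.min, "max": np.max, "mean": np.mean,
               "std": np.std, "skewness": skew, "kurtosis": kurtosis}

def meta2_features(data):
    """Return a fixed-size dict of meta²-features for a dataset of
    shape (n_samples, n_features), independent of n_features."""
    intermediate = np.column_stack(
        [fn(data, axis=0) for fn in PER_FEATURE.values()])
    out = {}
    for j, mf_name in enumerate(PER_FEATURE):
        for agg_name, agg in AGGREGATORS.items():
            out[agg_name + "_of_" + mf_name] = float(agg(intermediate[:, j]))
    return out

feats = meta2_features(np.random.randn(200, 8))
print(feats["mean_of_skewness"], feats["skewness_of_skewness"])

With two per-feature measures and six aggregators, this yields 2 × 6 = 12 values; with the four per-feature measures and seven aggregators of Figure 1, it would yield the 28 meta²-features described above.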

3 Conclusion

We presented a novel approach for constructing more informative meta-features using a two-stage method based on traditional meta-features. The proposed meta²-features are able to describe differences between datasets that are not accessible when only the typically used mean of the meta-measures is computed. An additional feature selection step is suggested in order to automatically select the most useful measures.

References

1. Ali, S., Smith, K.A.: On learning algorithm selection for classification. Applied Soft Computing 6 (January 2006)
2. Bensusan, H., Giraud-Carrier, C.: Discovering task neighbourhoods through landmark learning performances. In: Proc. of the 4th European Conf. on Principles of Data Mining and Knowledge Discovery (2000)
3. Bensusan, H., Giraud-Carrier, C., Kennedy, C.: A higher-order approach to meta-learning. In: Proc. of the ECML 2000 Workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination (June 2000)
4. Brazdil, P.B., Soares, C.: Zoomed ranking: Selection of classification algorithms based on relevant performance information. In: Proc. of Principles of Data Mining and Knowledge Discovery PKDD (2000)
5. Castiello, C., Castellano, G., Fanelli, A.M.: Meta-data: Characterization of input features for meta-learning. In: Torra, V., Narukawa, Y., Miyamoto, S. (eds.) Modeling Decisions for Artificial Intelligence, Lecture Notes in Computer Science, vol. 3558 (2005)
6. Engels, R., Theusinger, C.: Using a data metric for preprocessing advice for data mining applications. In: Proc. of the European Conf. on Artificial Intelligence (1998)
7. Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification. Ellis Horwood (1994)
8. Peng, Y., Flach, P., Soares, C., Brazdil, P.: Improved dataset characterisation for meta-learning. In: Lange, S., Satoh, K., Smith, C. (eds.) Discovery Science, Lecture Notes in Computer Science, vol. 2534 (2002)
9. Pfahringer, B., Bensusan, H., Giraud-Carrier, C.: Meta-learning by landmarking various learning algorithms. In: Proc. of the 17th Int. Conf. on Machine Learning (2000)
10. Reif, M., Shafait, F., Dengel, A.: Meta-learning for evolutionary parameter optimization of classifiers. Machine Learning 87(3) (2012)
11. Reif, M., Shafait, F., Goldstein, M., Breuel, T., Dengel, A.: Automatic classifier selection for non-experts. Pattern Analysis and Applications (2012)
12. Segrera, S., Pinho, J., Moreno, M.: Information-theoretic measures for meta-learning. In: Corchado, E., Abraham, A., Pedrycz, W. (eds.) Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, vol. 5271 (2008)
13. Spiliopoulou, M., Kalousis, A., Faulstich, L.C., Theoharis: Noemon: An intelligent assistant for classifier selection. In: 13th German Workshop on Machine Learning (August 1998)
14. Todorovski, L., Brazdil, P., Soares, C.: Report on the experiments with feature selection in meta-level learning. In: Brazdil, P., Jorge, A. (eds.) Proceedings of the PKDD-00 Workshop on Data Mining, Decision Support, Meta-Learning and ILP: Forum for Practical Problem Presentation and Prospective Solutions (September 2000)

Organizational Social Network Analysis: Case Study in a Research Facility

Wolfgang Schlauch¹, Darko Obradovic², and Andreas Dengel¹,²

¹ University of Kaiserslautern, Germany
² German Research Center for AI (DFKI), Kaiserslautern, Germany

Abstract. In this paper we address the need to extend social network analysis from online social networks to companies. We conducted a survey at a German research facility and analyzed its internal network along several dimensions. While we found no easily spotted problems inside the work groups, the communication between the groups was rather low. Apart from this, the facility in question has a working communication network.

1 Introduction

Companies have recognized the emergence of social networks and apply similar mechanisms for the evaluation of their employees; they have also changed their recruiting behavior. There are attempts to include organizational science in companies (Scott, 2000). Still, few companies are really interested in their internal network. They have an organizational chart and believe that knowledge flows along its structure. This is rarely true, as Cross and Parker showed several times for different U.S. companies. Knowledge management might be highly improved if existing connections could be used more efficiently; likewise, companies could use the knowledge gained from their internal structure to form better teams for new projects. Up to now, leveraging these potentials is only done intuitively.

In this paper we report on an online survey conducted at a German research facility and analyze its internal network. We will explain the measures we took, followed by some of the facts we found. Finally, we will discuss future improvements for this research as well as some problems that emerged.

2 Survey

We conducted an online survey over a period of one and a half months to give all employees the opportunity to participate. They were asked to name some of the co-workers they rely on in their daily work, but also people they estimated to be potentially helpful even though they did not communicate with them. It is important to note that we did not hand them a list on which to check the people they work with, but rather gave them the opportunity to ponder for themselves the question of who is important.

We collected four dimensions of the social behavior inside the company. First of all, what the respondents know of the capabilities of their chosen co-workers, since we figured this to be a crucial part of choosing them. Secondly, we were interested in the ease of access to those persons. Furthermore, we asked for the perceived engagement when a question is posed, i.e., whether the answers are short but precise or too extensive. And last but not least, we wanted to know whether the communication with the considered persons changed recently, i.e., whether they increased or decreased the communication with their peers. We also asked for some information about the relationship to the other persons, i.e., how long they have known them and how often they interact with them. These dimensions are mostly based on a similar survey conducted by Cross and Parker (2004), but other ideas were taken into account as well, like the metaphor of the learning organization and team (Page et al., 1999).

3 Results

We sent the survey to two research departments of a company. We deliberately started in a small setting in one city to see the results and judge whether it is worthwhile to perform the analysis with the whole company. A total of 67 employees received the survey, and we got 48 responses, resulting in a response rate of about 70%, which is a reasonable result to start with. All but one respondent belonged to the scientific staff, and only two of the respondents were female. Nevertheless, the results would give a better understanding of the communication network if everyone participated.

The average respondent named between 3 and 5 persons as contacts; the maximum number of named persons was 22. On the other hand, only 14 people were named as sources of knowledge that are not contacted. This indicates a good structure and a good communication profile for the company. This is shown in the other answers as well. Less than a quarter of the company needed to communicate more with a person to become more efficient. Considering only those respondents who need the improved communication urgently, the number drops to only 14 members. Apart from this, there were no obvious deficits in the answers.

Keeping in mind that the perception of other people always depends on the person, and that we cannot assume that everyone had the same reasoning process, we analyzed the survey data we gathered in the course of one month. We had several hypotheses about how the network would look:

(a) Inside a work group the connectivity will be high; between groups the connectivity will be low.
(b) The number of outgoing connections will be on average between 5 and 8.
(c) No group will have a density of one.

We found hypothesis (a) to be less true than we thought. Between the two major groups we surveyed, there were not as many connections as we expected. Nevertheless, one group had a sub-group with a different research topic. The

connectivity between the sub-group and its main group, called group A, was surprisingly high, and almost everybody in the sub-group had at least one person to talk to in the main group. In Figure 1, one can see the main group A in the upper left corner, the sub-group in the upper right, and group B in the lower left corner. On the other hand, between the sub-group and the other main group, called group B, there were no connections at all. We are sure this is not caused by physical distance, since the survey was conducted in one building.

[Figure]
Fig. 1. Company network. The darker a node, the higher its betweenness centrality; the larger a node, the higher its closeness centrality.

Hypothesis (b) is true. The average number of outgoing connections (listed a person as contact) as well as the average number of incoming connections (was listed as contact) was in the expected region. The in-degree has an average of 6.3, the out-degree an average of 7.9. This difference results from the fact that persons who were named but did not respond are included in the network. The variation among individuals is also easy to explain: some people are more important, so they had a higher number of incoming connections; one might think of the heads of the groups. Other persons were more communicative; they had many more contacts to talk to and hence caused a higher value for the average out-degree.

Interestingly, two persons who work in different groups have direct contact with each other and a good relationship, yet they are also very communicative inside their own groups. In fact, they both have the highest number of outgoing connections in their group, though not the highest number of incoming connections. It is thus fair to assume that they have a high amount of knowledge they want to spread, or that they consider other persons sources of knowledge and distraction. Distraction is considered to increase productivity or creativity (Fisher, 2006; Wieth and Zacks, 2011), so talking to different people should be encouraged.

Hypothesis (c) is true for the communication network if we only take into consideration who named whom in the survey. If we take the connections without directions, we get a different result. The sub-group of group A again shows different behavior: it is fully connected, resulting in a density of one. Even in the former view, the sub-group is almost at a density of one. This shows that the communication and also the relations in this group are much better developed, and this might give them an advantage in the distribution of knowledge. In communication with some of them we found that they usually go to lunch as a group or do other activities together. This does not only improve the social bonds in the group, but also gives them the aforementioned distraction bonus. As an interesting side note, one might assume that the answers were evenly distributed throughout the company. In reality, group A had a much higher response rate (85%) than group B (49%).

In summary, up to this point the results of the survey were interesting and sometimes even surprisingly good. The company seemed to have no need to change the structure of their work, since the network showed no fatal flaws and was in some regards beyond expectations. Nevertheless, we were interested in what would happen if one of the persons were to leave the company. We decided to remove the person with the most outgoing connections from the network and investigate in a theoretical scenario what would happen. The network diameter increased from 7 to 8, and both average degrees fell to a much lower value. It seems that even if the person is not talked to much, she is important for information and knowledge distribution. Furthermore, the person is one of the main connectors between group A and group B; without her, the communication almost ceases. It is even more important to note that every person in the company has special knowledge. For the person under consideration it would be bad if she left, but due to her communicative personality it is fair to assume that she also spreads some of her knowledge and gives information on where to look for further details about her speciality. If we remove another person from the network, one with a normal or even low connectivity who is on the periphery of the network, it might not harm the network structure, but it will still reduce the knowledge pool of the company.
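The what-if analysis above is easy to reproduce with standard tooling. The following NetworkX sketch, run on a made-up toy network, removes the node with the highest out-degree and compares the diameters; it is our illustration, not the code used in the study.

import networkx as nx

def diameter_of(g):
    """Diameter of the undirected view; if a removal splits the
    network, fall back to the largest connected component."""
    u = nx.Graph(g)
    comp = max(nx.connected_components(u), key=len)
    return nx.diameter(u.subgraph(comp))

def removal_impact(g):
    """Remove the node with the highest out-degree (the most
    communicative person) and report the diameter before and after."""
    victim = max(g.nodes, key=g.out_degree)
    h = g.copy()
    h.remove_node(victim)
    return victim, diameter_of(g), diameter_of(h)

# toy "who named whom as a contact" network
g = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"),
                ("d", "e"), ("e", "a"), ("c", "e")])
print(removal_impact(g))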

4 Further research

Overall, the research done on this topic is quite scant. There is much research in the field of network analysis, but applying the developed algorithms and theories to a real company has so far mostly served to show that the algorithms work correctly. In the social sciences there is also much research on work relations and on differences between people, but since this work was performed as a computer science thesis, the research effort was focused more on network theory than on the social sciences.

Nevertheless, we mentioned that a higher response rate would give a better view of the network. To get more answers, being known in person by the subjects appears to be a good way. We suggest combining interviews and an online survey with different question sets. In the additional interview, the answers can be investigated (only with non-anonymized surveys), or other questions could be posed about specific persons, including some made-up persons or persons who left the company, to see whether the interviewee is telling the truth or merely telling the interviewer what they think the interviewer wants to hear.

On the algorithmic side, it could be interesting to predict the network structure. There are several algorithmic ideas which are able either to track changes in networks (Rosvall and Bergstrom, 2008) or to predict the future behaviour of participants (Deffuant et al., 2012). With a combination of those or similar algorithms and repeated surveys, the employer might be able to predict the behavior of his employees.

References

Cross, R. and Parker, A. (2004). The Hidden Power of Social Networks: Understanding How Work Really Gets Done in Organizations. Harvard Business School Press.
Deffuant, G., Carletti, T., and Huet, S. (2012). The Leviathan model: Absolute dominance, generalised distrust and other patterns emerging from combining vanity with opinion propagation.
Fisher, A. (2006). Be smarter at work, slack off.
Page, D., Zorn, T., and University of Waikato, Dept. of Management Communication (1999). Nuts about Change: Multiple Perspectives on Organizational Change Communication.
Rosvall, M. and Bergstrom, C. T. (2008). Mapping change in large networks.
Scott, W. (2000). Institutions and Organizations. Foundations for Organizational Science. SAGE Publications.
Wieth, M. B. and Zacks, R. T. (2011). Time of day effects on problem solving: When the non-optimal is optimal. Thinking & Reasoning, 17(4).

Object Recognition with Multicopters

Falk Schmidsberger and Frieder Stolzenburg

Hochschule Harz, Automation and Computer Sciences Department, Friedrichstr., Wernigerode, Germany

Abstract. Data acquisition with semi-autonomous flying robots, e.g., multicopters, has several advantages over conventional inspections or aerial photographs. However, in order to facilitate the handling of the flying robot for the pilot, it seems appropriate to employ semantic object recognition, making the robot more autonomous. In this paper, we therefore report ongoing work on applying semantic object recognition, where the image recognition procedure works as follows: each object in an image is composed of segments with different shapes and colors. In order to recognize an object, e.g., a plane, it is necessary to find out which segments are typical for this object and in which neighborhood of other segments they occur. Typical adjacent segments for a certain object define the whole object in the image. A hierarchical composition of segment clusters enables model building, taking into account the spatial relations of the segments in the image. The procedure employs methods from machine learning, namely clustering and decision trees, and from computer vision, e.g., image pyramid segmentation and contour signatures.

Keywords: Multicopters, Semantic Object Recognition, Machine Learning, Computer Vision, Applications

1 Introduction

Mobile data acquisition with unmanned autonomously flying systems (UAS) is an inexpensive alternative to conventional aerial photography. Since these systems are often equipped with multiple sensors, it seems to be a good idea to improve the autonomy of such vehicles, because otherwise the personnel may not be able to control the whole robot system. In addition, it is not always possible to maintain radio contact during the flight. This means it might be important to detect known interesting objects automatically during flight, and then to sense the environment around these objects or to take photographs, whichever is appropriate. In the sequel, we will introduce the procedure for semantic object recognition in more detail.

This research has been partially supported by the grant ZIM KF HM1 from the German Ministry of Economics in the Airmeter project. Preliminary work has been reported in [14].

2 Related Work

The problem of recognizing and locating objects is very important in applications such as robotics and navigation. Therefore, there are numerous related works. The survey [5] reviews literature on both the 3D model building process and techniques used to match and identify free-form objects from imagery, including recognition from 2D silhouettes. [7] describes shape surfaces by curves and patches, represented by linear primitives, such as points, lines, and planes. Results are presented for data obtained from a laser range finder. Hence, these results cannot be transferred directly to the analysis of video camera images, as done here. [12] presents an object recognition system that uses local image features, which are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. This model shares properties with object recognition in primate vision. A nearest-neighbor indexing method is employed that identifies candidate object matches. [13] describes a model-based recognition system for planar objects based on a projectively invariant representation of shape, using projective transformations. Index functions are used to select models from a model base, exploiting object libraries. However, for general semantic object recognition as considered here, fixed object libraries are certainly not sufficient. In [11], another approach for modeling visual context is introduced. The authors consider the leaves of a hierarchical segmentation tree as elementary units. This method has similarities to the one presented here. Our approach, however, exploits normalized contour feature vectors in the semantic object recognition method, which we will explain now.

3 Object Recognition

Each object in a digital image is composed of a number of segments with different shapes and colors. In order to recognize an object, it is necessary to find out which segments are typical for which object and in which segment neighborhood they occur. If such a segment in a characteristic neighborhood is found, it is considered as part of the object. Typical adjacent segments for a certain object constitute the whole object in the image and allow its identification. The data mining methods clustering, decision trees, and boosting are used to implement this approach [2, 4, 9].

We extract the image segments by their colors in two steps. For each pixel in the image, similar neighboring pixels are colored with a uniform color by a flood fill algorithm. With an image pyramid segmentation algorithm, the shapes of the resulting blobs of uniform color are extracted as the image segments [3].

3.1 Segment Feature Vectors

To process the segments of an image, a normalized feature vector is computed for each segment. The normalized feature vector of a segment pixel set comprises the data of four normalized distance histograms and is computed from the segment

contour. A distance histogram consists of a vector where each element contains the distance between the centroid of the segment, i.e., the center of gravity, and a pixel on the segment contour, or the distance between two pixels on the segment contour. These distance histograms are computed with the following three related methods: polar distance, contour signature, and ray distance [1, 10, 15].

For the polar distance, fixed angle steps are used to select individual pixels from the segment contour with the maximum distance to the centroid of the segment. For non-convex segments, if there is no pixel at the actual angle, the pixel at the angle + π with the minimum distance to the contour is chosen. All selected pixels p are stored in the pixel set B, and the distance of each pixel to the centroid is stored in the vector MPD (maximum polar distance) with a constant number of elements for each segment.

In the contour signature histogram vectors, MCD (maximum contour distance) and MinCD (minimum contour distance), the distance of each pixel in B to the corresponding opposite segment contour pixel is stored. In this case, the straight line between the two pixels has to have a 90° angle to the tangent through the actual pixel in B. The corresponding opposite pixel is the pixel with the greatest distance to p for MCD (minimum distance for MinCD). MCD and MinCD have the same cardinality as MPD.

In the ray distance histogram, the distance of each pixel in B to the corresponding segment contour pixel is stored. Here, the centroid lies on the straight line between the two pixels, and the result is the vector MCCD (maximum center contour distance) with the same cardinality as MPD.

3.2 Feature Vector Normalization

In most cases, the distance histograms have different values even for the same segment when it is rotated or resized. To get a normalized segment feature vector, each distance histogram has to be normalized by shifting the distance values of the vector so that the angle with the maximum value (and the maximum angle difference to the next angle with the maximum value) becomes the first element of the feature vector. In a second step, the distance values themselves are normalized to [0.0, 1.0] using the respective maximum distance value. After the normalization, all four distance vectors are joined into the new feature vector V, which is invariant to translation, rotation, and resizing.
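The polar distance histogram of Section 3.1 and the normalization of Section 3.2 can be sketched in a few lines of Python. This is a simplified, hedged reconstruction: empty angular bins get distance 0 instead of the opposite-angle fallback described above, and ties at the maximum are broken by the first occurrence rather than by the angle-difference rule.

import numpy as np

def polar_distance_histogram(contour, n_bins=64):
    """MPD sketch: for fixed angle steps, the maximum centroid-contour
    distance among the contour pixels falling into each angular bin.
    `contour` is an array of (x, y) pixel coordinates."""
    contour = np.asarray(contour, dtype=float)
    centroid = contour.mean(axis=0)
    d = contour - centroid
    dist = np.hypot(d[:, 0], d[:, 1])
    ang = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    mpd = np.zeros(n_bins)
    np.maximum.at(mpd, bins, dist)  # per-bin maximum distance
    return mpd

def normalize(hist):
    """Rotate so that the maximum becomes the first element, then
    scale to [0.0, 1.0] -- the invariance step of Section 3.2."""
    hist = np.roll(hist, -int(np.argmax(hist)))
    m = hist.max()
    return hist / m if m > 0 else hist

circle = [(10 * np.cos(t), 10 * np.sin(t))
          for t in np.linspace(0.0, 2 * np.pi, 200)]
print(normalize(polar_distance_histogram(circle))[:5])  # all close to 1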

3.3 Clustering and Decision Trees

In order to reduce the number of feature vectors, clustering algorithms (k-means and agglomerative) are used to build a cluster model [2, 8]. Each resulting cluster represents a set of similar feature vectors, identified by its respective centroid. The cluster model is used to decide the cluster affiliation of a newly given segment feature vector. For all segments in one image, the cluster numbers are computed and stored in a segment cluster tree (cf. Fig. 2). The root node of the tree represents the image itself. A child node represents a segment which is an immediate part of the segment one level above. Nodes on the same level are marked as neighbors by dotted lines if the corresponding segments in the image are connected. Different colors in the cluster assignment visualization indicate different levels in the segment hierarchy (cf. Fig. 1).

[Figures: cluster labels C1-C8 assigned to the image segments, and the corresponding segment cluster tree rooted at the image node.]
Fig. 1: Segment Cluster Assignments. Fig. 2: Segment Cluster Tree.

We can now extract five different types of feature vectors from the segment cluster tree: the first one contains all paths from a leaf node to the root, the second one all child nodes of a node one level above, the third one all nodes marked together as neighbors, and the fourth one all child nodes of all nodes in the tree. The last feature vector contains the numbers of each cluster found in the image. These five different feature vector types are used to train 10 decision tree/boost models [4, 9], which are combined to predict the right object category of unknown images.
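Two of the five feature-vector types can be sketched as simple tree traversals. The dict-based tree below is our illustrative stand-in for the segment cluster tree; the names are assumptions, not the authors' data structures.

def leaf_to_root_paths(node, path=()):
    """Type 1: all paths from a leaf node up to the root."""
    path = path + (node["cluster"],)
    if not node["children"]:
        return [path[::-1]]
    paths = []
    for child in node["children"]:
        paths.extend(leaf_to_root_paths(child, path))
    return paths

def child_sets(node):
    """Types 2 and 4: the child clusters of each node in the tree."""
    sets = []
    if node["children"]:
        sets.append([c["cluster"] for c in node["children"]])
    for child in node["children"]:
        sets.extend(child_sets(child))
    return sets

tree = {"cluster": "image", "children": [
    {"cluster": "C1", "children": [
        {"cluster": "C4", "children": []},
        {"cluster": "C7", "children": []}]},
    {"cluster": "C2", "children": []}]}
print(leaf_to_root_paths(tree))  # [('C4','C1','image'), ...]
print(child_sets(tree))          # [['C1', 'C2'], ['C4', 'C7']]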

3.4 Results

To test and improve the first implemented algorithms in a controlled environment, they were used to classify images from the butterfly image dataset [16]. For all seven categories, the right category of an image is predicted with a success rate of 99.5 % if the image is from the training set; the rate on the test set is lower but still well above chance, a random guess giving a success rate of only 1/7 ≈ 14.3 %. On images taken by the first author, success rates were measured for 5 categories as well; here, a random guess has a success rate of 1/5 = 20 %. The next tests will be made on the images from The PASCAL Visual Object Classes (VOC) Challenge [6]. It takes about 0.7 seconds to classify a live image with the computational power of our current multicopter hardware.

4 Conclusions

Thus, our first results are encouraging, but in the future the implementation of our approach will be improved further, to become faster and to achieve an increased object recognition success rate. For this, a distributed implementation seems promising. Using more spatial relations of the segments, including different perspective views, for a more accurate decision tree/boost model is also desirable. The final goal is to implement the approach as a real-time object recognition system running on autonomous multicopters.

References

1. Enrique Alegre, Rocío Alaiz-Rodríguez, Joaquín Barreiro, and Jonatan Ruiz. Use of contour signatures and classification methods to optimize the tool life in metal machining. Estonian Journal of Engineering, 1:3-12.
2. Michael J. A. Berry and Gordon Linoff. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. John Wiley & Sons Inc., 3rd edition.
3. Gary R. Bradski and Adrian Kaehler. Learning OpenCV - Computer Vision with the OpenCV Library: Software That Sees. O'Reilly.
4. Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees (The Wadsworth Statistics/Probability Series). Wadsworth Publishing.
5. Richard J. Campbell and Patrick J. Flynn. A survey of free-form object representation and recognition techniques. Computer Vision and Image Understanding, 81(2).
6. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2), June.
7. O.D. Faugeras and M. Hebert. The representation, recognition, and locating of 3-D objects. The International Journal of Robotics Research, 5(3):27-52.
8. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2nd edition.
9. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics). Springer, 2nd edition, corrected printing (2011).
10. Bernd Jähne. Digital Image Processing. Springer, 6th revised and extended edition.
11. Joseph J. Lim, Pablo Arbelaez, Chunhui Gu, and Jitendra Malik. Context by region ancestry. In ICCV. IEEE.
12. David G. Lowe. Object recognition from local scale-invariant features. In IEEE International Conference on Computer Vision, volume 2, page 1150.
13. C. A. Rothwell, A. Zisserman, D. A. Forsyth, and J. L. Mundy. Planar object recognition using projective shape representation. International Journal of Computer Vision, 16:57-99.
14. Falk Schmidsberger and Frieder Stolzenburg. Semantic object recognition using clustering and decision trees. In Joaquim Filipe and Ana Fred, editors, Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, volume 1, Rome, Italy.
15. Fan Shuang. Shape representation and retrieval using distance histograms. Technical report, Dept. of Computing Science, University of Alberta.
16. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Semi-local affine parts for object recognition. In Proceedings of the British Machine Vision Conference, volume 2, 2004.

Semantically-enriched Electric Car Recharge Optimization Toolkit

Mikhail Simonov¹, Antonio Attanasio¹, and Davide Luzio²

¹ ISMB, Via P.C. Boggio 61, Torino, Italy
² Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino, Italy
simonov@ismb.it, attanasio@ismb.it, davide.luzio@hotmail.it

Abstract. The Electric CAr Recharge Optimization Toolkit (e-CARROT) is an integrated system for modelling and optimizing Electric Vehicle recharge processes. It uses semantic technology to conceptualize, model, and optimize battery recharge processes occurring in the smart grid. The e-CARROT tool attempts time shifting, postponing the demand that exceeds the available energy flows. It is useful for the operational planning of a saturated electricity grid, when the electricity demand exceeds the available resources during some time slots.

1 Introduction

The growing demand for electric energy in the EU27 corresponds to annual increments in energy production. In the last ten years, with one exception, the volume of produced electricity has continuously increased [1]. Even with such a production trend, the sustainability limits might be exceeded in the coming years, when a high number of Electric Vehicles (EVs) will add new electricity requests. Balancing the available electricity against its demand is possible within certain stability limits, which depend on the physical characteristics of the distribution sub-topologies.

Known AI methods schedule the charging of EVs in a way that respects the local network constraints. Clement et al. [2] use a centralised scheduler that accounts for declarations about expected future EV mobility. Reporting an earlier vehicle withdrawal leads to preferential charging, but it overloads the grid up to a Denial-of-Service (DoS). The work described here presents a modelling and online optimization toolkit accompanying the integration of EVs into the smart grid. Unlike online variants of the Vickrey-Clarke-Groves method, the proposed algorithm does not need knowledge about pending or future mobility. Model-free online settings have been considered by Porter [3] and Hajiaghayi [4]. We extend this work by considering shared knowledge about different types and domains of neighbours' processes. A bid mechanism is used to define the recharge priorities at the beginning and to correct them dynamically at run time. Further reading about related work can be found in [5].

2 Recharge process

The authors consider one electricity grid with generation capacity P_available(t) ≥ 0 for any t. An EV is considered as an energy consumer with an electricity demand profile EV_i(t) ≥ 0 for any t. Each EV has a mobility profile depending on its usage by residents and/or tourists. At the end of the mobility stage, the battery of EV_i has a residual charge x_i(t_k). The aggregated energy demand of n EVs is expressed by

P_demand = (c_1 - x_1) + (c_2 - x_2) + ... + (c_n - x_n),

where c_i is the full battery capacity of EV_i. The values x_i and t_k are poorly predictable. The EV battery recharge process increases the x_i values up to c_i. To supply the (c_i - x_i) individual energy quantities, the grid provides (c_i - x_i)/Δt elementary flow components along the time slots t_k, t_k + Δt, and so on.

Energy domain researchers have produced several load shaping models through the adoption of econometric, statistical, engineering, and combined approaches; an example comes from [6]. The absolute sustainability limit of the power system, in terms of the overall energy, is expressed by the formula P_demand ≤ P_available, while the time-varying expression for a generic instant t is P_demand(t) ≤ P_available(t).

To satisfy the above conditions, different EV battery usages are possible. In a saturated grid condition with P_demand(t) > P_available(t) for some t, electric cars have to wait in the queue until some energy becomes available. Immediate recharge causes DoS for newcomers when P_demand(t) > 0.99 · P_available(t). Compared with the immediate recharge option, the delayed recharge scheme optimizes the use of the available energy by postponing the service for certain EVs for a while. It reduces the number of DoS occurrences when the EVs' availability for operations is longer than the recharge time.

3 Semantically-enriched toolkit

The e-CARROT system integrates several modules into one toolkit. The Dataset Generation module implements the EV behavioural model and functions supplying events and their distribution in time and space. It is a simulation tool producing a dataset containing the daily population of EVs demanding battery recharge services. The Model Application Utility runs the simulation of the growing demand until the sustainability limits of the system are reached. It is a batch tool varying input parameters and calling the dataset generator to obtain a collection of datasets simulating the evolutionary trends on an island; these populate a data warehouse.

The Semantic Framework is the core of the system. The recharge service requests formulated by EVs are input data that populate the knowledge repository. The system applies the rules and takes decisions about the optimization, if any. The framework operates time shifting by moving the immediate energy demand exceeding the grid capacity, P_demand(t_k) > P_available(t_k), to future time slots t_l with P_demand(t_l) < P_available(t_l). It processes four main events belonging to the battery recharge process: EV_plugging_in (coming from outside the grid, detected by a sensor), relè_on_outgoing (the effective start of EV recharge), relè_off_outgoing (the effective end of EV recharge), and EV_unplug (from outside the grid). Two additional events (relè_on_incoming, relè_off_incoming) come from the inverted flow usage known as EV discharge.
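A minimal sketch can make the aggregate demand and the time shifting concrete. The data and the greedy per-slot policy below are our illustrations, not the e-CARROT implementation.

def aggregate_demand(batteries):
    """P_demand = sum of (c_i - x_i), with c_i the full capacity and
    x_i the residual charge of EV_i."""
    return sum(c - x for c, x in batteries)

def time_shift(demand, available):
    """Greedy per-slot shifting: demand exceeding a slot's capacity is
    postponed to the next slot. Returns the served profile and any
    remaining backlog (backlog > 0 indicates a DoS risk)."""
    served, backlog = [], 0.0
    for d, cap in zip(demand, available):
        want = d + backlog
        s = min(want, cap)
        served.append(s)
        backlog = want - s
    return served, backlog

print(aggregate_demand([(24.0, 10.0), (24.0, 20.0)]))  # 18.0 (kWh)
print(time_shift([5, 9, 2, 0], [6, 6, 6, 6]))          # ([5, 6, 5, 0], 0.0)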

While EVs remain plugged into the electricity grid, the intended use of their batteries, from the user's viewpoint, is the recharge mode, steadily increasing the energy resource up to 100% and then detaching the EV. From the grid operator's viewpoint, more modalities are desirable to better manage the demand. Thus, the Optimization module operates time shifting of EV recharge requests. During the peak time slots, it delays the electricity demand by postponing the recharge time slots. Thanks to the presence of idle time, i.e., the time exceeding that necessary to reach full charge, the process contains an alternation of recharge and idle components. An intelligent sequence of optimized recharge-discharge cycles levels the load peaks.

The method is designed for an island-like topology. It assumes that the average time spent in a vehicle is less than 2 hours. During the remaining time, EVs are attached to the electricity grid. The island is populated by a number of residents owning an EV and a number of tourists getting EVs from local car rentals. The model incorporates external events impacting the EV usage by tourists, such as arrivals and departures of ships and airplanes; these come from publicly available timetables. The usage of EVs on the island determines the time slots in which recharge operations occur. The estimation of possible travel plans by tourists is grounded on the assumption that tourists rent cars upon their arrival and release them before their definitive departure. The working hours of residents determine their house-workplace mobility and the consequent statistical distribution of the respective recharge periods.

The authors encoded an ontological model of the recharge system. It includes a semantic model of electro-mobility; the e-CARROT ontology, encoded in OWL using Protégé to keep interoperability with external OWL-compliant modules; a set of rules based on the knowledge of the local island-like topology; and the main entities and relationships of EV use, EV recharge, and electricity supply. The control module and the rules have been developed using the Apache Jena API and Java. The control module schedules EV recharge processes and constantly monitors the sustainability condition.

The ontology describes the main entities and relations: EVs and their usage, Power Plants, and Power Flows that participate in the energy distribution processes through the electricity grid. An Electric Vehicle connects to a Power Plug, which is located inside a specific Parking. While charging, it executes a Charge Process, which describes, on a time dimension, the evolution of the Power Flow used by the EV. The structure of the Electric Grid is deducible from the connectedOf properties owned by the Power Plug, Parking, and Power Plant entities. Power Series Array Element and its sub-classes are used to model the trend of energy consumption, production, and availability over time. A discrete time domain is considered, divided into constant intervals called time slots. The powerValue property is used to express the power intensities for the corresponding time slot.

The basic inference mechanism is executed according to the following assumptions. While an Electric Vehicle is connectedTo a Power Plug, a Virtual Charge Process (a sub-class of Charge Process) is executed, and its power flow is derived from the estimation of the departure time of the EV.
Virtual processes are not considered for power balancing, but they are used to forecast the aggregate energy request (Estimated Aggregate Power Consumption) arriving from all EVs. A Real Charge Process implies the presence of a Power Flow from the electricity grid.

The aggregated energy request by these processes at every instant (Effective Aggregate Power Consumption) is the main grid sustainability issue. Assigning negative values to consumed power and positive values to supplied power (Aggregate Power Production), their sum gives the Effective Total Available Power, which should always remain greater than zero. The Estimated Total Available Power is the sum of the Aggregate Power Production and (the negative value of) the Estimated Aggregate Power Consumption. All these instantaneous aggregated power flows are sub-classes of Power Series Array Element.

A Control Agent processes system Events. Two major sensor-generated external events are considered: EV_plugging_in and EV_unplug. In practice, the method calls the semantic reasoner each time an EV is plugged into (or unplugged from) the electricity grid. At each event, the Control Agent activates a sequence of rules (recognized Situations and executed Actions). It then takes appropriate control decisions about EV charge processes. No other event is considered until the inference cycle ends. At the end of each time slot, an additional inference cycle is called to update the properties of plugged EVs and to check the electrical energy balance (Cyclic Update). For example, when an EV is plugged into the grid, the reasoner fires the appropriate rules in order to decide whether or not to allow the charging. Moreover, a priority ranking is assigned to the EV and periodically updated, based on its residual charge. Since a minimum charge level for every EV should be guaranteed, EVs which have already exceeded this threshold receive the lowest priority, becoming detachable. If an EV has a residual charge lower than the threshold but, according to its estimated departure time, it is still possible to introduce some idle time before charging, it receives a medium priority (a deferrable EV). When an EV has to be put in recharge immediately in order to reach the minimum charge level (set according to the Service Level Agreement), it receives the highest priority. When the available power runs out, the Control Agent stops detachable EVs and delays the deferrable processes by operating a time shift, but keeps the highest-priority processes online; a sketch of this scheme is given below. In the example shown in Fig. 1, the optimization keeps the aggregated electricity consumption below the set threshold.

Figure 1. Optimization results: aggregated electricity consumption is levelled to remain below the threshold.
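A compact sketch of the priority ranking and the per-slot control decision just described; the class layout, the greedy loop and all names are illustrative assumptions rather than the authors' Jena-based rule implementation.

from dataclasses import dataclass

DETACHABLE, DEFERRABLE, IMMEDIATE = 0, 1, 2   # lowest to highest priority

@dataclass
class EV:
    residual: float         # current charge
    min_charge: float       # guaranteed minimum level (per the SLA)
    charge_per_slot: float  # energy gained per time slot while charging
    departure: int          # estimated departure time slot

def priority(ev, now):
    if ev.residual >= ev.min_charge:
        return DETACHABLE                     # threshold already exceeded
    slots_needed = (ev.min_charge - ev.residual) / ev.charge_per_slot
    if ev.departure - now > slots_needed:
        return DEFERRABLE                     # idle time can still be inserted
    return IMMEDIATE                          # must charge now to meet the SLA

def control_step(evs, p_available, now):
    """Decide which relays stay on so that demand stays within capacity."""
    active, load = [], 0.0
    for ev in sorted(evs, key=lambda e: -priority(e, now)):
        if priority(ev, now) == IMMEDIATE or load + ev.charge_per_slot <= p_available:
            active.append(ev)
            load += ev.charge_per_slot
    return active                             # the rest are stopped or time-shifted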

4 Conclusions

The authors described a new modeling and optimization tool used to simulate the recharge of Electric Vehicles on a local topology served by a number of energy plants supplying one electricity flow. The authors modeled different categories of users, including residents traveling between their houses and workplaces and tourists renting EVs. By combining a variable number of individuals belonging to the above clusters/classes and varying the said ratio over time, the method gives an improved scheme of the available electricity use. The real-life events occurring along the time dimension according to the model contribute to determining the EV battery recharge schemes. The e-CARROT attempts the time-shift optimization of the available energy flows on a daily basis. Compared with earlier models assuming the same charging speed for all EVs, the new algorithm accepts a wide range of maximum charging speeds. The method makes possible a greedy allocation of EV units to the remotely controllable charging slots, while the clock-driven optimization algorithm, supported by the semantic model at each step, controls their intermittent (on/off) actuation. The authors gave priority to the timely control of EVs being integrated into a saturated grid. For this reason explicit semantic rules were adopted. The new time-shifting optimization method is useful when the sustainability limits are reached and the known FIFO-disciplined recharge process results in DoS because it exceeds the energy limits. Invoking the Model Application Utility in the back office to run a day-ahead simulation could reveal the sustainability limits of the current configuration. It might be useful for operational planning of grid operations going beyond the known art [7]. In the current implementation, the time-shifting algorithm uses the recharge/energy purchase and idle modes. Future work will add the discharge/energy selling mode.

References

1. Report "Electricity production and supply statistics", Eurostat 2012/5/3, available online at ...ly_statistics, seen on 29/5/2012.
2. K. Clement, E. Haesen, and J. Driesen. Coordinated charging of multiple plug-in hybrid electric vehicles in residential distribution grids. In Proc. of the IEEE/PES Power Systems Conference and Exposition, pp. 1-7, 2009.
3. R. Porter. Mechanism design for online real-time scheduling. In Proc. 5th ACM Conference on Electronic Commerce (EC 04), 2004.
4. M. Hajiaghayi, R. Kleinberg, M. Mahdian, and D. C. Parkes. Online auctions with re-usable goods. In 6th ACM Conference on Electronic Commerce (EC 05), 2005.
5. Y. He, B. Venkatesh, and L. Guan. Optimal Scheduling for Charging and Discharging of Electric Vehicles. IEEE Transactions on Smart Grid, vol. 3, no. 3, Sept. 2012.
6. C. Gellings and R.W. Taylor. Electric load curve synthesis - A computer simulation of an electric utility load shape. IEEE Transactions on Power Apparatus and Systems, 1 (PAS-100), pp. 60-65, IEEE Press, New York, 1981.
7. P. Evans, S. Kuloor, and B. Kroposki. Impacts of plug-in vehicles and distributed storage on electric power delivery networks. In Vehicle Power and Propulsion Conference, VPPC '09, pp. 838, IEEE, NJ (USA), 2009.

Ubiquitous Monitoring & Service Robots for Care

Mikhail Simonov 1, Marco Bazzani 1, and Antonella Frisiello 1

1 ISMB, Via P.C. Boggio 61, Torino, Italy
{simonov, bazzani, frisiello}@ismb.it

Abstract. The Ubiquitous Monitoring System (UMS) developed in the framework of the Knowledgeable Service Robots for Ageing (KSERA) project is able to detect unfavourable outdoor and indoor environmental conditions that are potentially harmful for patients, especially those with chronic respiratory diseases. The authors present a new artificial cognitive tool integrating sensing and decision making for care purposes. They showcase how a combination of virtual sensorial components acquiring daily the concentration of very fine Particulate Matter and a data analysis component detecting long-lasting polluted periods cooperates with a humanoid robot delivering information about the harmful conditions and the risks of disease exacerbation. The use of humanoid robotic interfaces offers better persuasive capabilities and compliance with care.

1 Introduction

In Europe, life expectancy is increasing. The number of seniors aged 65 and over continues to grow. Patients with Chronic Obstructive Pulmonary Disease (COPD) have an overall gradual decline in their physical functions accompanied by acute periods of exacerbation. For this reason they become vulnerable to unfavourable environmental conditions. One example is pollution by very fine Particulate Matter (PM). Depending on the size of the fine particles, there are different polluting agents, such as PM10 and PM2.5. Among them, the PM2.5 particles penetrate deep into lung tissue, giving the correlation with COPD exacerbations. This challenges new remote COPD care applications. Care solutions relying on a smart home alone appear to be incomplete. They are stationary and lack attractive interfaces for the elderly, requiring them to always go where the sensors are. The KSERA Ubiquitous Monitoring System (UMS) processes the information and context. It informs and alerts the person about both the indoor and outdoor conditions (pollution, temperature, humidity). A strong orientation towards human needs, as opposed to a pure implementation of the technologically possible, is required. Designed according to User Centred Design (UCD) [1], the system becomes responsive to the context of life of the people to which it is addressed, and persuasive: an advanced humanoid robotic interface interacts with the user. Nevertheless, the sole presence of a robot is not enough, because it needs to interact with the home environment and to provide relevant information that only external devices can provide (e.g., PM10 levels, environmental conditions).

Thus, a COPD care application which collects historical series of environmental/medical data for the medical assessment [2] before actuating care operations offers better personalized care. The improved compliance of care depends on efficient and persuasive communication. In turn, it requires context-sensitivity and the capacity to read user intentions. A socially assistive robot (SAR) provides two types of support: functional and social. Functional support includes reminders and instructions. Social support typically aims at reducing social isolation and enhancing well-being in the form of social interaction with users. The robot's physical embodiment and multimodal communication channels allow it to communicate with users verbally and non-verbally in a social manner. As a consequence, users benefit from the interaction, while the robot can be perceived as a companion. Using a SAR [3] can reduce the feeling of social exclusion and the level of stress, both common problems of aging. Thanks to the social robot, the person can also ask for information when, for example, planning his/her daily activity. According to the data gathered, the system can follow patient-specific protocols, reminding the patient to perform the measurements of relevant parameters. Cognitive capabilities [4, 5] are used to manage the exacerbation periods through monitoring, collecting data, and applying rules to discriminate among several real-life situations.

2 Ubiquitous monitoring

Sensing is one of the core capabilities of AI systems. The KSERA UMS [3] is able to acquire several parameters coming from indoor, outdoor, and wearable devices. One variant of the system uses web services to acquire on a regular basis specific environmental parameters such as PM10 and/or PM2.5 values. The main information set (temperature, humidity, and CO levels) comes from physical sensors, while the particle pollution comes from a virtual sensor. The system persists the information flows in a database. After some days the accumulated time series of pollution data reveal trends. This enables analytical processing of the historical series using data-warehousing algorithms. The persistence of high PM levels for many days is likely to cause exacerbations in patients affected by chronic respiratory diseases. Since 2006, the 24-hour fine particle PM2.5 threshold is 35 μg/m³, while the PM10 threshold is 50 μg/m³. The aforementioned dataset is useful to set up an interesting care scenario capable of preventing COPD exacerbations in situations in which one plans to venture outdoors. The UMS analyzes the outdoor conditions and trends, while the KSERA rule engine detects the situations in which the patient is venturing outdoors for any reason. In such cases the intelligent software delivers appropriate information to the patient, attempting to prevent the unwanted effects. For these reasons the UMS daily analyses PM levels and counts the number of consecutive days in which PM levels exceed 35 or 50 μg/m³ (Fig. 1). This gives a simple rule-based reasoning scheme governing the COPD care application, which can easily be extended by additional clauses incrementally. The two simplest triggering rules are: IF (Number_of_days_with_PM10>50 IS 20) OR (Number_of_days_with_PM2.5>35 IS 10) THEN Deliver_Feedback("Reduce outdoor stay").
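The triggering rules above reduce to counting trailing runs of polluted days. A minimal sketch, assuming one PM sample per day (oldest first) and with the feedback delivery left abstract:

def consecutive_days_above(daily_samples, threshold):
    """Count the trailing consecutive days whose PM value exceeds threshold."""
    count = 0
    for value in reversed(daily_samples):
        if value > threshold:
            count += 1
        else:
            break
    return count

def check_pollution(pm10_days, pm25_days, deliver_feedback):
    # Day counts 20 and 10 and the thresholds follow the rule quoted above.
    if (consecutive_days_above(pm10_days, 50) >= 20
            or consecutive_days_above(pm25_days, 35) >= 10):
        deliver_feedback("Reduce outdoor stay")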

Figure 1. Daily samples of particle pollution (PM2.5, μg/m³, Turin, 2009) make a trend originating the robot's reaction; the plot highlights abnormal PM2.5 concentrations and carries the annotation IF (AirPollution IS Bad) AND (Venture_Outdoor) THEN MoveRobot; Say("Don't venture outdoor").

The timing of feedback delivery is handled in a more sophisticated way. The reasoning might deliver many triggers daily, while their use depends on the user intentions being captured. In the example, KSERA recognizes that the patient has expressed the intention to venture outdoors, and it delivers the useful advice once. For SARs, a wide range of social and cognitive skills needs to be integrated seamlessly and in a personalized manner to enable them to interact effectively [6]. To do so, we added explicit dialogs with speech recognition features. They are especially effective since they recall the natural cues of human interaction. The novel time management option is embedded in the respective scripts/state machines. In a situation in which high PM levels persist for several days and the human tells about planned outdoor activities, the anthropomorphic SAR conveys a warning to the patient by moving itself near him/her. This interaction is very complex. To avoid any obstacle, the optimal walking path has to be calculated first. To establish eye contact with the patient, the SAR has to find the user and align the viewpoint by calculating the gaze vector. Finally it tells him/her the message in natural language.

In COPD care, respiratory training programs are adopted to improve the healthy condition of patients, thereby prolonging their wellbeing. The respiratory activity is monitored indirectly through the SpO2 parameter using pulse oximetry, which is a non-invasive technique checking the saturation of oxygen in the arterial blood. Since the difficulty levels of the exercises depend on SpO2 thresholds, the aforementioned values have to be acquired by the UMS before prompting the patient to undertake the training set in his/her care protocol. In the absence of updated values, the KSERA prototype invites the patient to acquire new measurements by moving the robot into his/her neighbourhood. The system then proposes a personalized respiratory training program based on the robot in the place of need; a sketch of this gating logic is given below.
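The gating logic can be summarised as a small rule, sketched here in Python under assumed names for the robot interface, the care protocol and the freshness window; none of these are the actual KSERA components.

from datetime import timedelta

MAX_AGE = timedelta(hours=24)   # assumed freshness window for SpO2 readings

def maybe_start_training(last_spo2, now, robot, protocol):
    """Prompt a measurement when SpO2 is stale, otherwise propose training."""
    if last_spo2 is None or now - last_spo2.timestamp > MAX_AGE:
        robot.move_to_user()
        robot.say("Please take a new SpO2 measurement.")
        return
    level = protocol.difficulty_for(last_spo2.value)  # thresholds set the level
    robot.propose_training(level)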

3 Complementing the Human Robot Interaction

The anthropomorphic robot is the most visible component of the system. It is the major communication interface between the elderly, the smart home, and the external world. It enables natural human-robot interaction, but requires some sophisticated artificial cognition components to make the system components interplay. We took an off-the-shelf (NAO) robot and modified its behavior based on laboratory user studies in the Netherlands and Italy and on user feedback from field trials in Austria and Israel with elderly COPD patients. In our vision, the UMS complements the HRI by means of an enriched context sensitiveness that includes the main interaction agents (robot, patient, and their changing positions), resulting in improved reliability and trust. Thus, we integrated robot mobility as a part of the human-robot interaction with novel localization and navigation methods. The ceiling-mounted camera gives the positions of a person and a robot. The person localization uses a hybrid probabilistic model, while the robot localization is based on particle filter prediction. Navigation uses the position information as input for determining where to move the robot. The robot moves to a specific position in the room according to the status of the interaction with the person and external events. The choice of the target position is made by the rule engine that interprets the environmental parameters and external events and triggers appropriate robotic mobile behaviors (the gaze alignment step is sketched at the end of this section). KSERA contains behavior-based navigation providing a smart obstacle-avoidance strategy, and map-based navigation which learns the spatial knowledge by observing the human's movement in a room.

KSERA provides a mobile assistant to follow and monitor the health and behavior of a senior, video and internet communication services including needed alerts to caregivers and emergency personnel, and a robot integrated with smart household technology to monitor the environment and warn the senior or caregivers of anomalous or dangerous situations. The ubiquitous monitoring of physiological and behavioral data through direct measurements and interaction with household sensors is used in conjunction with human-robot interaction including shared environmental processing, affective technology, and adaptable multimodal interfaces. A single robot, hosting entertainment and communication aids, contributes to the senior's health and quality of life. At the same time, it provides an assistant that monitors the environment and the senior's behavior. It uses contextual information and adaptive decision making algorithms to continually update the monitoring and mobile behavior for improved interaction with the senior and to provide information and support at the right time and place.

A mix of quantitative and qualitative metrics is used to evaluate KSERA holistically. Quantitative metrics include Godspeed, WHO QoL, PANAS, ad-hoc questionnaires (Likert-scale based), and system performance metrics. Qualitative metrics include focus groups, free interviews, user tests and the thinking-aloud protocol. We tested the embodiment, discovering that our participants like the robot more than the smart home interfaces and that they consider it more alive. The reaction time in categorization tasks was slower for the robot than for the smart home. One possible explanation is that the robot used gestures, which caused the participants to wait for them to complete before starting with the categorization task.
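As an illustration of the gaze alignment step mentioned above, the following sketch derives the normalised robot-to-person gaze vector from the positions delivered by the ceiling-mounted camera and converts it into head angles; the coordinate convention (z up) is an assumption.

import numpy as np

def gaze_vector(robot_head, person_head):
    """Unit vector pointing from the robot's head to the person's head."""
    v = np.asarray(person_head, dtype=float) - np.asarray(robot_head, dtype=float)
    return v / np.linalg.norm(v)

def yaw_pitch(gaze):
    """Head joint angles (radians) that align the robot's view with gaze."""
    yaw = np.arctan2(gaze[1], gaze[0])
    pitch = np.arcsin(gaze[2])      # assumes z is the vertical axis
    return yaw, pitch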

4 Conclusions

The KSERA system, with several UMS sensors and web services processing information describing indoor, outdoor, patient mobility and health/living parameters daily, gives an example of organic computing. Using historical series of data, we discriminate among different patient conditions and care trends. Using the trend monitoring functionality, the system gained the capacity to inform people in a friendly way about harmful outdoor conditions according to their intentions (venturing outdoors or opening windows during polluted days, for example). User tests and field trials showed high acceptance of the UMS, which is able to improve the patient's engagement in the disease self-management process [7] through the social robot. As opposed to [5], grounded on RFID to detect the sequences of events and mobility patterns, the present work uses a combination of different classes of sensors. The next intention is not calculated based on temporal constraints, but is inferred using speech recognition algorithms and disambiguation (dialogs). Extending the monitoring functions of the previous prototype [7], more complex data-warehousing/trend analysis options are used to match the patient's history with the formalized disease management pathway (rule-based). The SpO2 values were correlated with PM levels to derive a new combined situational indicator of wellbeing. Using the humanoid robot to inform patients about pollution and exacerbation risks appears convincing. The social robot acting inside the house implements ubiquitous monitoring features and makes them acceptable to users through a natural interaction. Future artificial cognitive systems correlating the collected data with the intentions might implement more communication variants.

Acknowledgment

This research work, undertaken in the context of the KSERA project (Grant no. ...), was partly funded by the European Community's 7th Framework Programme.

References

1. D. Norman, S. Draper, User-Centered System Design: New Perspectives on Human-Computer Interaction, Lawrence Erlbaum Associates, Hillsdale, NJ (1986).
2. R. Wichert, B. Eberhardt, Ambient Assisted Living - Advanced Technologies and Societal Change, Springer (2011).
3. Knowledgeable Service Robotics for Ageing (KSERA) project website.
4. E. K. Zavadskas, Multi-criteria decision support system of intelligent ambient-assisted living environment, in proc. of ISARC-2008 conf. (2008).
5. M. Simonov, F. Mazzitelli, Near miss detection in nursing: rules and semantics, in H. Chen, Semantic e-Science Book, Springer (2010).
6. K. Dautenhahn, Socially intelligent robots: dimensions of human-robot interaction. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1480) (2007).
7. M. Simonov, A. Frisiello, M. Bazzani, Using Humanoid Robot in Ambient-Assisted Living with COPD, in proc. of Med-e-Tel'12 conf. (2012).

Citation Context Sentiment Analysis for Structured Summarization of Research Papers

Niket Tandon 1,3 and Ashish Jain 2,3

1 Max Planck Institute for Informatics, Saarbrücken, Germany
2 IIIT Hyderabad, India
3 PQRS Research (pqrs-research.org)

Abstract. Structured tabular summarization tremendously helps humans understand a topic, e.g. Wikipedia infoboxes. However, few methods exist to generate summaries of research papers, although it is time-consuming and painstaking to read a paper and even more difficult to infer its merits and limitations. We propose a method to generate structured summaries of research papers. We turn to the opinions of citing papers, because they have been shown to be more focused than abstracts and to contain additional information. This paper is a first step towards structured summarization of research papers using citing papers.

1 Introduction

There is a plethora of research papers, making it hard for students and researchers to keep abreast of the literature. Skimming through a research paper to get the broad ideas in the paper is an art. Judging the positives and negatives is best left to experts and requires time. Thus, the problem of quickly gaining insights about a research paper remains unaddressed. We propose to utilize opinions and summaries in citation contexts to address this problem. A citation context of a paper P is the set of sentences about P in other articles which cite P. A citation context contains concise and precise analysis about a paper due to the space limitations in papers and due to the high quality of a paper in terms of correctness. This paper envisions summarizing these opinions and summaries from all citing papers and presenting them in a table with five columns: summary, related work, strengths, limitations, and extensions. Such an example summary of an Information Extraction system, KnowItAll⁴, is presented in Figure 1.

Problem statement. Our problem can be divided into two sub-problems. (i) Classifying a citation context into one or more of the five classes. This is challenging because we have limited training data that classifies citation contexts. Secondly, very few techniques exist for sentiment analysis of research papers in more than two classes.

⁴ cs.washington.edu/research/knowitall

(ii) Generating summary snippets and merging similar statements from the classified citation contexts. E.g., given a negative citation context "We use the CPS transformation (citation) but our implementation is simplified by the fact that we start from a normalized direct style representation", we want a summary statement from this that says "The CPS transformation's implementation is not simple". We keep this as future work.

Fig. 1. Our Vision: Structured summarization

Related Work. Although sentiment analysis is a well-studied topic, sentiment analysis of citation contexts has received surprisingly little attention in the research community. Sentiments in citation contexts differ in both structure and language from standard use cases, e.g. product reviews; thus standard sentiment analysis of citation contexts using lexical resources, for instance, leads to poor coverage and accuracy [1]. Some attempts [2] have been made to utilize the sentiments of citation contexts by manually classifying sentiments as positive or negative. Other approaches [3] rely on manually defined phrase-based rules for classifying sentiments. However, no large-scale automated sentiment analysis over citation contexts exists. Further, these approaches have considered citation contexts in only two or three classes [3][1], although citation contexts can be leveraged in more than these two classes. In one of the first attempts, [4] describe paper summarization as a classification task and classify each sentence in an article into aim, contrast and background. They do not leverage the citation context for summarization. A seminal approach in [5], [6] leverages citation contexts for summarization. Their approach is primarily geared towards multi-document summarization and less focused on single-paper summarization. They consider the phrases in a citation context as unstructured summarization, i.e. consisting of uncategorized sentences, but do not consider the sentiments of the citation context. Unlike existing approaches that provide unstructured summarization of a research paper, our goal is to perform structured summarization of a research paper.

Contribution. This paper aims at filling the gap between sentiment analysis and citation context summarization by proposing a structured summarization approach. An example that depicts our goal is shown in Figure 1. Unlike standard sentiment analysis for items like product reviews, we propose an automated approach directed towards research papers. Unlike the standard summarization approach that is unstructured, we provide structured summarization of research papers, which is more desired.

2 Methodology

Structured summarization of a research paper can be viewed as classification of a citation context into one or more of the following classes: summary, related work, strengths, limitations, and extensions. The classification problem here is multilabel because a citation context can belong to one or more classes. Consider the following citation context that summarizes the paper as well as describes an application of the work: "The (KnowItAll system) employs the same generic patterns as Hearst (e.g., NPs such as NP1, NP2, ...), and more besides, to extract a whole range of facts that can be exploited for web-based question-answering." Multilabel classification can lead to an intermediate summarization as presented in Figure 2. We use a Language Model (LM) approach for the classification of citation contexts. In brief, language models are constructed for each of the five classes. Subsequently, the most likely language models that would have generated a citation context are estimated. Our LM-based classification approach is similar to a sentiment classification approach used in [7].

LM construction. Given a collection of citation contexts D, we manually annotate them into the five classes: summary, related work, strengths, limitations, and extensions. We identify the opinion vocabulary, consisting of two kinds of terms: phrases denoting the context, and opinion terms describing opinions on the cited paper. In the opinion vocabulary, bigrams are taken as context while unigram verbs, adjectives, and adverbs are assumed to be opinion related. An LM M_ci of a particular class c_i is estimated as the interpolation of a bigram phrase-denoting context B and a unigram opinion term U over all phrases and opinion terms in the collection. Such an interpolated LM benefits from two LMs:

  P_{M_ci}(t_i | D) = (1 - α) P_B(t_i | D) + α P_U(t_i | D)

where P_B(t_i | D) is the LM of D over binary terms, P_U(t_i | D) is the LM of D over unary terms, t_i is a term, and α is the interpolation parameter (estimated by minimizing perplexity). The unigram and bigram models are obtained using the general form:

  P(t_i | D) = c(t_i, D) / Σ_{t_j ∈ D} c(t_j, D)

where c(t_j, D) denotes the frequency of term t_j in the collection D. Further, Good-Turing smoothing is applied because several out-of-vocabulary (OOV) words could exist.
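A minimal sketch of the LM construction and query-likelihood scoring defined by the two formulas above, including the δ-neighbourhood multilabel decision described next. Tokenisation is assumed to be done already, and the epsilon floor merely stands in for the Good-Turing smoothing used in the paper.

import math

def make_lm(term_counts):
    """Maximum-likelihood LM: P(t|D) = c(t, D) / sum_j c(t_j, D)."""
    total = sum(term_counts.values())
    return lambda t: term_counts.get(t, 0) / total if total else 0.0

def query_loglikelihood(query_terms, p_b, p_u, alpha, eps=1e-12):
    """log P(Q_CT | M_ci) with P(t|D) = (1 - alpha) P_B(t|D) + alpha P_U(t|D)."""
    return sum(math.log((1 - alpha) * p_b(t) + alpha * p_u(t) + eps)
               for t in query_terms)

def classify(query_terms, class_models, alpha, delta):
    """Multilabel decision: accept every class within delta of the best LM."""
    scores = {c: query_loglikelihood(query_terms, p_b, p_u, alpha)
              for c, (p_b, p_u) in class_models.items()}
    best = max(scores.values())
    return [c for c, s in scores.items() if best - s <= delta]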

Classifying the citation context. The citation context is modeled as a query Q_CT by extracting the binary and unigram patterns from the citation context, Q_CT = {B ∪ U}. Similarly to LM usage in information retrieval, we estimate the query likelihood of a query given the LM of each class c_i:

  P(Q_CT | M_ci) = Π_{t_i ∈ Q_CT} P(t_i | M_ci)

In the case of single-label classification, the model that has the highest likelihood of generating the query is selected. However, we consider a multilabel classification that requires an additional step. Our hypothesis is that if two or more LMs have a query likelihood in the neighborhood δ of the best LM, then those LMs should also be accepted, since the problem is multilabel classification. On the other hand, LMs whose query likelihood is further off from the best LM should not be considered. We empirically estimate δ.

3 Experiments

Fig. 2. Classification of citation contexts

Experimental Setup. We use a standard multilabel classification metric, Average Precision, which computes the accuracy (precision) of each class and averages them over all the classes. Citation contexts for research papers are available online on the Microsoft Academic search engine⁵. There is no annotated dataset for our purpose, so we create an annotated set of 30 research papers, totaling an annotation of 500 citation contexts.

⁵ academic.research.microsoft.com

Baseline: As a baseline for multilabel classification, we use Random k-labelsets with the Naive Bayes algorithm as the basis [8]. As features for the baseline, we consider combinations of the following: (i) adjectives in each class, (ii) verbs, (iii) n-grams.

Experimental results. The baseline is trained and the language models are constructed on a total of 500 labeled citation contexts in the collection D. A combination of adjectives, verbs and bigrams achieves 68.54% average precision (see Table 1), marginally beating the LM. We postulate that the LM accuracy could be further improved by increasing and cleaning the collection; e.g. the limitation class has only 47 instances annotated out of 500. Learning difficult underlying patterns with a small dataset leads to low precision for that class as well as reducing the overall average precision of the experiment.

Classifier  Features                            Average Precision (%)
Baseline    Adj                                 -
Baseline    Verb                                -
Baseline    Adj+Verb                            -
Baseline    Adj+Verb+Bigram                     68.54
LM          Bigram terms B + Unigram terms U    -

Table 1. Average precision of the multilabel classifiers

4 Conclusion

We introduced a new framework based on citation sentiments for structured summarization of a research paper. Our results are encouraging given the simplicity of our model, i.e. multilabel classification. In the future, we will enhance our approach by employing more sophisticated algorithms like LDA and address snippet generation.

References

1. Piao, S., Ananiadou, S., Tsuruoka, Y., Sasaki, Y., McNaught, J.: Mining opinion polarity relations of citations. In: International Workshop on Computational Semantics (IWCS 2007)
2. Stamou, S., Mpouloumpasis, N., Kozanidis, L.: Deriving the impact of scientific publications by mining citation opinion terms. IJDIM (5) (2009)
3. Nanba, H., Kando, N., Okumura, M., et al.: Classification of research papers using citation links and citation types: Towards automatic review article generation. (2000)
4. Teufel, S.: Argumentative zoning for improved citation indexing. Computing Attitude and Affect in Text: Theory and Applications (2006)
5. Qazvinian, V., Radev, D.: Scientific paper summarization using citation summary networks. In: COLING (2008)
6. Elkiss, A., Shen, S., Fader, A., Erkan, G., Radev, D., et al.: Blind men and elephants: What do citation summaries tell us about a research article? JASIST 59(1) (2008)
7. Awadallah, R., Ramanath, M., Weikum, G.: Language-model-based pro/con classification of political text. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2010)
8. Tsoumakas, G., Vlahavas, I.: Random k-labelsets: An ensemble method for multilabel classification. Machine Learning: ECML 2007 (2007)

Towards Robust Spontaneous Speech Recognition with Emotional Speech Adapted Acoustic Models

Bogdan Vlasenko, Dmytro Prylipko, and Andreas Wendemuth

Cognitive Systems, IESK & Center for Behavioral Brain Sciences, Otto von Guericke University, Magdeburg, Germany

Abstract. The speech signal, in addition to linguistic information, contains additional information about the speaker: age, gender, social status, accent (foreign accent, dialects, etc.), emotional state, health, etc. Some of these informational channels induce changes in the acoustic characteristics of speech. This article presents an evaluation of ASR acoustic models (first trained on neutral, read speech) on acted and spontaneous emotional speech. In our research we used adaptation approaches to compensate for the mismatch of acoustic characteristics between neutral speech samples and affective speech material. During the experiments we observed that affective-speech-adapted ASR acoustic models provide better emotional-speech-recognition performance. The improvements in affective speech recognition performance were 6.24% absolute (7.1% relative) for speaker-independent evaluations on the EMO-DB database and 7.08% absolute (25.43% relative) for the cross-corpora evaluation on the VAM database.

Keywords: Emotional Speech, Adaptation, ASR

1 Introduction

The speech signal comprises not only linguistic content but also various additional information about the speaker: age, gender, social status, accent, emotional state, health, etc. Characterizing the influence of some of these speech signal variations, together with related methods to improve automatic speech recognition (ASR) performance, is an important research field. In order to deal with spontaneous speech, we should not cut the above mentioned information channels from the input signal, but use them as an additional knowledge source and thus boost the performance. In real-life applications, training and evaluation conditions (speaking rate, acoustic environment, vocal tract variability, affected state, etc.) usually do not match, which causes a severe degradation of the recognition performance. In our previous research [7] we characterized the acoustic difference between emotional

and neutral speech. We have shown a significant difference between the vowel triangle forms and their positions in the F1-/F2-dimensional space for emotionally colored and neutral speech samples. This difference illustrates why ASR models trained on neutral speech are not able to provide a reliable performance for affective speech recognition. To compensate for such a mismatch, acoustic model adaptation techniques are usually applied. However, these techniques are usually employed to compensate for the mismatch of acoustic characteristics between various speakers, acoustic channels and noisy environments. Acoustic model adaptation towards affective speech is a less popular adaptation concept. In our research, we used adaptation approaches to compensate for the mismatch of acoustic characteristics between neutral speech samples and affective speech material. We used acted affective speech samples from the popular publicly available database EMO-DB to adapt acoustic models trained on emotionally neutral speech samples from The Kiel Corpus of Read Speech.

2 Corpora

For initial acoustic model training we used a part of The Kiel Corpus of Read Speech [5], which contains emotionally neutral German read speech samples. For our evaluation we used speech samples from 1041 utterances produced by 6 female and 1033 utterances spoken by 6 male speakers. For affective speech we decided to use the popular studio-recorded Berlin Emotional Speech Database (EMO-DB) [2] and The Vera am Mittag (VAM) corpus [4]. The EMO-DB contains acted emotional speech samples. 10 professional actors (5 male and 5 female) spoke 10 German sentences with emotionally neutral linguistic meaning. For our evaluations we used 494 sentences classified as more than 60% natural and at least 80% clearly assignable in perception tests. The VAM database [4] consists of 12 hours of audio-visual recordings taken from a German TV talk show. The corpus contains 947 utterances with spontaneous emotions from 47 guests of the talk show, which were recorded from unscripted, authentic discussions. Since the VAM corpus does not provide a pronunciation lexicon, we created one ourselves in two ways. The major part of the word transcriptions (1216 words) has been taken from other German corpora, namely Verbmobil and SmartKom. For the rest (688 words) we created transcriptions using grapheme-to-phoneme conversion with the Sequitur G2P converter [1]. The converter was trained on a joined lexicon based on the SmartKom and Verbmobil lexicons (12460 German words in total).

3 Emotional adaptation of acoustic models

For our evaluations we used the HTK toolkit to create and test continuous density hidden Markov models (HMMs) based on a multivariate Gaussian mixture model (GMM) with 32 mixture components. We created left-to-right monophone models with three emitting states for acoustic modeling.

Speech input is processed using a 25 ms Hamming window with a frame rate of 10 ms. We employed 39-dimensional MFCC feature vectors (12 cepstral coefficients + log frame energy, plus speed and acceleration coefficients).

3.1 Adaptation configuration

Two adaptation schemes have been tested: Maximum Likelihood Linear Regression (MLLR) and Maximum a Posteriori (MAP) [8]. During the adaptation only the mean values of the Gaussian mixtures were updated, because variance compensation provides only a minor improvement and requires the additional computational overhead of non-diagonal Gaussian likelihood calculations [3]. Prior to adaptation and recognition on VAM, optimal parameters for each scheme should be determined. For MLLR the number of regression classes is important. MAP depends on the τ parameter (the weight of the prior knowledge). For MLLR, regression trees with 2, 4, 8, 16 and 32 terminal nodes have been tested. The prior knowledge weight for MAP has been evaluated in the range τ = 2, ..., 20. For MLLR adaptation the best recognition performance on EMO-DB samples has been obtained with a 32-class regression tree (rc = 32). These configurations have been used further for adaptation and test on the VAM corpus. For MAP adaptation the best recognition performance has been obtained with τ = 2 (see Table 1).

Acoustic model        Parameters  Word accuracy [%]
Non-adapted basic     -           -
MLLR-adapted basic    rc = 32     -
MAP-adapted basic     τ = 2       -
EMO-DB trained        -           -

Table 1. Optimal adaptation parameter selection. Basic models trained on Kiel, adapted and evaluated with LOSO on EMO-DB.

As one can see from Table 1, HMM/GMM models trained on neutral speech samples from the Kiel dataset are not able to provide acceptable emotional-speech-recognition performance without adaptation on affective speech samples. For this configuration (72 words in the lexicon, only 10 possible sentences) the state-of-the-art recognition accuracy is higher than 95% [6]. However, MLLR-adapted basic models provide better recognition performance, which is close to the value achieved during the evaluation of EMO-DB-trained (native) acoustic models.
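Both schemes reduce to simple mean updates, sketched below with NumPy; the state occupancies and adaptation frames are illustrative stand-ins for the statistics HTK accumulates, and the variances are left untouched as in the paper.

import numpy as np

def map_mean(mu_prior, frames, gammas, tau):
    """MAP update: mu_hat = (tau * mu_prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t)."""
    occ = gammas.sum()
    return (tau * mu_prior + (gammas[:, None] * frames).sum(axis=0)) / (tau + occ)

def mllr_mean(mu, A, b):
    """MLLR update: one affine transform shared per regression class, mu_hat = A mu + b."""
    return A @ mu + b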

3.2 Experiments and results

Prior to adaptation we tested the baseline performance of the acoustic models trained on the Kiel corpus. Except for the acoustic models, other components such as the lexicon or language models have been taken from the test database. Training and testing on the VAM (single-corpus mode) has been done in a speaker-independent fashion using a Leave-One-Speaker-Group-Out strategy (LOSGO, with 5 speaker groups in total).

Training set  Adaptation scheme  Evaluation set  Word accuracy [%]
Kiel          -                  VAM             -
Kiel          MLLR on EMO-DB     VAM             -
Kiel          MAP on EMO-DB      VAM             -
VAM           -                  VAM             -

Table 2. Word accuracy rates for the cross-corpora evaluation of acoustic models with and without pre-adaptation on EMO-DB samples, evaluated on the VAM database.

The results presented in Table 2 show that training ASR models on neutral speech, with subsequent adaptation on affective speech samples, does have an impact on the recognition performance within emotional speech recognition. These results have been obtained after evaluations in a cross-corpora way. We used speech samples from the Kiel and EMO-DB databases for training and adaptation purposes, respectively. Finally, these acoustic models have been evaluated on the VAM database speech samples.

Fig. 1. Performance evolution (word accuracy [%] over the number of adaptation sentences) during the incremental unsupervised adaptation on the VAM database. Initial models are trained on Kiel samples only; pre-adapted models are trained on Kiel and adapted on EMO-DB samples.

Also, we compared the initial acoustic models with the pre-adapted ones under unsupervised incremental MLLR adaptation; a sketch of the procedure is given below. 30 adaptation sentences were selected randomly from the whole VAM corpus. During the adaptation process they were fed to HVite sequentially. The other 917 sentences formed the test set. This difference in procedure is the reason why the initial values of both curves depicted in Fig. 1 slightly differ from the values provided in Table 2. The transformations were applied after some number of frame occurrences (namely 800), which in our case corresponds to 6 sentences or 6.52 seconds of speech. One can see from Fig. 1 that if we do not have at least 25 sentences for unsupervised adaptation, the pre-adapted acoustic models provide much better speech recognition performance.
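A pseudo-Python sketch of this incremental procedure; recognize and estimate_mllr are hypothetical helpers standing in for the HVite decoding and the transform re-estimation, not actual HTK calls.

def incremental_adaptation(recognize, estimate_mllr, sentences,
                           frames_per_update=800):
    """Re-estimate the MLLR transform from the recogniser's own output."""
    transform, buffered_frames, buffered_hyps = None, 0, []
    for utterance in sentences:               # the 30 adaptation sentences, in order
        hyp = recognize(utterance, transform)
        buffered_hyps.append(hyp)
        buffered_frames += utterance.num_frames
        if buffered_frames >= frames_per_update:   # ~6 sentences / 6.52 s here
            transform = estimate_mllr(buffered_hyps)
            buffered_frames, buffered_hyps = 0, []
    return transform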

4 Discussion and conclusions

The main aim of this research is to show that training ASR models on neutral speech, with subsequent adaptation on affective speech samples, does have an impact on the recognition performance within emotional speech recognition. It has been found that the adaptation on acted emotional speech samples yields a significant gain (about 25.43% relative improvement in word-accuracy rate) in spontaneous emotional speech recognition performance (34.92% with adapted models) over the basic ASR models trained on neutral speech samples. In comparison to the results presented for the EMO-DB database, the speech recognition performance for the VAM database obtained with adapted models is relatively low. This result can be compared with the low word-accuracy rate of 42.75% obtained during the speaker-independent LOSGO evaluation on the VAM database. A comparison of these values to state-of-the-art speech recognition performance is unfortunately hardly possible, due to the nature of the corpora. Both EMO-DB and VAM were designed for research in emotion recognition from speech, rather than for speech recognition. That is why most publications report accuracies in emotion classification. To the best of our knowledge, there is no paper reporting the accuracy of speech recognition on EMO-DB or VAM. As a conclusion, we showed that acoustic models trained on read speech samples and adapted to acted emotional speech can provide better performance in spontaneous emotional speech recognition.

References

1. M. Bisani and H. Ney. Joint-Sequence Models for Grapheme-to-Phoneme Conversion. Speech Communication, 50(5), May 2008.
2. F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss. A database of German emotional speech. In Proc. of EUROSPEECH, 2005.
3. M. Gales, D. Pye, and P. Woodland. Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation. In Proc. of ICSLP. IEEE, 1996.
4. M. Grimm, K. Kroschel, and S. Narayanan. The Vera am Mittag German audio-visual emotional speech database. In Proc. of ICME, 2008.
5. K. J. Kohler. Labelled data bank of spoken standard German - the Kiel Corpus of read and spontaneous speech. In Proc. of ICSLP, 1996.
6. D. Pallett. A Look at NIST's Benchmark ASR tests: Past, Present, and Future.
7. B. Vlasenko, D. Prylipko, D. Philippou-Hübner, and A. Wendemuth. Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In Proc. of Interspeech, Florence, Italy, 2011.
8. P. Woodland. Speaker adaptation for continuous density HMMs: A review. In ISCA Tutorial and Research Workshop (ITRW) on Adaptation Methods for Speech Recognition, pages 11-19, Sophia Antipolis, France, 2001.

Tool Support for Activity Recognition with Computational Causal Behaviour Models

Kristina Yordanova, Frank Krüger, and Thomas Kirste

University of Rostock, Institute of Computer Science, MMIS Group, Germany

Abstract. Context-aware activity recognition plays an important role in different types of assistive systems, and the approaches with which the context information is represented are a topic of various current projects. Here we present tool support for activity recognition using computational causal behaviour models that allow the combination of symbolic causal model representation and probabilistic inference. The aim of the tool is to provide a flexible way of generating probabilistic inference engines from prior knowledge, which reduces the need for collecting expensive training data.

Keywords: Activity Recognition, Context-Awareness, Causal Models

1 Introduction and Motivation

The area of activity recognition has been expanding rapidly in recent years, which results in the need for powerful and reliable tools for building models for activity recognition and for inferring user actions and intentions. In this paper we present such tool support, which allows the building of Computational Causal Behaviour Models (CCBM) and their usage for recognising user activities. CCBM are human behaviour models that use a symbolic causal representation to describe activities and which are compiled into probabilistic inference machines. Symbolic human behaviour models are well known to the activity recognition community, as they allow the representation of user actions and reasoning over them in order to infer not only the current user actions but also the more complex activity to which they belong [3, 1, 4]. On the other hand, as we are dealing with observations produced by unreliable sensors, the symbolic representation is not able to cope with the implications of noisy readings. Thus, probabilistic models are often preferred in situations where the inference is applied under some uncertainty [6]. To bridge these two approaches, here we present our tool that compiles causal human behaviour models into a probabilistic inference machine, taking advantage of the features of both methods. The tool is aimed at providing an estimation of the trajectories of dynamic systems from observation data in domains where prior knowledge and context information can be used to reason over the causality of the user behaviour. Furthermore, CCBM synthesises probabilistic models from prior knowledge, aiming at reducing the need for training data. This is done by substituting the training data with

context information based on the room topology, the types of sensors, typical action durations and the functionality of a given setting.

2 CCBM Features and Goals

The functionality and feature set of the CCBM tool has been designed with the aim of providing a reliable and flexible system for activity recognition that is able to cope with real world problems. During the tool design, the following goals were kept in mind: (1) allow to infer goals not only as labels but also as desired states, which would allow the system to plan corresponding assistance strategies; (2) allow to recognise individual actions and to infer plans that lead to specific goals, i.e. intentions; (3) allow to recognise environmental states and predict actions and intentions from these states; (4) ensure a mechanism for coping with noisy or unreliable sensor data, i.e. probabilistic sensor models; (5) reduce the need for training data by employing context information, i.e. a-priori models; (6) enable the use of symbolic causal modelling paradigms for building a-priori models; (7) allow the development of reusable inference models; (8) allow the inference in large state-spaces; (9) allow the inference of actions in multi-agent scenarios and the recognition not only of the team behaviour but also of the individual agents; (10) allow probabilistic duration densities that can cope with variations in the expected action durations; (11) allow real-time inference; (12) employ probabilistic inference, i.e. Bayesian filtering, smoothing, prediction, parameter estimation; (13) allow the usage of different or combined heuristics for action selection in order to adequately represent the dynamics of human behaviour.

3 CCBM System Components

Based on the features and goals described in the previous section, the CCBM tool consists of different components that provide the tool functionality.

Computational Causal Behaviour Models: A CCBM consists of a symbolic causal human behaviour model described in a PDDL-like notation and an observation model, which are translated into a probabilistic inference system. The symbolic model consists of two parts: domain and problem descriptions. The domain description contains the available user actions represented as precondition-effect operators, the object types used and the domain constants. A formal definition of the domain file can be seen in Fig. 1: a typical domain entry consists of a declaration of the types, declarations of the constants invariant across different problems, a declaration of the observations that connect the described states to the observations, and declarations of the actions described as precondition-effect formulae. The problem description, on the other hand, contains the problem constants, the initial world state and the goal. The formal definition of the problem file can be seen in Fig. 2, where a typical problem entry contains the domain name, the problem-specific constants (or objects), the initial state, to which also a duration can be assigned, and the goal state. The third component needed for compiling the model into a probabilistic filter is the observation model (OM), which describes the probability of an observation y given a state x, P(y|x). The OM can be trained on sensor data, but it can also rely only on context information.

domain       = (define (domain name) {domain-entry})
domain-entry = (:types {name {name} - name} {name})
               (:predicates {(name [declarations])})
               (:constants {constant {constant} - type-name} {constant})
               (:observation {observation-formula})
               (:action {action-formula})

Fig. 1: CCBM domain definition components

problem       = (define (problem name) {problem-entry})
problem-entry = (:domain name)
                (:objects {constant {constant} - type-name} {constant})
                (:init [:duration atomic-formula] {init-elem})
                (:goal formula)

Fig. 2: CCBM problem definition components

In contrast to other human behaviour models [7], in CCBM one does not need to explicitly describe the execution sequences of the user actions. They are instead derived from the preconditions and effects of every action. Thus, the model is able to describe multiple hypotheses without the need for the system designer to waste time on describing these sequences. Additionally, the causal action descriptions allow for producing many more valid execution sequences than when doing that by hand. In short, by describing only a small set of actions, one can generate a huge state space with all logically valid executions.

Compiler: The compiler takes the domain and the problem files and translates them into a C-code module which is compiled and linked against the modules containing the filter routines. Additionally, the OM containing the routines for computing the observation probabilities is compiled and linked into the final executable.

Analyser: The analyser computes the state space of the causal model (the domain-problem combination) and calculates the distance to the goal. The state space is computed by a depth-first search of the state graph, whose edges are the available actions. The goal distances are then calculated by running Dijkstra's algorithm on the transposed graph, and states that are not reachable are assigned a goal distance of infinity.

Filter: The filter uses the compiled model and filters the sensor observations by using an action selection formula. For an action a and states x, x' such that x' = a(x), let δ(x') be the goal distance of state x', and let s(a) be the saliency value of a. The probability of selecting a in state x is then defined by:

  P(a | x) ∝ γ(x') s(a) (β + e^(-λ δ(x')))  if x satisfies pre(a), and 0 otherwise,   (1)

where β is the bias and λ is the weight factor. Note that in the presence of states with infinite goal distance, it is required that λ > 0. The factor γ(x') is determined by the history: if x' has not been visited before, γ(x') equals 1. Otherwise, it will be 0. This forces the system not to re-enter states it has already visited. The term e^(-λ δ(x')) assumes that an agent will pursue its goal based on a Boltzmann policy [5].
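The following sketch implements Eq. (1) directly; the state, action and goal-distance interfaces are illustrative assumptions about how the compiled model exposes them.

import math

def selection_distribution(state, actions, delta, visited, beta, lam, saliency):
    """Normalised P(a|x) over all actions, following Eq. (1)."""
    weights = {}
    for a in actions:
        if not a.applicable(state):           # x does not satisfy pre(a)
            weights[a] = 0.0
            continue
        succ = a.apply(state)                 # x' = a(x)
        gamma = 0.0 if succ in visited else 1.0   # revisiting guard gamma(x')
        weights[a] = gamma * saliency(a) * (beta + math.exp(-lam * delta(succ)))
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()} if total > 0 else weights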

Validator: The plan validator is developed as a helper tool for the model developer to check whether a given plan is valid according to the model description. The validator can output three possible plan outcomes: the plan is successful and the goal is reached; the plan is successful but the goal is not reached; and the plan has failed. The third outcome provides the additional, extremely useful information of the time slot and agent where the preconditions for successful plan execution were not met.

4 Application Domains

The CCBM tool is developed with the idea of performing activity recognition in context-aware domains where prior knowledge can be used to substitute training data and thus reduce the need for training the model. To use the tool for inferring the user activity and intention, a CCBM model of the domain in question is built beforehand, containing the set of actions, their durations and any additional problem-specific information. The tool then compiles the model into a probabilistic inference machine and uses an HMM (for exact inference) or a particle filter (for approximate inference) in order to estimate the user state (at present, forward filtering is used). For estimating the user goal, the model can be run in parallel with different initial and goal states, and the one with the highest likelihood is assumed to be the user goal. However, at present the tool evaluation was centered on the activity recognition process, and the intention (goal) recognition is a matter of current and future research. So far the tool was successfully used in a smart meeting room scenario where several 3-person meetings with different agendas and durations took place. The CCBM tool was able to recognise the performed team and agent activities relying only on context information, with an accuracy of about 90% [8]. Another application domain of the CCBM tool is an office scenario where two colleagues act autonomously while preparing coffee, fixing the printer and printing documents. Although the tool was receiving only scarce sensor information (location sensors detected whether a person was standing next to one of the objects in question) and was heavily relying on causal reasoning, it was able to recognise the executed activity and to provide a reasonable plan execution sequence for the two non-interacting agents [2]. Furthermore, the tool is currently being applied to a kitchen task assessment problem, where the preliminary results show that the tool is successful in reasoning not only about the performed action but also about the objects being used, regardless of the fact that the observed data represent only the actions.

5 Conclusion and Future Work

In this paper we presented tool support for activity recognition using computational causal models. The tool is able to combine symbolic causal model representations with probabilistic sensor information in order to perform probabilistic activity recognition. Among other features, the tool provides real-time inference and tracing of multiple users, and thus the ability to recognise the team behaviour and the separate users' goals.

So far, the tool has been successfully used in a smart meeting room scenario where several 3-person meetings with different agendas and durations took place. The CCBM tool was able to recognise the performed team and agent activities relying only on context information, with an accuracy of about 90% [8]. Another application domain of the CCBM tool is an office scenario where two colleagues act autonomously while preparing coffee, fixing the printer, and printing documents. Although the tool received only sparse sensor information (location sensors detected whether a person was standing next to one of the objects in question) and relied heavily on causal reasoning, it was able to recognise the executed activity and to provide a reasonable plan execution sequence for the two non-interacting agents [2]. Furthermore, the tool is currently being applied to a kitchen task assessment problem where the preliminary results show that the tool succeeds in reasoning not only about the performed action but also about the objects being used, regardless of the fact that the observed data represents only the actions.

5 Conclusion and Future Work

In this paper we presented tool support for activity recognition using computational causal models. The tool is able to combine symbolic causal model representations with probabilistic sensor information in order to perform probabilistic activity recognition. Among other features, the tool provides real-time inference and tracing of multiple users, and thus the ability to recognise team behaviour and the separate users' goals. The CCBM tool has been successfully applied to two different domains and is currently being used for building models for a third domain. In the future, the tool's functionality will be extended (e.g., with sub-models for movement trajectories) and it will be tested more extensively on activity recognition problems from the daily living domain. Additionally, it will be applied to problems where not only the current activity has to be recognised, but also the future user goal, or intention, thus providing vital information for smart assistance systems that need to configure themselves in order to assist the user proactively.

References

1. K. Kaiser and S. Miksch. Treating Temporal Information in Plan and Process Modeling. Technical Report Asgaard-TR, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna.
2. Frank Krüger and Thomas Kirste. Synthesizing sequential bayesian filters for plan and activity recognition from extended precondition-effect rules. In Adj. Proc. 2nd International Joint Conference on Ambient Intelligence. Springer.
3. M. Wurdel, C. Burghardt, and P. Forbrig. Supporting ambient environments by extended task models. In Proc. AMI07 Workshop on Model Driven Software Engineering for Ambient Intelligence Applications.
4. G. Okeyo, L. Chen, H. Wang, and R. Sterritt. Ontology-Based Learning Framework for Activity Assistance in an Adaptive Smart Home. In Activity Recognition in Pervasive Intelligent Environments, volume 4. Atlantis Press.
5. M. Ramirez and H. Geffner. Goal recognition over POMDPs: Inferring the intention of a POMDP agent. In Proc. 21st International Conference on Automated Planning and Scheduling.
6. T. van Kasteren, A. Noulas, G. Englebienne, and B. Kröse. Accurate activity recognition in a home setting. In Proc. 10th International Conference on Ubiquitous Computing.
7. Kristina Yordanova. Toward a unified human behaviour modelling approach. Technical Report CS-02-11, Institut für Informatik, Universität Rostock, Rostock, Germany.
8. Kristina Yordanova, Frank Krüger, and Thomas Kirste. Context aware approach for activity recognition based on precondition-effect rules. In Proc. Workshop COMOREA, PerCom '12, 2012.

Demo Contributions


Auto Classifier: Explaining Customers a Machine-Learning Model

Benjamin Adrian 1, Markus Ebbecke 1, and Sebastian Ebert 1,2
1 Insiders Technologies GmbH, Kaiserslautern, Germany {B.Adrian, M.Ebbecke, S.Ebert}@insiders-technologies.de
2 Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany

Abstract. When we explain to customers that the artificial intelligence approach of our products automatically adapts document classifiers to training documents by applying statistical machine learning, their reaction is similar to what it would be if we told them about artificial intelligence in car brakes. Most likely they would dislike it, because they want full control over their data processors. Hence, we sell the Auto Classifier approach, which is the transparent and explainable extension of the respective machine-learning components. This demo description presents this approach of providing customers full control over document classifiers, which is part of nearly all products within Insiders Technologies' product line.

Keywords. Machine Learning, Explanation

1 Introduction

Customers want maximum control. However, they also demand high degrees of automation. When developing, customizing, and selling our products, we had to learn this lesson the hard way. Black-box machine-learning models, such as artificial neural networks, will not be accepted by any customer who processes sensitive data. Customers are willing to accept errors in classifications if the classification system provides transparent and therefore understandable explanations [1]. However, explaining to customers the need to collect a preferably large set of training documents for each class is not a trivial task. At the latest when sufficient classification rates cannot be achieved because of bad training data, the responsible consultants require a solid basis for discussion and explanation. Hence, we developed an interaction and visualization framework for native document classifiers, the Auto Classifier [2]. It reveals and explains each phase of the machine-learning process, including sensing, segmentation, feature extraction, classification, and post-processing [3].

2 Machine-Learning Components in Document Classification

The main contribution of the Auto Classifier is to support users in the process steps required for training and running a document classifier [2]. Every day, customers such as insurance companies receive a large number of documents at their document entry point. By using a categorization scheme, customers classify these documents, e.g., as invoice, damage report, or complaint. It is not unusual that customers maintain large categorization hierarchies consisting of more than 200 categories. The document classification assigns an incoming document to one or several predefined categories. For reaching this goal, each category is described by a preferably large set of example documents. A training algorithm then computes statistics on discriminative features of each category. These statistics can be created by models such as Naïve Bayes [3], Support Vector Machines [4], or Logistic Regression [3]. The general process for creating a classifier is described as follows [3]:

Sensing: Incoming documents have to be converted to a standard format. In the case of paper-based documents, these papers have to be scanned and processed by optical character recognition (OCR). The document sensing results in a representation providing information about the contained text and layout. Some documents are of low OCR quality. Hence, they are not suitable as training examples and would be misclassified later on. Customers should understand such problem documents and use the Auto Classifier to ignore documents with a limited number of recognized characters.

Segmentation: Either resulting from OCR processing or from performing white-space-based word segmentation, each document is represented as a bag of words. The Auto Classifier enables users to inspect and visualize this representation in terms of documents and finally of classes.

Feature extraction: By using term-based metrics such as the term frequency (TF) or the inverse document frequency (IDF), words can be rated with respect to their distribution in the training data. Words of high frequency can also be removed by performing a stop-word removal, since they do not contain any class-specific information. The Auto Classifier visualizes the word rating and marks removed words. After modifying the feature description, users can easily retrain and re-evaluate their classifiers to inspect the impact of these modifications on the accuracy of the classification results.

Classification: Following [5], we train a linear classifier from the LIBLINEAR [4] library. Here, we perform a grid search for finding optimal values for the penalty parameter C, which defines the costs for misclassified examples in the training data. The overall process is performed by an x-fold cross validation. Precision, recall, and F-measure [5] are used as metrics for rating a classifier's performance. Depending on the size of the training data, the user can choose the number x of folds. He is also able to set the value of C by hand. The result of training and cross validation can be inspected and visualized by using confusion matrices or several forms of diagrams (see Section 3).
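The training loop just described can be approximated in a few lines. The sketch below uses scikit-learn's LIBLINEAR-backed logistic regression rather than the product's own code, and the toy documents, labels, and certainty thresholds are invented for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

docs = ["invoice total amount due", "your invoice is attached",
        "invoice number and payment", "damage report after accident",
        "report of storm damage", "damage claim with photos"]   # toy corpus
labels = ["invoice"] * 3 + ["damage report"] * 3

pipeline = make_pipeline(
    TfidfVectorizer(),                        # TF/IDF word rating
    LogisticRegression(solver="liblinear"),   # yields certainty values per class
)
search = GridSearchCV(
    pipeline,
    {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]},  # penalty parameter C
    cv=3,                                     # x-fold cross validation (x = 3 here)
    scoring="f1_macro",
)
search.fit(docs, labels)
print(search.best_params_, search.best_score_)

def certainty_range(p):
    """Map a classification certainty to the UI ranges; thresholds illustrative."""
    if p >= 0.9:
        return "certain"    # shown green
    if p >= 0.7:
        return "likely"     # shown blue
    return "unsure"         # shown yellow
```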

Fig. 1. Screenshot of the Auto Classifier Training UI

Post processing: After training classifiers, users can inspect the representation of each class by looking at the most important words. Here, the Auto Classifier allows a manual filtering of words (e.g., removing a person's name if the original training documents are all emails from a single customer employee). The Auto Classifier allows users to inspect the performance of each classifier. Because Logistic Regression [4] provides certainty values for each classified example, we decided to use this classification model. This enables users to define accuracy thresholds on certainties within the range of 0.0 to 1.0. We defined these ranges as certain, likely, unsure, and misclassified. The following colors are used to refer to these ranges: green, blue, yellow, and red.

3 Auto Classifier Training User Interface

The screenshot of the Auto Classifier training user interface (UI) in Fig. 1 shows how we implemented the interaction features mentioned in the last section. The UI allows choosing between different classifiers and training and test sets. It consists of the following components:

(A) Classification area: The table on the top side of the UI provides an overview of a classifier's performance. It lists the classes contained in the training set. With respect to the selected test set, the numbers of examples as well as the percentages of correctly and incorrectly classified examples are listed. Depending on the given certainty thresholds, a class is marked as certain, likely, or unsure.

(B) Pie charts: The pie charts on the right side of the UI visualize the distribution of certain, likely, unsure, and misclassified training examples.

Class details: Beneath the classification area, details and properties of selected classes as well as distinctions to other classes are given:

(C) Feature distribution: This list on the left shows the importance of the word features of a given class, including their class weights.

(D) Confusion matrix: This table on the right is a serialized confusion matrix. It provides details on false positives and false negatives and therefore shows potentially overlapping classes.

(E) Degree of automation: On the bottom left side of the UI, the user can define the three ranges of certain, likely, and unsure classifications as a function of classification errors.

(F) Histograms: As an alternative visualization form, the histogram also allows users to define the three ranges of certain, likely, and unsure classifications directly on the distribution of training examples over bins of classification certainties.

4 Demo outline

The presentation of the Auto Classifier is going to be a live system demonstration on several datasets with an existing categorization scheme. Visitors are invited to use the Auto Classifier for training, modifying the feature space, or just exploring different threshold ratios. The live demo of the user interface provides insights into end customers' expectations and needs when integrating a machine-learning system into their business processes.

5 Related Work

Compared to existing machine-learning toolkits such as WEKA [6], RapidMiner [7], or KNIME [8], the Auto Classifier's focus is not on providing a variety of data analysis and processing techniques. Instead, it offers a single way of solving a document categorization problem. Along this way, the Auto Classifier provides plenty of transparent user interactions and data visualizations. It may not offer the best technology for solving a problem up to 99% precision and recall. But it explains to users the shape of their data and provides details and interactions on each single classification item. This enables users to explore the best solution for a given corpus of documents and set of categories.

6 Conclusion

The Auto Classifier is a key component of Insiders Technologies' product portfolio [2]. We showed how the UI can support the user in understanding classification internals, such that he is able to modify the results according to his needs. The ability to figure out, explain, and interact with classification internals was often the decisive argument for customers to buy our product.

Acknowledgements

This work was funded by the German Federal Ministry of Education and Research (BMBF) in the INDINET project under grant number 01IC10S04I.

References

1. Forcher, B., Agne, S., Dengel, A., Gillmann, M., Roth-Berghofer, T.: Towards understandable explanations for Document Analysis Systems. 10th IAPR International Workshop on Document Analysis Systems (DAS 2012), Surfers Paradise, Queensland, Australia (2012)
2. Klein, B., Dengel, A., Fordan, A.: smartFIX: An Adaptive System for Document Analysis and Understanding. In: Reading and Learning - Adaptive Content Recognition, LNCS 2956, Springer Verlag (2004)
3. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley (2001)
4. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J.: LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9 (2008)
5. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. Proceedings of the 7th International Conference on Information and Knowledge Management, Bethesda, Maryland, United States (1998)
6. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1 (2009)
7. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06) (2006)
8. Silipo, R., Mazanetz, M.P.: The KNIME Cookbook. KNIME Press, Zürich, Switzerland (2012)

Please Tell Me Where I Am: A Fundament for a Semantic Labeling Approach

Frank Bahrmann, Sven Hellbach, and Hans-Joachim Böhme
HTW Dresden, Fakultät Informatik/Mathematik, Dresden, Germany {bahrmann,hellbach,boehme}@htw-dresden.de

Abstract. This paper presents a fast yet simple solution for creating areas on metrical occupancy grid maps which can easily be converted to topological maps. Those maps with their areas provide an understandable, non-expert view of a robot's environment and allow semantic labeling. This serves as a foundation for supporting navigation tasks like path planning, localization, and human-machine interaction, which are not in the scope of this paper.

Keywords: Occupancy Grid Map, Metric Topological Map, Semantic Place Labeling, Human-Machine Interaction

1 Introduction

The proposed method is part of a scenario where a mobile service robot platform is introduced to a new office or home environment. In this state, the robot needs to learn how to interact with its new surroundings. There exists a lot of other research addressing that problem by detecting and classifying significant features through object detection or similar [1,5,7,4], regardless of the ambiguity of the labels with respect to the features and vice versa. However, in our opinion the best way to accomplish this familiarization is by interacting with the robot in a very human-like manner. This idea is inspired by the human habit of showing new co-workers around. To handle this scenario it is necessary for the robot to detect and follow humans, to interact with them through a dialogue, and to recognize rooms by splitting a built map into semantic parts, i.e., a topological map [6]. The latter will be the main subject of this paper. The proposed method could become beneficial to localization tasks, reactive motion control, and semantic path planning.

2 Approach

Our method is divided into three subsequent steps (see Fig. 1b-d). To explain the single stages, a rather simple simulated occupancy grid map is used as raw material (see Fig. 1a); more complex maps are shown in the results section. The first step is filtering and dilating the original map (Fig. 1b), followed by thinning (blue line in Fig. 1c); the last step is separating areas (Fig. 1c and d).

(a) (b) (c) (d)

Fig. 1: (a) The raw map (where free space is represented by white pixels and occupied space by black pixels) and subsequent processing steps: (b) the dilated and gap-closed binary map, (c) the map skeleton (blue) with starting and intersection points (purple) and critical lines (red), (d) the constructed areas.

Filtering and dilating for gap closing and distortion removal: First of all, the occupancy grid map is converted into a binary grid map, in which a cell's state can either be free or occupied, by applying a threshold slightly below uncertainty. The next step is meant to close smaller gaps which can occur during the mapping process. A common way to achieve this is by dilating all occupied cells and eroding them back to their original state. To filter out smaller distortions like chair or table legs, an 8-directional flood-fill algorithm is used on all occupied cells to find those which are not part of a larger structure like walls. If a filled area is smaller than a specified threshold, this area is deleted. Finally, to achieve a higher stability later in the processing chain, all occupied cells are dilated by the robot's radius (see Fig. 1b). After this kind of dilation, the robot can be assumed to have the size of a single grid cell for navigation purposes.

Thinning to build a skeleton map: The established Zhang & Suen thinning algorithm is used as proposed in [8] to create a map skeleton which is one pixel in width. Fig. 1c shows the thinned example map with blue lines.

Classifying intersection and start cells to ensure a graph-like structure: Ko et al. [2] described a way to use a map skeleton to build topological maps. Unfortunately, their algorithm defines only starting cells, leaving out connecting edges, which is insufficient in our case. For the reconstruction of a topological map it is additionally necessary to determine the positions of intersections of the skeleton edges. The fastest and most reliable method was to use predefined 3x3 pixel templates. Two cases are shown in Fig. 2, where the middle red pixel represents the tested one.
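The filtering and thinning stages map directly onto standard image-morphology operations. The following is a minimal sketch using scikit-image (whose skeletonize function implements the Zhang & Suen algorithm [8]); the threshold and size parameters are invented for illustration:

```python
import numpy as np
from skimage.morphology import (binary_dilation, binary_erosion, disk,
                                remove_small_objects, skeletonize)

def filter_dilate_thin(grid, robot_radius_px=4, min_structure_px=20):
    """grid: 2-D array of occupancy probabilities in [0, 1]."""
    occupied = grid > 0.45                                # slightly below uncertainty
    # close small gaps: dilate, then erode back to original extent
    closed = binary_erosion(binary_dilation(occupied, disk(2)), disk(2))
    walls = remove_small_objects(closed, min_structure_px)  # drop chair/table legs
    inflated = binary_dilation(walls, disk(robot_radius_px))  # grow by robot radius
    skeleton = skeletonize(~inflated)         # one-pixel-wide skeleton of free space
    return inflated, skeleton
```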

(a) (b) (c) (d)

Fig. 2: Example of predefined templates (a) with and (b) without intersection. (c) Ordered neighborhood notation for grid cell P_i. (d) Area center: the red dot marks the center of gravity, the green dot the logically correct area center on the skeleton (blue line).

Detecting door hypotheses by finding critical lines on the skeleton: All detected starting cells are now used to store a predecessor and a successor per skeleton cell by following along the neighboring cells until each cell on the skeleton has been visited. The result is connected grid lines, which can later be used to interpolate a map graph. It is now possible to calculate a normal at each cell P_i. That can be done by a local gradient approximation, averaging the point pairs (P_i+1, P_i+2) and (P_i-1, P_i-2) (see Fig. 2c). The averaging of these point pairs was chosen to achieve a higher local stability. A resulting normal is identified as a critical line (expected to be a door; critical lines are shown as red lines in Fig. 1c) either by taking the total side clearance (the average length of all normals) and scaling it to serve as an adaptive threshold, or by using a constant threshold.

Approximating room hypotheses: Determining the proportions of areas (room hypotheses) is done by using a 4-way flood-fill algorithm to fill free cells with IDs, where the IDs serve as identifiers of the areas. The area construction is completed when no free cell remains. With this method it is possible to cover areas regardless of their geometrical shape. To determine the center coordinate of an area, we calculate the center of gravity and look for the area-related skeleton cell with the smallest Euclidean distance. Hence we ensure to get a coordinate which lies on the skeleton within free space and, additionally, not outside of the area (shown in Fig. 2d).
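A compact sketch of this area construction, assuming the free-space mask, skeleton, and critical-line mask from the previous steps are available as boolean arrays (SciPy's label uses 4-connectivity by default, matching the 4-way flood fill); this is illustrative, not the authors' code:

```python
import numpy as np
from scipy import ndimage

def build_areas(free_mask, skeleton, critical_lines):
    """Label room hypotheses and pick an area center lying on the skeleton."""
    areas, n = ndimage.label(free_mask & ~critical_lines)  # 4-way flood fill with IDs
    skel_cells = np.argwhere(skeleton)
    centers = {}
    for area_id in range(1, n + 1):
        cog = np.array(ndimage.center_of_mass(areas == area_id))  # center of gravity
        own = skel_cells[areas[tuple(skel_cells.T)] == area_id]   # skeleton cells inside
        if len(own):
            # snap the center of gravity to the nearest skeleton cell of this area
            centers[area_id] = tuple(own[np.argmin(np.linalg.norm(own - cog, axis=1))])
    return areas, centers
```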

Integration in path planning: The majority of path planning in mobile robotics is done by using a grid-cell map as a graph in which each cell is represented as a graph node and adjacent (orthogonal and diagonal) grid cells are connected with edges. To apply existing grid-map-based implementations of path planners, we simply transferred all of the resulting skeleton cells into a grid map.

Human-guided labeling: As indicated in the introduction, the map is labeled as follows: the mobile robot system is delivered into a new environment where a human supervisor shows it around. Our system is already able to navigate collision-free, to detect and follow people [3], and to robustly map indoor environments (SLAM). Even though the used speech dialogue system is in an early stage, a small context-sensitive grammar has been built to achieve the proposed behavior. With all these abilities, the robot is able to follow its guide, who verbally labels the robot's current position as a coordinate within the grid map. After the tour is completed, the SLAM algorithm has produced a grid map, which serves as the starting point (raw map) for the proposed method. The verbally defined labels are then propagated throughout the constructed areas.

3 Results

Fig. 3: Achieved results: (a)/(b) our research lab (163x90 px ≙ … m² in … s), (c)/(d) the Intel research lab dataset (579x581 px ≙ … m² in … s), (e)/(f) the fr079 dataset (911x368 px ≙ … m² in … s)

This section presents our preliminary results with some qualitative evaluations (shown in Fig. 3). These first results indicate that the method is quite robust. Cluttered maps (see Fig. 3e) seem to demand further research.

Furthermore, real-world experiments were successfully performed on an open-door day at our university. A K-Team Koala robot was placed in a model apartment similar to the raw map shown in Fig. 1a. The visitors' task was to verbally show the robot around, using a speech recognition framework to define the robot's goal and label the rooms. For such an event, a stable and reliable system is expected.

4 Conclusion & Future Work

This paper covered the first steps towards an approach for semantic map labeling in cooperation with a human teacher. A main disadvantage, which has to be addressed, is the detection of false-positive critical lines. Further steps are to be taken to drastically reduce this problem by applying a pattern recognition system to the detected critical lines and their surroundings. This system does not need to be scale or rotation invariant because of the prior knowledge of the critical lines' width and orientation.

References

1. Fabrizi, E., Saffiotti, A.: Extracting topology-based maps from gridmaps. In: Robotics and Automation, Proceedings. ICRA'00. IEEE International Conference on, vol. 3. IEEE (2000)
2. Ko, B., Song, J., Lee, S.: Real-time building of a thinning-based topological map with metric features. In: Intelligent Robots and Systems (IROS 2004). Proceedings IEEE/RSJ International Conference on, vol. 2. IEEE (2004)
3. Poschmann, P., Hellbach, S., Böhme, H.J.: Multi-modal people tracking for an awareness behavior of an interactive tour-guide robot. In: Proceedings of the International Conference on Intelligent Robotics and Applications (ICIRA) (2012), in press
4. Sousa, P., Araújo, R., Nunes, U.: Real-time labeling of places using support vector machines. In: Industrial Electronics, ISIE. IEEE International Symposium on. IEEE (2007)
5. Tarutoko, Y., Kobayashi, K., Watanabe, K.: Topological map generation based on delaunay triangulation for mobile robot. In: SICE-ICASE International Joint Conference. IEEE (2006)
6. Thrun, S.: Learning metric-topological maps for indoor mobile robot navigation. Artificial Intelligence 99(1) (1998)
7. Topp, E., Christensen, H.: Topological modelling for human augmented mapping. In: Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. IEEE (2006)
8. Zhang, T., Suen, C.: A fast parallel algorithm for thinning digital patterns. Communications of the ACM 27(3) (1984)

MAINSIM: MultimodAl INnercity SIMulation

Jörg Dallmeyer 1 and Ingo J. Timm 2
1 Information Systems and Simulation, Institute of Computer Science, Goethe University Frankfurt, Frankfurt, Germany
2 Business Informatics I, University of Trier, Trier, Germany

Abstract MAINSIM is a traffic simulation system for urban multimodal traffic. The simulation graph is generated from cartographical material in an entirely automatic way. MAINSIM is capable of simulating the traffic of whole cities with cars, bicycles, and pedestrians.

1 Introduction

The field of traffic research has been of ongoing interest and relevance since the first half of the 20th century. After a period of studying traffic in field studies, first models for car movement were constructed. At first, macroscopic models based on gas-kinetic or fluid-dynamic equations modeled whole roads. With increasing computational power, microscopic simulation models emerged, simulating each car as one simulation entity. In the field of AI (Artificial Intelligence), traffic scenarios have been investigated in different studies. For example, Bazzan et al. [1] investigate models of human travelers reacting to traffic patterns and control measures for these traffic patterns, focusing on distributed and decentralized methods. Da Silva et al. present an approach for optimizing traffic based on reinforcement learning [13]. The traffic lights of isolated crossings adapt themselves to the traffic situation without explicit communication with neighbouring junctions. Many traffic simulation systems have been developed. Two of the best-known systems are MATSIM [12] and VISSIM [9]. MATSIM models cars with queuing theory and is thus suitable for large-scale simulations with a focus on travel demand modeling and agent plan optimization. On the other side, VISSIM models cars with a high-fidelity model based on the Wiedemann model [14]. VISSIM is thus suitable for highly realistic studies, at the cost of limited suitability for large-scale simulation studies. One system that fills the gap between MATSIM and VISSIM is SUMO [2]. However, SUMO focuses on vehicular traffic and especially on the modeling of communicating cars. SUMO comes with a mechanism to import cartographical material and generate an input for traffic simulation in an automatic manner. Support for the OSM format did not yet exist at the time the development of MAINSIM started.

SUMO is not capable of simulating multimodal traffic, because it lacks microscopic simulation models for bicycles and pedestrians. The simulation system presented in this work aims at large-scale simulation under consideration of multimodality. The modeling complexity for setting up a simulation in a specific area is minimized with the help of an automatic graph generation technique.

2 Simulation System

The simulation system presented in this work is called MAINSIM and has been implemented in Java. The aim of MAINSIM is to support users through efficient modelling of the traffic setup. Therefore, it uses the geographical information system toolkit GeoTools. The process of traffic simulation in MAINSIM follows these major steps: environment modelling, setting the parameters of the road infrastructure, modelling road users, and specifying individual behavior for them. In a first step, an arbitrary OSM file is parsed and a user-defined excerpt is calculated. The smaller new file is analyzed and split into several shapefile layers for different types of geometry and logical grouping (e.g., roads, waterways, polygons, points). This is an important step because, on the one hand, rendering of the simulation area gets much easier when the data is grouped this way. On the other hand, additional information can simply be included in the simulation, e.g., a digital terrain model (DTM). The road layer is used to compute a simulation graph, enriched with selected additional information from other layers, like the area of a town in order to set a maximum velocity restriction in this area. Available information is directly added to the graph (e.g., maximum velocity, number of lanes, one-way roads). OSM does not provide full information about every road. Thus, several analysis steps find roundabouts, connect crossing roads, and set realistic velocity restrictions and numbers of lanes in relation to the types of the corresponding roads. The curvatures of the road geometries are analyzed in order to find velocities that are realistically drivable. This is due to the fact that the velocity restrictions do not always reflect these velocities, and missing information in the material from OSM may lead to an overestimation of velocities. Traffic lights and crosswalks are placed at the listed positions. MAINSIM provides simulation models for cars, bicycles, and pedestrians. All models are microscopic, continuous in space and discrete in time, with one simulation iteration lasting one second of real time. The models focus on urban traffic and the interdependencies between different types of road users. For example, cars are able to overtake bicycles if the road has a sufficient width or when a fair gap exists, so that oncoming traffic is not hindered. Pedestrians cross roads with a preference for crosswalks and traffic lights; otherwise, they look for sufficient gaps. Each pedestrian has its own value of gap acceptance, affected by a level of aggressiveness. Each road user has an individual route plan, is able to replan, and has custom values for acceleration, maximum velocity, vehicle length, et cetera.
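The paper does not spell out MAINSIM's car-following rules, so the following is only an illustrative continuous-space, discrete-time update in the spirit of the description (one-second iterations, individual maximum velocities); every parameter and data structure here is invented for the example:

```python
def simulation_step(cars, dt=1.0):
    """One 1-second iteration of a toy microscopic car model on a single lane.
    cars: list of dicts with keys pos, v, v_max, a_max, length, sorted by pos."""
    for i, car in enumerate(cars):
        v = min(car["v"] + car["a_max"] * dt, car["v_max"])  # accelerate, obey limit
        if i + 1 < len(cars):                                # constrained by leader
            leader = cars[i + 1]
            gap = leader["pos"] - car["pos"] - leader["length"]
            v = min(v, max(gap / dt, 0.0))                   # never run into leader
        car["v"] = v
    for car in cars:
        car["pos"] += car["v"] * dt                          # continuous positions
```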

MAINSIM offers methods for the analysis of traffic patterns as well as the ability to change the behavior of specific road users. More information about the simulation system, as well as the included microscopic models for road users and the corresponding model evaluations, can be found in [3-5]. Several extensions and case studies have been carried out. In [8], MAINSIM was extended by a fuel consumption and emission model, calculating realistic values in different scenarios as well as a CO2 map for the city of Hanau. A novel method for routing cars on the basis of an ant-inspired approach was presented in [7]. In [10], the interplay between pedestrians and road users was examined, leading to a method for integrating the effects without actually simulating pedestrians. Machine learning techniques were used in order to find adaptation strategies on the basis of the currently measured traffic state to reduce the risk of jams on motorways [11]. A study on the effect of road users behaving selfishly and breaking traffic rules was done for different scenarios on motorways and in urban traffic [6].

3 Conclusion and Future Work

MAINSIM is a traffic simulation tool for fast what-if analyses. It is easy to set up a traffic scenario. Parameters like the amount of traffic, the routing behavior, or the composition of traffic can be set arbitrarily. In the near future, a study about CO2 emissions in the city of Frankfurt am Main will be finished. The study is done on the map excerpt shown in Figure 1. The subdivision is done into cells that are related to origin-destination matrices. The larger detail shows an excerpt with underlying height information provided by a DTM. The smaller detail shows a crossing with a traffic light. The simulation graph consists of 64,839 edges and 51,486 nodes and has an overall road length of about 6,316 km. The simulation runs are done on standard desktop PCs. The aim of this study is the combination of MAINSIM with a gas dispersion simulation system in order to simulate where the emissions drift over time, under consideration of real-world meteorology data of the simulation area. In the described scenario, about 1.6 million trips are generated within 24 hours of simulated time. One simulation run with a two-hour settlement phase and a 24-hour measurement phase needs about two days of simulation time on a PC with an Intel E6750 processor (2.66 GHz). This is due to the on-line calculation of routes for each simulated car. MAINSIM is an important tool for the field of AI because the overall traffic situation emerges from the interplay of the actions of the different road users, and even of the different types of road users. Behavioral variations, e.g., different potentials of aggressiveness, can easily be implemented for groups of road users or even single road users. The traffic situation can be examined from the perspectives of cars (passenger cars, trucks, and delivery vans) as well as bicycles and pedestrians. This leads to the opportunity to stress-test current road networks and potential changes in road networks with respect to the different traffic perspectives.

Figure 1. Current study about CO2 emissions in the area of Frankfurt am Main.

In the future, MAINSIM needs to be extended by a calibration mechanism towards sensor data from real roads. Additionally, a component to generate travel demands from statistical information about the simulation area, as well as a method for the automatic integration of traffic lights with real-world circuits, has to be developed. More information about MAINSIM, as well as videos that show characteristics of the simulation, can be found online.

Acknowledgments. This work was partly supported by the MainCampus scholarship of the Stiftung Polytechnische Gesellschaft Frankfurt am Main. Special thanks to the Hessisches Landesamt für Bodenmanagement und Geoinformation for providing the DTM and to Hessen Mobil - Straßen- und Verkehrsmanagement for the calculation of the used ODM.

References

1. Bazzan, A.L.C., Nagel, K., Klügl, F.: Integrating MATSim and ITSUMO for daily replanning under congestion. In: Proceedings of the 35th Latin-American Informatics Conference, CLEI. Pelotas, Brazil (September 2009), maslab/pergamus/pubs/bazzan+2009.pdf.zip

2. Behrisch, M., Bieker, L., Erdmann, J., Krajzewicz, D.: SUMO - Simulation of Urban Mobility: An overview. In: SIMUL 2011, The Third International Conference on Advances in System Simulation. Barcelona, Spain (October 2011)
3. Dallmeyer, J., Lattner, A.D., Timm, I.J.: From GIS to Mixed Traffic Simulation in Urban Scenarios. In: Liu, J., Quaglia, F., Eidenbenz, S., Gilmore, S. (eds.) 4th International ICST Conference on Simulation Tools and Techniques, SIMUTools'11, Barcelona, Spain, March 22-24. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels (2011)
4. Dallmeyer, J., Lattner, A.D., Timm, I.J.: Data Mining for Geoinformatics: Methods and Applications, chap. GIS-based Traffic Simulation using OSM. Cervone, G., Lin, J., Waters, N. (eds.), Springer (2012), accepted
5. Dallmeyer, J., Lattner, A.D., Timm, I.J.: Pedestrian Simulation for Urban Traffic Scenarios. In: Bruzzone, A.G. (ed.) Proceedings of the Summer Computer Simulation Conference, Summer Simulation Multi-Conference (SummerSim'12). Curran Associates, Inc. (2012)
6. Dallmeyer, J., Lattner, A.D., Timm, I.J.: Selfish Road Users - Case Studies on Rule Breaking Agents for Traffic Simulation. In: 10th German Conference on Multiagent System Technologies (2012), accepted
7. Dallmeyer, J., Schumann, R., Lattner, A.D., Timm, I.J.: Don't Go with the Ant Flow: Ant-inspired Traffic Routing in Urban Environments. Seventh International Workshop on Agents in Traffic and Transportation, Valencia, Spain (2012)
8. Dallmeyer, J., Taubert, C., Lattner, A.D., Timm, I.J.: Fuel Consumption and Emission Modeling for Urban Scenarios. In: Troitzsch, K.G., Möhring, M., Lotzmann, U. (eds.) Proceedings of the 26th European Conference on Modelling and Simulation (2012)
9. Fellendorf, M.: VISSIM: A microscopic Simulation Tool to Evaluate Actuated Signal Control including Bus Priority. 64th Institute of Transportation Engineers Annual Meeting (1994)
10. Lattner, A.D., Dallmeyer, J., Paraskevopoulos, D., Timm, I.J.: Approximation of Pedestrian Effects in Urban Traffic Simulation by Distribution Fitting. In: Troitzsch, K.G., Möhring, M., Lotzmann, U. (eds.) Proceedings of the 26th European Conference on Modelling and Simulation (2012)
11. Lattner, A.D., Dallmeyer, J., Timm, I.J.: Learning Dynamic Adaptation Strategies in Agent-based Traffic Simulation Experiments. In: Klügl, F., Ossowski, S. (eds.) Ninth German Conference on Multi-Agent System Technologies (MATES 2011). LNCS 6973, Springer, Berlin (2011)
12. MATSim development team: MATSim: Aims, approach and implementation. Presentation (2007), IVT, ETH Zürich, Switzerland
13. Silva, B.C.d., Oliveira, D.d., Bazzan, A.L.C., Basso, E.W.: Adaptive traffic control with reinforcement learning. In: Bazzan, A.L.C., Chaib-Draa, B., Klügl, F., Ossowski, S. (eds.) Proceedings of the 4th Workshop on Agents in Traffic and Transportation (AAMAS 2006) (May 2006), downloads/ws28att.pdf
14. Wiedemann, R.: Simulation des Straßenverkehrsflusses (Simulation of road traffic flow). Institut für Verkehrswesen, Schriftenreihe, Heft 8, Institut für Verkehrswesen der Universität Karlsruhe (1974)

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output

Christian Federmann
Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, Saarbrücken, Germany

Abstract. We describe Appraise, an open-source toolkit which can be used for the manual evaluation of Machine Translation output. Appraise allows collecting human judgments on translation output, implementing annotation tasks such as 1) translation quality checking, 2) ranking of translations, 3) error classification, and 4) manual post-editing. It uses an extensible format for the import/export of data and can easily be adapted to new annotation tasks. The annotation tasks are explained in more detail in the paper. The current version of Appraise also includes automatic computation of inter-annotator agreement scores, resulting in quick access to evaluation results. Appraise has successfully been used in a wide variety of research projects.

Keywords: Machine Translation, Evaluation, Applications

1 Introduction

Evaluating Machine Translation (MT) output to assess translation quality is a difficult task. There exist automatic metrics such as BLEU [11] or Meteor [4], which are widely used in minimum error rate training [10] for the tuning of MT systems and as evaluation metrics for shared tasks such as the Workshop on Statistical Machine Translation (WMT) [3]. The main problem in designing automatic quality metrics for MT is to achieve a high correlation with human judgments on the same translation output. While current metrics show promising performance in this respect, manual inspection and evaluation of MT results is still equally important, as it allows for a more targeted and detailed analysis of the given translation output. The manual analysis of a machine-translated text is, however, a time-consuming and laborious process; it involves the training of annotators, requires detailed and concise annotation guidelines, and, last but not least, annotation software that allows annotators to get their job done quickly and efficiently. As we have mentioned before, the collection of manual judgments on machine translation output is a tedious task; this holds for simple tasks such as translation ranking, but also for more complex challenges like word-level error analysis or post-editing of translation output. Annotators tend to lose focus after several sentences, resulting in reduced intra-annotator agreement and increased annotation time.

In our experience with manual evaluation campaigns, a well-designed annotation tool can help to overcome these issues. In this paper, we describe Appraise, an open-source application toolkit for performing manual evaluation of Machine Translation output. Appraise can be used to collect human judgments on translation output, implementing several annotation tasks. Development of the Appraise software package started back in 2009 as part of the EuroMatrixPlus project, where the tool was used to quickly compare different sets of candidate translations from our hybrid machine translation engine to get an indication of whether our system improved or degraded in terms of translation quality. A first version of Appraise was released and described by [5]. The remainder of this paper is structured as follows: Section 2 provides a brief description of the evaluation system before we highlight the different annotation tasks that have been implemented in Section 3. Finally, we describe several experiments where Appraise has proven useful (see Section 4) and give some concluding remarks in Section 5.

2 System Description

In a nutshell, Appraise is an open-source tool for the manual evaluation of machine translation output. It allows collecting human judgments on given translation output, implementing annotation tasks such as (but not limited to):

- translation quality checking;
- ranking of translations;
- error classification;
- manual post-editing.

The software features an extensible XML import/export format 1 and can easily be adapted to new annotation tasks. The tool also includes code supporting the automatic computation of inter-annotator agreement scores, allowing quick access to evaluation results. We currently support computation of the following inter-annotator agreement scores:

- Krippendorff's α as described by [9];
- Fleiss' κ as published in [8];
- Bennett, Alpert, and Goldstein's S as defined in [1];
- Scott's π as introduced in [12].

Agreement computation relies on code from the NLTK project [2]. Additional agreement metrics can be added easily.

1 An example of this XML format is available on GitHub: cfedermann/appraise/master/examples/sample-ranking-task.xml.
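Since the scores come from NLTK's agreement module, they can be reproduced in a few lines. The annotator IDs and labels below are invented toy data; the method names follow NLTK's AnnotationTask API (where multi_kappa() is the Davies-Fleiss multi-rater kappa, the closest match to Fleiss' κ):

```python
from nltk.metrics.agreement import AnnotationTask

# (annotator, item, label) triples -- toy 3-way ranking judgments
data = [
    ("ann1", "seg1", "A>B"), ("ann2", "seg1", "A>B"), ("ann3", "seg1", "A=B"),
    ("ann1", "seg2", "A<B"), ("ann2", "seg2", "A<B"), ("ann3", "seg2", "A<B"),
    ("ann1", "seg3", "A=B"), ("ann2", "seg3", "A>B"), ("ann3", "seg3", "A=B"),
]
task = AnnotationTask(data=data)
print("Krippendorff's alpha:", task.alpha())
print("Multi-rater kappa:   ", task.multi_kappa())
print("Scott's pi:          ", task.pi())
print("Bennett et al.'s S:  ", task.S())
```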

We have opened up Appraise development and released the source code on GitHub. Anybody may fork the project and create their own version of the software. Due to the flexibility of the git source code management system, it is easy to re-integrate external changes into the master repository, allowing other developers to feed back bugfixes and new features, thus improving and extending the original software. Appraise is available under an open, BSD-style license.

3 Annotation Tasks

We have developed several annotation tasks which are useful for the evaluation of machine translation output. All of these have been tested and used during the experiments described in Section 4. The following task types are available in the GitHub version of Appraise:

1. Ranking: The annotator is shown 1) the source sentence and 2) several (n ≥ 2) candidate translations. It is also possible to additionally present the reference translation. Wherever available, one sentence of left/right context is displayed to support the annotator during the ranking process. We have also implemented a special 3-way ranking task which works on pairs of candidate translations and gives the annotator an intuitive interface for quick A > B, A = B, or A < B classification.

2. Quality Estimation: The annotator is given 1) the source sentence and 2) one candidate translation, which has to be classified as Acceptable, Can easily be fixed, or None of both. We also show the reference sentence and again present left/right context if available. This task can be used to get a quick estimate of the acceptability of a set of translations.

3. Error Classification: The annotator sees 1) the source (or target) sentence and 2) a candidate translation, which has to be inspected with respect to errors that can be observed in the translation output. Error annotation is possible on the sentence level as well as for individual words. The annotator can choose to skip a translation, marking it as containing too many errors, and is able to differentiate between minor and severe errors in the annotation.

4. Post-editing: The annotator is shown 1) the source sentence, with left/right context wherever available, and 2) one or several candidate translations. The task is defined as choosing the translation which is easiest to post-edit and then performing the post-editing operation on the selected translation.

4 Experiments

We created Appraise to support research work on hybrid MT, especially during the EuroMatrixPlus project. We have also used Appraise in the taraxü project, conducting several large annotation campaigns involving professional translators and language service providers. In the T4ME project, we investigate how hybrid machine translation can be changed towards an optimal selection from the given candidate translations. Part of the experimental setup is a shared task (ML4HMT) in which participants have to implement this optimal choice step. We use Appraise to assess the translation quality of the resulting systems. Appraise has also been used in research work related to the creation of stand-alone hybrid machine translation approaches. Finally, we use Appraise in the context of terminology translation for the financial domain in the MONNET project.

4.1 Results

As Appraise is a tool supporting evaluation, it is difficult to point to individual results achieved through its usage. We were, however, able to find experimental proof that the aforementioned automatic evaluation metric Meteor correlates best with results from human judgement. This work has been described and published in [6]. Also, using Appraise, we were able to show that rule-based systems which performed worse than statistical MT systems according to automatic metrics were actually better in translation quality. This is described in our submission to last year's WMT shared task [7].

5 Conclusion and Outlook

We have described Appraise, an open-source tool for manual evaluation of MT output, implementing various annotation tasks such as error classification or post-editing. We also briefly reported on research projects in which different versions of the Appraise toolkit have been used, feeding back into and supporting the tool's development, eventually leading to its current version. Maintenance and development efforts for the Appraise software package are ongoing. By publicly releasing the tool on GitHub, we hope to attract both new users and new developers to further extend and improve it. Future modifications will focus on new annotation tasks and a more accessible administration interface for large numbers of tasks. Last but not least, we intend to incorporate detailed visualisation of annotation results into Appraise.

Acknowledgments

The research work in this paper has been funded through the T4ME contract under the Seventh Framework Programme for Research and Technological Development of the European Commission. We are grateful to the anonymous reviewers for their valuable feedback.

References

1. Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications through limited-response questioning. Public Opinion Quarterly 18(3) (1954)
2. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly, Beijing (2009), nltk.org/book
3. Callison-Burch, C., Koehn, P., Monz, C., Post, M., Soricut, R., Specia, L. (eds.): Proceedings of the Seventh Workshop on Statistical Machine Translation. Association for Computational Linguistics, Montréal, Canada (June 2012)
4. Denkowski, M., Lavie, A.: Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems. In: Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Edinburgh, Scotland (July 2011)
5. Federmann, C.: Appraise: An Open-Source Toolkit for Manual Phrase-Based Evaluation of Translations. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). Valletta, Malta (May 2010)
6. Federmann, C.: Results from the ML4HMT shared task on applying machine learning techniques to optimise the division of labour in hybrid machine translation. In: Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) and of the Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation (ML4HMT). META-NET (2011)
7. Federmann, C., Hunsicker, S.: Stochastic parse tree selection for an existing RBMT system. In: Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Edinburgh, Scotland (July 2011)
8. Fleiss, J.: Measuring Nominal Scale Agreement among Many Raters. Psychological Bulletin 76(5) (1971)
9. Krippendorff, K.: Reliability in Content Analysis. Some Common Misconceptions and Recommendations. Human Communication Research 30(3) (2004)
10. Och, F.J.: Minimum error rate training in statistical machine translation. In: ACL'03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, Morristown, NJ, USA (2003)
11. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: A Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02). Association for Computational Linguistics, Stroudsburg, PA, USA (2002)
12. Scott, W.A.: Reliability of Content Analysis: The Case of Nominal Scale Coding. The Public Opinion Quarterly 19(3) (1955)

A Conversational System for Multi-Session Child-Robot Interaction with Several Games

Ivana Kruijff-Korbayová 1, Heriberto Cuayáhuitl 1, Bernd Kiefer 1, Stefania Racioppa 1, Piero Cosi 2, Giulio Paci 2, Giacomo Sommavilla 2, Fabio Tesser 2, Hichem Sahli 3, Georgios Athanasopoulos 3, Weiyi Wang 3, Valentin Enescu 3, Werner Verhelst 3, Lola Cañamero 4, Aryel Beck 4, Antoine Hiolle 4, Raquel Ros Espinoza 5, and Yiannis Demiris 5

1 Language Technology Lab, DFKI, Saarbrücken, Germany, ivana.kruijff@dfki.de
2 Istituto di Scienze e Tecnologie della Cognizione, ISTC, C.N.R., Italy
3 Interdisciplinary Institute for Broadband Technology - IBBT, Vrije Universiteit Brussel, Dept. of Electronics and Informatics, Belgium
4 Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire, United Kingdom
5 Department of Electrical and Electronic Engineering, Imperial College London, UK

1 Introduction

Children are keen users of new technologies, and new technologies can provide interesting opportunities to enrich children's experience, e.g., for educational and therapeutic purposes. As children are not small adults, it is necessary to research their specific needs and develop systems that address them. The aliz-e project, funded under the EU's FP7 programme, develops cognitive robots for adaptive social interaction with young users over several sessions in real-world settings. We demonstrate a conversational system developed in aliz-e using the Nao robot. It engages a user in the following activities (Fig. 1):

quiz: the child and the robot ask each other series of multiple-choice quiz questions from various domains; the robot provides evaluation feedback;
imitation: either the child or the robot presents a sequence of simple arm poses that the other tries to memorize and imitate;
dance: the robot explores various dance moves with the child and then teaches the child a dance sequence according to its abilities.

These activities were chosen with regard to the target application domain of the system, namely long-term interaction with children hospitalized due to metabolic disorders, in particular diabetes. Quiz is a knowledge-exchange activity meant to support the learning of health-related concepts. Due to its predominantly verbal character and constrained interaction structure it is a good testbed for speech-processing technologies.

Fig. 1. Left to right: Nao in the measurement setup in a sound lab at VUB, and playing Quiz, Imitation, and Dance during experiments in the San Raffaele hospital in Milan.

Dance is an activity that promotes physical exercise. At the same time it provides a challenging domain for motion modeling and processing. Finally, Imitation on the one hand involves memory exercise, and on the other hand provides a gentle introduction to physical movement for those users who are too shy to join the dance activity. It also involves an interesting mixture of verbal and non-verbal interaction, but more structured than Dance. Besides activity-specific conversation, the interactions also involve a social component (greetings, introductions). During an activity, the robot provides performance feedback to the user. The social aspect here requires careful handling of the evaluation process so as not to discourage the user with negative feedback. As the system is designed to have multiple encounters with a user, the robot's behavior differs in various aspects from the first session (meeting for the first time) to the subsequent sessions ("knowing" the user and their performance).

2 System Description

Fig. 2 depicts the system components (more details below). We use the Urbi middleware [3] to implement an event-based approach to integration [10].

Fig. 2. Left: The components of the integrated system. Filled boxes: components implemented in Java, double-line boxes: C/C++, and plain boxes: UrbiScript. The TTS component is either the Acapela TTS on the Nao or the Mary TTS implemented in Java. Right: The Quiz Game Wizard GUI.

Speech Signal Detection and Capture. The Audio Front-End (AFE) component captures the speech signal from the microphones, performs preliminary preprocessing such as sample rate conversion, and sends the audio buffers to the Voice Activity Detection (VAD) component. The VAD allows the robot to detect when dynamically varying sound sources are active for further analysis, using a robust energy-based algorithm [8]. For the Sound Source Localization (SSL) component we implemented a Generalized Cross-Correlation based method with a set of pre-measured Time Delays On Arrival, followed by parabolic interpolation [2].

Spoken Input Processing. The Automatic Speech Recognition (ASR) component uses the open-source large-vocabulary CSR engine Julius, for which we trained an Italian child acoustic model. For demonstration purposes we also use an off-the-shelf ASR component for English. Further processing in the Natural Language Understanding (NLU) component proceeds along two paths: for the recognition of quiz questions, answer options, and answers, we use fuzzy matching of recognized content words against the Quiz DB entries. For the recognition of other dialogue acts we use either partial parsing or keyword spotting as a fallback.

Visual Input Processing. For the Gesture Recognition and Understanding (GRU) component we have been experimenting with various methods. For example, one method to trace hands in the Imitation Game uses skin detection [9] enhanced with motion history [5]. An alternative method uses a face detection and tracking algorithm [1] to define the vertical areas where the hands might move, and either motion detection or various optical flow algorithms.

Dialogue Management. Speech and gesture interpretations go to the Dialogue Manager (DM), which bears the primary responsibility for controlling the robot's conversational behaviour and the game progress. It keeps track of the interaction state, integrates the interpretations of the user's input/actions with respect to this state, and selects the next action of the system as a transition to another state, making progress towards a goal. How exactly this is done depends on the game. For example, selecting the next suitable question in Quiz is done by a separate Game Move Generator (GMG) component that also accesses the Quiz DB. Inspired by the Zone of Proximal Development theory proposed by Vygotsky, the system takes the user's performance into account. For example, in the dance activity, the key point is to propose dance moves with gradually increasing or decreasing complexity based on the user's performance. On the one hand, the selected move should be within the user's capabilities to avoid discouraging the child; on the other hand, it should be challenging enough to keep the child engaged in the task and willing to continue. The dance move selection mechanism thus considers the level of complexity of the dance moves, a hierarchical representation of dance moves, and the user's current potential capability to perform the different moves. The quiz question selection algorithm similarly takes into account the difficulty level of a question, whether it was asked already, and whether the user knew the answer.
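The fuzzy matching of recognized content words against quiz database entries can be illustrated with Python's standard difflib; the matcher below and its toy quiz entries are invented for the example and are not the project's actual implementation:

```python
import difflib

def match_quiz_entry(asr_words, quiz_entries, cutoff=0.6):
    """Return the quiz DB entry closest to the recognized content words,
    or None if nothing is similar enough."""
    hypothesis = " ".join(asr_words).lower()
    best, best_score = None, 0.0
    for entry in quiz_entries:
        score = difflib.SequenceMatcher(None, hypothesis, entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= cutoff else None

# e.g. match_quiz_entry(["vitamina", "ce"], ["vitamina C", "vitamina D"])
```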

User-specific information (e.g., name, age) and interaction history (e.g., games played, achieved performance) are kept in the User Model (UM), which also receives updates from the DM. Game-specific information is also stored. To experiment with the system without relying on fully automatic processing, we developed a Wizard-of-Oz interface (Fig. 2). Given a user input, the wizard can select the corresponding user dialogue act. The DM then selects the next action. In the automatic mode, the DM passes a dialogue act to the NLG and NVBP components. In a non-automatic mode, the action selected by the DM is highlighted in the interface for the wizard to approve or override. It is possible to switch between automatic and wizarded DM at any time during a session.

Spoken Output Production. Spoken output is produced by the Natural Language Generation (NLG) and Text-To-Speech Synthesis (TTS) components. The system action selected by the DM specifies the type of dialogue act and the values of information state variables important for verbalization selection. Verbalization is determined by an utterance planner using a set of graph rewriting rules. The output is either a string passed directly to the TTS, or a logical form that serves as input to a grammar-based lexical realization component using OpenCCG. Since repetitive verbalization of system output could be annoying and thus negatively influence engagement, we implemented a large range of verbal output variation. Selection among variants is either random or controlled by selectional criteria, taking into account the content to be conveyed and the dialogue context. To foster a sense of familiarity between the robot and the user in interactions over multiple sessions, the robot explicitly acknowledges and refers to common ground with a given user, thus making it explicit that it is familiar with them.

For speech synthesis the commercial Acapela TTS system is available by default on the Nao. However, we also integrated the open-source Mary TTS platform, for which we developed a new Italian voice. Mary TTS supports state-of-the-art HMM-synthesis technology, and enables us to experiment with the manipulation of para-verbal parameters (e.g. pitch shape, speech rate, voice intensity, pause durations) for the purpose of expressive speech synthesis, and with the voice quality and timbre modification algorithms [11] useful for converting an adult TTS voice into a child-like voice. To further enhance the contextual appropriateness of the output speech, we experiment with modifications of the spoken output prosody using the support for controlling the prosody of TTS voices with symbolic markup of speech rate, pitch and contour. So far, we have implemented prosodic prominence modification (stress) on words that realize the focus of a sentence, and emotional prosody modification according to the emotional state of the robot (sad and happy).
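The selection among verbalization variants described above can be sketched as follows: variants are filtered by contextual criteria and recently used ones are avoided. The template structure and criterion names here are illustrative assumptions, not the system's actual rule set.

    import random
    from collections import deque

    class VariantSelector:
        """Pick a verbalization for a dialogue act, avoiding recent repeats."""

        def __init__(self, history_size=5):
            self.recent = deque(maxlen=history_size)

        def select(self, variants, context):
            # keep only variants whose conditions hold in the current context;
            # fall back to all variants if none applies
            applicable = [v for (v, cond) in variants
                          if all(context.get(k) == val for k, val in cond.items())] \
                         or [v for (v, _) in variants]
            fresh = [v for v in applicable if v not in self.recent] or applicable
            choice = random.choice(fresh)
            self.recent.append(choice)
            return choice

    sel = VariantSelector()
    variants = [("Well done!", {"performance": "good"}),
                ("Great, that was correct!", {"performance": "good"}),
                ("Almost - let's try another one.", {"performance": "poor"})]
    print(sel.select(variants, {"performance": "good"}))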

Nonverbal Behavior Production. The Non-verbal Behavior Planning (NVBP) and Motor Control (MC) components produce arm gestures and head and body poses. Besides the game-specific moves and poses in the imitation and dance games, static key poses are produced to display emotions, namely anger, sadness, fear, happiness, excitement and pride [4].

3 Demonstrated Features

The present demonstration will focus in particular on spoken language input and output processing. This includes robust natural language understanding, dialogue management based on hierarchical reinforcement learning [6] using flexible hierarchical dialogue control [7], varied verbalization production, familiarity across multiple sessions, and contextually controlled speech synthesis.

References

1. OpenCV library website (accessed )
2. Athanasopoulos, G., Brouckxon, H., Verhelst, W.: Sound source localization for real-world humanoid robots. In: Proceedings of the 11th International Conference on Signal Processing (SIP 2012). WSEAS (Mar 2012)
3. Baillie, J.: URBI: Towards a Universal Robotic Low-Level Programming Language. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE (2005)
4. Beck, A., Cañamero, L., Bard, K.: Towards an affect space for robots to display emotional body language. In: Proceedings of the 19th IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man 2010). IEEE (2010)
5. Bradski, G., Davis, J.: Motion segmentation and pose recognition with motion history gradients. Machine Vision and Applications 13 (2002)
6. Cuayáhuitl, H.: Learning dialogue agents with Bayesian relational state representations. In: Proceedings of the IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI-KRPDS), Barcelona, Spain (2011)
7. Cuayáhuitl, H., Kruijff-Korbayová, I.: An interactive humanoid robot exhibiting flexible sub-dialogues. In: Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Montreal, Canada (2012)
8. Dekens, T., Verhelst, W.: On the noise robustness of voice activity detection algorithms. In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA (Sep 2011)
9. Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vision 46 (2002)
10. Kruijff-Korbayová, I., Athanasopoulos, G., Beck, A., Cosi, P., Cuayáhuitl, H., Dekens, T., Enescu, V., Hiolle, A., Kiefer, B., Sahli, H., Schröder, M., Sommavilla, G., Tesser, F., Verhelst, W.: An event-based conversational system for the Nao robot. In: IWSDS, Granada, Spain (Sep 2011)
11. Tesser, F., Zovato, E., Nicolao, M., Cosi, P.: Two Vocoder Techniques for Neutral to Emotional Timbre Conversion. In: Sagisaka, Y., Tokuda, K. (eds.) 7th Speech Synthesis Workshop (SSW). ISCA, Kyoto, Japan (2010)

Making Virtual Pancakes: Acquiring and Analyzing Data of Everyday Manipulation Tasks through Interactive Physics-based Simulations

Lars Kunze 1, Andrei Haidu 1, and Michael Beetz 2

1 Intelligent Autonomous Systems, Technische Universität München, Germany kunzel@cs.tum.edu
2 Artificial Intelligence, University of Bremen, Germany beetz@tzi.de

Abstract. Teaching robots everyday tasks like making pancakes by instructions requires interfaces that can be intuitively operated by non-experts. By performing manipulation tasks in a virtual environment using a data glove, task-related information about the demonstrated actions can be accessed and extracted directly from the simulator. We translate low-level data structures of these simulations into meaningful first-order representations, called timelines, whereby we are able to select data segments and analyze them at an abstract level. Hence, the proposed system is a powerful tool for acquiring examples of manipulation actions and for analyzing them, whereby robots can be informed how to perform a task.

1 Introduction

In their daily routines personal robot assistants are supposed to accomplish novel tasks for which they have not been pre-programmed in advance. In [6], it is demonstrated how robots can extend their task repertoire by extracting natural language step-by-step descriptions from the Web and translating them into well-defined executable plans. For example, the instructions for making a pancake read as follows: 1) pour the pancake mix into a pan, 2) flip the pancake using a spatula, 3) place the pancake onto a plate. These instructions are descriptive enough for humans to understand the task. However, for robots these instructions are highly underspecified. That is, a robot has to infer the appropriate parameters of these actions by other means. By observing humans performing the task, the robot can estimate some of the missing parameters. For example, the robot could estimate parameters like the height and angle of the container while the pouring action is performed. Also the duration of this action could be estimated. Such information could be extracted from instruction videos retrieved from the Web or from a human tracking system [2].

Fig. 1. Rosie preparing a pancake.

Since our goal is to acquire a deep understanding of the physical effects of such manipulation actions, we propose a virtual manipulation environment based on a physics-based simulation. Objects within this virtual environment can be manipulated using a data glove and a 3D position sensor, where the sensor information is directly translated into a pose and articulation of the simulated hand model. Since we have complete knowledge about the simulated world state, we are able to extract different kinds of information about the task-related objects. This information includes, for example, an object's position, orientation, linear and angular velocities, as well as its bounding box. Also contacts between objects are reported in each time step. In contrast to vision-based systems, we do not have to deal with occlusions and other typical problems like the recognition of transparent objects.

The virtual manipulation framework that we have designed and implemented can be used as a tool for the acquisition of task-related information by logging the internal states of the simulator. The logged simulations are then translated into interval-based first-order representations, called timelines, as described in [5]. By formulating logical queries we can extract task-related information from these timelines semantically. For example, we can ask for the series of poses of the container while it was held in the hand. Then, further methods can be applied to the trajectory data to analyze the manipulation action with respect to various aspects.

2 Virtual Manipulation Environment

The virtual environment is based on Gazebo, a 3D multi-robot simulator with rigid-body physics. In the environment a user wearing a data glove controls a robotic hand which allows him/her to interact with various objects. Figure 2 shows the hardware equipment, a user controlling the robot, and a screenshot from the virtual environment. The virtual robotic hand (DLR/HIT) has four fingers with four joints each, except the thumb, which has an extra degree of freedom for easier manipulation. The hand is controlled with the help of a proportional-integral (PI) force controller acting on the wrist. For easier control, the gravity acting on the hand is disabled. The data glove we use (X-IST Dataglove) is equipped with 15 bend sensors (three per finger, one for each joint).

Fig. 2. Virtual Manipulation Environment.

To get the pose of the hand within six degrees of freedom we use the Razer Hydra, a game controller that uses a weak magnetic field to detect its absolute position and orientation. The sensor was disassembled from the game controller and attached to the data glove.

3 Preliminary Experimental Results

A user performed two tasks related to the pancake scenario: pouring some mix onto a pancake maker and flipping a pancake. We have monitored and logged the data structures of the simulator and translated them to first-order representations (timelines). Figure 3 illustrates steps from both tasks.

Fig. 3. Virtual Manipulation Tasks: Pouring liquids and flipping a pancake.

By translating the data structures of the simulator into timelines we can use first-order logic to query task-related data semantically. We access the timelines by using predicates similar to those in the Event Calculus [3]. The notation is based on two concepts, namely fluents and events. Fluents are conditions that change over time, e.g., a mug contains a pancake mix: contains(mug, mix). Events (or actions) are temporal entities that have effects and occur at specific points in time, e.g., consider the action of pouring the mix from the mug onto the pancake maker: occurs(pour(mix, mug, pancake_maker)). Logical statements about both fluents and events are expressed by using the predicate holds(F, T, TL), where F denotes a fluent or event, T simply denotes a point in time, and TL a timeline. Using the predicate holds_tt we can query for a time interval throughout which the fluent holds. For example, we can ask for the pose, velocities, and bounding box of the mug in a time interval during which there was a contact between the mug and the robotic hand as follows:

    ?- holds_tt(contacts(mug, hand), I, TL),
       simulator_values(position(mug, Ps), I, TL),
       simulator_values(orientation(mug, Os), I, TL),
       simulator_values(linear_velocities(mug, LVs), I, TL),
       simulator_values(angular_velocities(mug, AVs), I, TL),
       simulator_values(bboxes(mug, BBs), I, TL).

where I denotes a time interval, TL a timeline, and the other variables denote lists of their respective data types. Similarly, we can get the last pose of the mug in that interval, e.g., to analyze where the user has placed it after the pouring.
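The timeline facts that such queries run over can be pictured as intervals obtained by collapsing per-timestep simulator states; the following sketch illustrates the idea for a boolean fluent, with a record layout that is an assumption, not the actual logging format.

    def to_intervals(ticks):
        """Collapse a per-timestep boolean fluent into (start, end) intervals.

        ticks: list of (time, bool) pairs, e.g. contact between mug and hand
        sampled at each simulation step. Returns the intervals during which
        the fluent held.
        """
        intervals, start = [], None
        for t, holds in ticks:
            if holds and start is None:
                start = t                      # fluent becomes true
            elif not holds and start is not None:
                intervals.append((start, t))   # fluent becomes false
                start = None
        if start is not None:                  # fluent still true at the end
            intervals.append((start, ticks[-1][0]))
        return intervals

    # e.g. contacts(mug, hand) sampled at 10 Hz:
    print(to_intervals([(0.0, False), (0.1, True), (0.2, True), (0.3, False)]))
    # -> [(0.1, 0.3)]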

In the experiments, liquid was poured from different heights, which can be seen by clustering the trajectories (Figure 4). First, we applied dynamic time warping to align the trajectories, and then we clustered the trajectories as in [1].

Fig. 4. Trajectories of the mug when it was in contact with the hand. Raw (left) and clustered (right) trajectories after aligning them using dynamic time warping.

Logical queries allow us to select data segments of the logged simulations on an abstract level. For example, we can select only data when the mug is over the pancake maker or when it is tilted at an angle in a certain range.
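The dynamic time warping step used above to align trajectories of different durations before clustering can be sketched in a few lines; this is the textbook algorithm with a Euclidean local cost, not the authors' implementation.

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic-time-warping distance between two trajectories.

        a, b: arrays of shape (n, d) and (m, d), e.g. sequences of 3-D mug
        positions sampled at different speeds. Returns the cost of the best
        monotone alignment.
        """
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # two pourings of different durations along roughly the same path:
    t1 = np.array([[0, 0, 0.0], [0, 0, 0.1], [0, 0, 0.2], [0, 0, 0.3]])
    t2 = np.array([[0, 0, 0.0], [0, 0, 0.15], [0, 0, 0.3]])
    print(dtw_distance(t1, t2))

The pairwise DTW distances can then feed a standard clustering method, which is how the raw trajectories in Figure 4 are grouped by pouring height.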

4 Conclusions and Future Work

In this paper we have presented a system for acquiring data of manipulation actions by controlling a robotic hand in a virtual environment using a data glove. By translating the data to timelines we are able to analyze and interpret the performed actions at a semantic level. In future work, we will deliberately tweak the underlying physics of the simulation to produce behaviors that deal with various physical phenomena such as the viscosity of liquids. We will also apply the found parameter values as seeds in our envisioning system for robots [4]. In the long run, we would like to integrate a vision-based tracking system using a physics-based simulation to acquire examples of manipulation actions more naturally.

Acknowledgments

This work has been supported by the EU FP7 Project RoboHow (grant number ) and the cluster of excellence Cognition for Technical Systems (Excellence Initiative of the German Research Foundation (DFG)).

References

1. S. Albrecht, K. Ramirez-Amaro, F. Ruiz-Ugalde, D. Weikersdorfer, M. Leibold, M. Ulbrich, and M. Beetz. Imitating human reaching motions using physically inspired optimization principles. In 11th IEEE-RAS International Conference on Humanoid Robots, Bled, Slovenia, October.
2. M. Beetz, J. Bandouch, D. Jain, and M. Tenorth. Towards Automated Models of Activities of Daily Life. In First International Symposium on Quality of Life Technology: Intelligent Systems for Better Living, Pittsburgh, Pennsylvania, USA.
3. R. Kowalski and M. Sergot. A logic-based calculus of events. New Generation Computing, 4(1):67–95.
4. L. Kunze, M. E. Dolha, and M. Beetz. Logic Programming with Simulation-based Temporal Projection for Everyday Robot Object Manipulation. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, USA, September.
5. L. Kunze, M. E. Dolha, E. Guzman, and M. Beetz. Simulation-based temporal projection of everyday robot object manipulation. In Yolum, Tumer, Stone, and Sonenberg, editors, Proc. of the 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, May. IFAAMAS.
6. M. Tenorth, D. Nyga, and M. Beetz. Understanding and Executing Instructions for Everyday Manipulation Tasks from the World Wide Web. In IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, May.

BendIT: An Interactive Game with two Robots

Tim Niemueller, Stefan Schiffer, Albert Helligrath, Safoura Rezapour Lakani, and Gerhard Lakemeyer

Knowledge-based Systems Group, RWTH Aachen University, Aachen, Germany

Abstract. In this paper we report on an interactive game with two robots and review its components. A human user uses his torso movements to steer a Robotino robot along a pre-defined course. Our domestic service robot Caesar acts as a referee: it autonomously follows the Robotino and makes sure that it stays within a corridor along the path. If the user manages to keep the Robotino within the corridor for the whole path, he wins. The game can be used, for example, to engage people in physical training, such as rehabilitation after an injury. It was designed and implemented as a student project in winter term 2011/12.

1 Introduction

Humans and robots interact in various ways, often including speech or gestures. In this paper, we present an interactive game that two robots and a human play with each other. The human can control the movements of a simple robot by his torso movement, virtually acting like a joystick, to steer it along a virtual path by bending and turning his own upper body. A second, more powerful robot acts as a referee, employing methods for self-localization, navigation, and 3D perception to oversee the game and to judge the human's performance in the game. The two robots involved in this demo are the Festo Robotino as the robot controlled by the human, and our custom-built domestic service robot Caesar as the referee. The game setup with the involved robots is shown in Figure 1.

To realize such a task, several challenges need to be addressed. First, a system must be in place to allow the referee robot to detect the Robotino in order to judge the state of the game. Here, we employ well-known methods from the Point Cloud Library (PCL [1]), as described in Section 2. Then, to control the movements of the Robotino by a human bending and turning his upper body, our system uses an RGB-D camera mounted beside the playing field. It does so employing a custom body posture estimation approach described in Section 3. Finally, new and existing components must be integrated with the robot base systems like self-localization and navigation, and enriched with user interaction and the behavior to facilitate the game, as described in Section 4. We conclude in Section 5.

Fig. 1. The game setup and its components (Kinect point cloud with clusters, the BendIT agent translating bending/turning into motion commands, the Robotino on a virtual path from start to goal, the referee robot Caesar with its follow program, and the human user).

2 Robotino Detection

To detect the Robotino we use the point cloud generated from the RGB-D camera mounted on a pan-tilt unit on the head of the referee robot (cf. Figure 1), combining existing methods and implementations. The overall approach is separated into two phases: first, a model of the Robotino is learned, and later this very model is used to recognize the Robotino in a set of candidate point clusters.

For both phases point clouds are acquired. First, the point cloud is down-sampled using a voxel grid, meaning that for each volumetric unit in a 3D grid an averaged point is chosen. Afterwards, all planar areas in the point cloud are determined and removed. The remaining points are segmented into clusters. To create the model a priori, the cluster known to represent the Robotino is manually selected and a model is generated using Viewpoint Feature Histograms (VFH [2]). The cluster is segmented into patches. In each patch the relative pan, tilt, and yaw angles and the distance between the central point translated to the central viewing direction are calculated, as well as the angle between the central viewing point and the normal of each point. Binning these values produces the desired feature histograms. During the on-line recognition, the VFH signatures are determined for each of the clusters and compared to the model's feature histograms. The closest matching cluster is chosen as the target Robotino.

The computing power requirements are moderate. Interestingly, one of the most expensive parts is the down-sampling of the point cloud after acquisition. It works by taking the average for a volumetric grid with an edge length of 2 cm. The learned VFH models depend on this parameter, so adjusting and tuning it requires re-learning the models each time. With this limitation we can currently operate at about 5 Hz.
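The preprocessing pipeline described above (voxel-grid down-sampling, plane removal, Euclidean clustering) can be sketched roughly as follows. The paper uses the C++ Point Cloud Library, so this Open3D-based Python version is only an illustrative stand-in; the plane-removal loop and clustering parameters are assumptions, while the 2 cm voxel size comes from the text.

    import numpy as np
    import open3d as o3d

    def candidate_clusters(cloud):
        """Down-sample, strip planar surfaces, and cluster the rest.

        cloud: an open3d.geometry.PointCloud from the RGB-D camera.
        Returns a list of point-index arrays, one per candidate object
        cluster (the Robotino would then be picked by comparing VFH-style
        signatures against the learned model).
        """
        rest = cloud.voxel_down_sample(voxel_size=0.02)  # 2 cm grid, as in the text
        # iteratively remove large planes (floor, walls, table tops)
        for _ in range(3):
            _, inliers = rest.segment_plane(distance_threshold=0.02,
                                            ransac_n=3, num_iterations=200)
            if len(inliers) < 0.2 * len(rest.points):
                break
            rest = rest.select_by_index(inliers, invert=True)
        labels = np.array(rest.cluster_dbscan(eps=0.05, min_points=20))
        return [np.where(labels == k)[0] for k in range(labels.max() + 1)]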

3 Human Body Posture Estimation

Fast and on-line estimation of the human body posture is done by combining well-known methods with a custom, simple but effective approach. The posture recognition roughly works as follows. During the game, point clouds are acquired continuously. For each cycle, the point cloud is down-sampled using a voxel grid, and the points are vertically constrained depending on the room height to cut off ground and ceiling planes. The algorithm works with small data sets and can be implemented with readily available open-source libraries, as opposed to [3].

In a first step, we segment clusters in the remaining points. For every cluster we verify whether it represents a human, using a VFH model of a human that we trained with data collected from different humans. If the human bends towards the robot, this may fail, because then head and torso are segmented as two separate clusters. In such a situation, we look for small clusters close to a larger one, merge them, and check again whether the VFH model applies now. Then, on the human cluster, we start with calibrating a neutral posture of the human by finding the hips. The movement indicators are then computed as deviations of the body posture, for one from the perpendicular (bending vector) and from the angle facing the camera (turning angle). The human is tracked until it is lost for a certain period of time, after which we start the detection on all clusters again.

The first step of the calibration is to identify the height of the hips. Therefore, we start looking for the head in the human cluster as the smallest subset at the vertically highest point of the cluster for which we can match a sphere using RANSAC [4]. Then, starting from the centroid of the human cluster, we move down in slices parallel to the ground plane, clustering the points in each of the slices. In those slices we are looking for the cluster with the biggest width. In each slice, the biggest cluster belongs to the torso. We assume the biggest cluster among all slices to be the hips. After calibration has been completed, we switch to a tracking mode, re-using the estimated data to detect the human more quickly. If multiple humans are in the image, we take the cluster within a certain radius around the previous sighting. During tracking, we take the slice at the same height as was found for the hips during calibration. Starting from this slice, we move upwards detecting the shoulders. We first remove the points belonging to the head from the human cluster. Then, we move upwards from the hip centroid of the human cluster in slices again, looking for the topmost points of the body, the leftmost and rightmost points being the left and the right shoulder, respectively.

We compute movement indicators from the posture of the upper body as two vectors as follows. First, the deviation of the vector between the centroid of the hips and the centroid of the shoulders from an upright vector is interpreted as a movement command in the xy plane. Second, we interpret the deviation of the vector between the two shoulders from a horizontal vector in the image plane of the RGB-D camera as turning commands. There are other approaches to body posture estimation, such as the one presented in [5]. However, with some assumptions that follow from the design of the game and the corresponding restrictions in the space of possible postures, our method only needs the simple steps sketched above to yield sufficiently accurate results in very little time. We have implemented the described cylinder fitting for arms, but did not need it for the given game.
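The two movement indicators follow directly from the hip and shoulder centroids; the sketch below shows one way to compute them, under assumed axis conventions (z pointing up, camera looking along +y) that are not stated in the paper.

    import numpy as np

    def movement_indicators(hip_c, shoulder_l, shoulder_r):
        """Derive joystick-like commands from upper-body posture.

        hip_c: hip centroid; shoulder_l/r: left/right shoulder points
        (3-D, z up, camera along +y). Returns the bending vector
        (forward/sideways command in the xy plane) and the turning angle.
        """
        shoulder_c = 0.5 * (shoulder_l + shoulder_r)
        torso = shoulder_c - hip_c
        # deviation of the torso from the upright vector -> xy motion command
        bend_xy = torso[:2] / np.linalg.norm(torso)
        # deviation of the shoulder line from horizontal -> turning command
        line = shoulder_r - shoulder_l
        turn = np.arctan2(line[1], line[0])  # 0 when squarely facing the camera
        return bend_xy, turn

    bend, turn = movement_indicators(np.array([0.0, 0.0, 1.0]),
                                     np.array([-0.2, 0.05, 1.5]),
                                     np.array([0.2, -0.05, 1.5]))
    print(bend, turn)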

Fig. 2. Processing time (in ms) of the individual steps of the approach (down-sampling, plane removal, detection, tracking) over the running time of a game.

With the described approach we are able to process about 15 point clouds per second on an Intel E6750 at 2.66 GHz. Figure 2 shows the performance plotted for a game of three minutes. A considerable amount of time is used for the down-sampling of the point cloud. The plane removal step is very quick and helps to filter out large parts of the point clouds. The detection step takes about 10 ms, but it can become more costly. For example, the merging step needed when torso and head are separate clusters involves running the VFH step more often. Also, if the user is lost, we need to restart the more expensive detection on all clusters. The tracking takes roughly constant time.

4 An Interactive Human-Robot Game

The described perception modules are integrated using the Fawkes robot software framework [6]. For the visualization of the game, we use ROS rviz [7]. The actual game logic was implemented using our Lua-based Behavior Engine [8]. The human initiates a new game, for example by instructing the referee robot using speech. The stationary Kinect is calibrated to detect the human. Afterwards, the forward and sideward bending angles of the human's upper body are converted into holonomic forward and sideward motions. Twisting the torso, and hence rotating the shoulders, adds a rotation to that movement. The referee robot uses one out of a set of pre-defined paths ("levels"), which differ in length and deviation tolerance and thus in difficulty. It waits until the Robotino has reached the starting position, at which point the referee announces that the game starts.

The output of the human posture recognition is transformed into the exact same data that a 3-axis joystick would produce. Using this data, the human steers the Robotino through the environment. In the visualization, he can observe the referee robot's perception of the situation, including the positions of the referee and the Robotino, and the corridor along which the Robotino is to move. The referee robot autonomously follows the Robotino at a certain distance to ensure that it remains visible to the referee, even for extended games. If the Robotino stays outside of the allowed path margin for a certain amount of time (in the order of a few seconds), the referee declares that the human lost the game. If, however, the human manages to steer the Robotino towards the end of the path within the allowed tolerance, he wins the game. The referee robot's checking of the Robotino following the path uses a global localization [9] for its own position and the relative position at which it perceives the Robotino.
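The path-margin check just described can be pictured with the following sketch, under the assumption that the path is a polyline in the global frame and that tolerance and timeout are per-level parameters; the numbers are illustrative.

    import numpy as np

    def off_path_too_long(positions, path, margin=0.3, max_off_time=3.0, dt=0.2):
        """Decide whether the Robotino left the corridor for too long.

        positions: observed Robotino (x, y) positions, one per control
        cycle of length dt; path: polyline vertices of the level. Returns
        True when the distance to the path exceeded `margin` for more than
        `max_off_time` seconds in a row, i.e. the human loses the game.
        """
        def dist_to_path(p):
            best = np.inf
            for a, b in zip(path[:-1], path[1:]):
                ab, ap = b - a, p - a
                t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
                best = min(best, np.linalg.norm(p - (a + t * ab)))
            return best

        off = 0.0
        for p in positions:
            off = off + dt if dist_to_path(np.asarray(p)) > margin else 0.0
            if off > max_off_time:
                return True
        return False

    path = np.array([[0.0, 0.0], [2.0, 0.0]])
    print(off_path_too_long([(0.5, 0.1), (1.0, 0.9), (1.2, 0.9)],
                            path, margin=0.3, max_off_time=0.3, dt=0.2))  # True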

The overall performance is good enough to introduce new users to the game within a few minutes. At the moment the game is explained by a human. A further step would be to have the robot instruct the player on what to do.

5 Conclusion

We presented an interactive game that a human can play with two robots. The human's objective is to steer the smaller of the two robots along a virtual path to a goal position, using his upper body movements as a "joystick"-like control. The second robot acts as a referee overseeing the performance of the human player. We briefly reviewed the components needed to implement such a game, ranging from perception of the human and the robot to translating the body movements into control commands. We built on existing methods where possible and implemented a novel approach to human body posture recognition using the Kinect RGB-D camera. We think that such a game could potentially be applied in rehabilitation or, in general, as a motivation to engage in physical training.

References

1. Rusu, R.B., Cousins, S.: 3D is here: Point Cloud Library (PCL). In: IEEE Int'l Conf. on Robotics and Automation (ICRA), Shanghai, China (2011)
2. Rusu, R., Bradski, G., Thibaux, R., Hsu, J.: Fast 3D recognition and pose using the viewpoint feature histogram. In: IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS). IEEE (2010)
3. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
4. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (1981)
5. Droeschel, D., Behnke, S.: 3D Body Pose Estimation Using an Adaptive Person Model for Articulated ICP. In: Jeschke, S., Liu, H., Schilberg, D. (eds.) Intelligent Robotics and Applications. Volume 7102 of LNCS. Springer (2011)
6. Niemueller, T., Ferrein, A., Beck, D., Lakemeyer, G.: Design Principles of the Component-Based Robot Software Framework Fawkes. In: Int'l Conf. on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR) (2010)
7. Quigley, M., Conley, K., Gerkey, B.P., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A.Y.: ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software (2009)
8. Niemueller, T., Ferrein, A., Lakemeyer, G.: A Lua-based Behavior Engine for Controlling the Humanoid Robot Nao. In: RoboCup Symposium (2009)
9. Strack, A., Ferrein, A., Lakemeyer, G.: Laser-Based Localization with Sparse Landmarks. In: Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y. (eds.) RoboCup 2005: Robot Soccer World Cup IX. Volume 4020 of LNCS. Springer (2006)

robocd: Robotic Order Cups Demo
An Interactive Domestic Service Robotics Demo

Stefan Schiffer, Tobias Baumgartner, Daniel Beck, Bahram Maleki-Fard, Tim Niemüller, Christoph Schwering, and Gerhard Lakemeyer

Knowledge-based Systems Group, RWTH Aachen University, Aachen, Germany

Abstract. This paper describes an interactive demonstration by the AllemaniACs domestic service robot Caesar. In a home-like environment, Caesar's task is to help setting the table. Besides the basic capabilities of an autonomous mobile robot, it uses methods for human-robot interaction, and it also has a sophisticated high-level control that allows for decision-theoretic planning. We use this demo to illustrate the interplay of several modules of our robot control software in carrying out complex tasks. The overall system allows for robust and reliable service robotics in domestic settings like the RoboCup@Home league. Also, we show how our high-level programming language provides a powerful framework for agent behavior specification that can be beneficially deployed for service robotic applications. The system was showcased repeatedly, most notably at a national RoboCup competition and at an international conference.

1 Introduction

In RoboCup, apart from making agents and robots play soccer, there are also competitions in rescue scenarios as well as in a domestic service robotics setting. While in the soccer leagues quick decision making, cooperation, and teamplay are crucial, the challenge in the RoboCup@Home league is to build a robust robotic system that can safely operate in a human environment and that can interact with humans. As the complexity of the tasks to solve in domestic settings rises, so increases the benefit a robot gains from using sophisticated means for decision-making and deliberation. The high-level control of Caesar is based on the language Readylog [1], a variant of the logic-based language Golog [2], which combines explicit agent programming as in imperative languages with the possibility to reason about actions and their effects. In this paper, we present a demo application of Readylog in a domestic setting to showcase its benefits and its applicability. After we briefly introduce our robot, we sketch the domestic service robotics domain. Then we present the Robotic Order Cups Demo before we conclude.

Fig. 1. Our robot Caesar, the robocd scenario with a user giving instructions, and perception and motion planning in simulation for manipulation in an extended setup: (a) Caesar, (b) user pointing at a cup, (c) scene perception, (d) manipulation.

2 The AllemaniACs Domestic Service Robot Caesar

An increasingly popular application domain for autonomous mobile robots is domestic service robotics (DSR), where robots perform assistive tasks in a home environment. A competition that focuses on these kinds of applications is RoboCup@Home [3]. Apart from the demands that are commonly put on autonomous mobile robots, a service robot in a domestic setting must meet additional requirements with respect to interactivity, robustness, and accessibility. What is more, it can be more helpful in complex assistive tasks if it also features a sophisticated high-level control.

Our robot Caesar, shown in Figure 1(a), is designed and was built to operate in human-populated environments in domestic scenarios. It should be helpful around the house, assisting elderly or disabled people with their daily activities. Caesar meets all the basic requirements put on an autonomous mobile robot, that is, it can navigate in its environment safely and it can localize itself reliably with high accuracy in known environments. Further, it is able to detect and recognize people and objects, and it can also manipulate objects with its robotic arm. Of particular importance for the demo we discuss in this paper are its robust speech recognition [4] and a component for gesture recognition [5]. These components are orchestrated in the Fawkes robot framework [6] to form a robust assistive robotic system.

On top of the low-level components mentioned so far and a mid-level Lua-based behavior engine for basic skills of the robot [7], Caesar has a logic-based high-level control that allows for deliberation and flexible decision-making. We use Readylog [1], a dialect of the robot programming and plan language Golog [2]. Golog is based on Reiter's version of the situation calculus [8], which is a sorted second-order language to reason about dynamic systems with actions and situations. Readylog features several extensions to the original Golog language; most notably, it allows for decision-theoretic planning in the spirit of DTGolog [9]. On Caesar we use an implementation of a Readylog interpreter in ECLiPSe-CLP, a Prolog dialect.

Algorithm 1: Readylog program for the Order Cups Demonstration. The terms p1 to p4 denote four positions on the table, while Ii and Pi are variables that hold the color of a cup at position i in the initial and the goal situation, respectively. pos(C) returns the position of the cup with color C.

    proc main,
      get_Initial_Order(I1, I2, I3, Init);    %% perceive initial order
      get_Goal_Order(P1, P2, P3, Goal);       %% inquire about goal order
      sort_cups(P1, P2, P3, 4);               %% start planner
    endproc

    proc sort_cups(P1, P2, P3, H),
      solve(H, reward_cup(P1, P2, P3),
        while( ¬(p1 = pos(P1) ∧ p2 = pos(P2) ∧ p3 = pos(P3)) ) do
          pickbest(cup, {red, green, blue},
            pickbest(to, {p1, p2, p3, p4}, move_cup(cup, pos(cup), to)))
        endwhile
      endsolve
    endproc

3 Robotic Order Cups Demo

We now discuss a special helping task that Caesar is able to perform: the Robotic Order Cups Demo (robocd). (A video of the demonstration is available online.) The robot's task in this demo is to help decorating a table. In the scenario there are three differently colored cups (red, green, and blue) on a table. To complete the re-arranging, they have to be put in a specific order. A human user instructs the robot on the desired order by pointing to positions on the table and by simultaneously specifying by speech which cup should be placed at that very position. Figure 1(b) shows a user specifying the desired order of the cups by pointing.

Alg. 1 shows the Readylog procedures used in the demo. The procedure main is a sequential program calling sub-procedures for specific tasks. The first step is a call to get_Initial_Order to perceive the initial order of the cups on the desk. That is, the robot uses its vision system to detect and recognize three cups, namely one cup for each of the colors red, green, and blue. It stores the initial order of the cups in the variables Ii. The call to get_Goal_Order then initiates an interactive procedure where the robot asks the user to specify the desired goal positions for the three cups. To do so, the user is requested to point at a position on the table and to say which cup should be placed at that position. For this to work, Caesar's modules for speech and gesture recognition are constantly running. They post the results of their recognition along with a timestamp to a central blackboard. All other modules and the high-level control can access the information there. In the robocd scenario, the system uses the simultaneous occurrence of position keywords like "there" in the speech recognition data and a pointing gesture in the gesture recognition output to determine the desired goal position of a specific cup.

Algorithm 2: The simplified Readylog policy for an example run. The initial order was green, blue, red; the desired order is red, green, blue.

    exogf_Update, if done then
      move_cup(blue, cup_position(blue), p4),
      exogf_Update, if done then
        move_cup(green, cup_position(green), p2),
        exogf_Update, if done then
          move_cup(red, cup_position(red), p1),
          exogf_Update, if done then
            move_cup(blue, cup_position(blue), p3)
    done

Apart from constructs known from imperative programming languages, e.g., if-then-else, loops, and procedures, Readylog also offers less common constructs like solve and pickbest. The former initiates decision-theoretic planning; the latter is the non-deterministic choice of argument. During planning, the logical specification of the dynamics of the world can be used to reason about the state of the world after executing a program. The non-determinism is resolved by opting for those choices that maximize a reward function. For details we refer the interested reader to [1].

Once Caesar has collected all the necessary information as described above, it determines an execution strategy for the non-deterministic procedure sort_cups such that the desired arrangement of the cups is eventually achieved. This is done by means of the decision-theoretic planning just mentioned. In our scenario, the reward function only considers the number of actions needed to reach the desired goal situation. Therefore, Caesar computes the re-ordering with a minimum number of movements. The outcome of the decision-theoretic planning is a so-called policy, i.e., a conditional Readylog program that contains the optimal course of action to achieve the goal. Alg. 2 shows the simplified policy for an example run of robocd. In the policy, positions p1 to p3 are the initial positions of the three cups, while position p4 is used as a temporary spot in re-ordering the cups. The method exogf_Update is used to update the robot's world model to account for changes which were not due to the robot's actions. The current position of the cup with a specific color is retrieved anew each time with the helper function cup_position.

The initial order of the cups in the example was green, blue, red. The desired final order is red, green, blue. The optimal re-ordering, i.e., the re-ordering with the minimum number of actions, consists of four move_cup actions. To achieve the goal, the robot needs to first move the blue cup to the spare position, then move the green cup to the second spot. After this, the red cup is moved to the first spot, and finally the blue cup can be put at the third position. This yields the desired final order. The application described above shows the potential benefit of deliberation and that it can be integrated into the behavior specification of a robot very easily.
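For intuition, the minimal re-ordering that the decision-theoretic planner produces for this example can be reproduced with a small breadth-first search over move_cup actions; this sketch only illustrates the underlying optimization problem and is not a rendering of the Readylog machinery itself.

    from collections import deque

    def plan_moves(initial, goal, positions=4):
        """Shortest sequence of move_cup actions turning `initial` into `goal`.

        States map position index -> cup color; one spare position serves
        as the temporary spot, as in the demo.
        """
        def freeze(state):
            return tuple(state.get(i) for i in range(positions))

        start = {i: c for i, c in enumerate(initial)}
        target = tuple(list(goal) + [None] * (positions - len(goal)))
        queue, seen = deque([(start, [])]), {freeze(start)}
        while queue:
            state, plan = queue.popleft()
            if freeze(state) == target:
                return plan
            for src, cup in list(state.items()):
                for dst in range(positions):
                    if dst != src and state.get(dst) is None:
                        nxt = dict(state); del nxt[src]; nxt[dst] = cup
                        key = freeze(nxt)
                        if key not in seen:
                            seen.add(key)
                            queue.append((nxt, plan + [(cup, src, dst)]))

    plan = plan_moves(["green", "blue", "red"], ["red", "green", "blue"])
    print(len(plan), plan)   # 4 moves, matching the policy in Alg. 2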

The high-level control with its decision-theoretic optimization capabilities could be used for path-planning or for more complicated tasks such as planning the course of actions in the daily activities of an elderly person. In an extended version of the demonstration, the robot drives around looking for different tables with cups on them, picks up cups from those tables, and moves them from one table to another. Then, the robot also uses its localization and navigation capabilities to get around and remember table positions. The different versions of the system were successfully showcased repeatedly, most notably at a national RoboCup competition and later at an international conference.

4 Conclusions

In this paper, we presented an interactive demonstration where our domestic service robot Caesar orders cups on a table. The demo integrates methods for robust human-robot interaction, like speech and gesture recognition, with a logic-based high-level control. The robot can perform a complex task in a domestic setting where it benefits from its deliberation capabilities by using decision-theoretic planning to determine an optimal course of action. Although the demo application of sorting cups seems simplistic at first glance, the system is extendable to more sophisticated tasks with only little effort.

References

1. Ferrein, A., Lakemeyer, G.: Logic-based robot control in highly dynamic domains. Robotics and Autonomous Systems 56 (2008)
2. Levesque, H.J., Reiter, R., Lespérance, Y., Lin, F., Scherl, R.B.: Golog: A logic programming language for dynamic domains. J. Logic Program. 31 (1997)
3. Wisspeintner, T., van der Zant, T., Iocchi, L., Schiffer, S.: RoboCup@Home: Scientific Competition and Benchmarking for Domestic Service Robots. Interaction Studies, Special Issue on Robots in the Wild 10 (2009)
4. Doostdar, M., Schiffer, S., Lakemeyer, G.: Robust speech recognition for service robotics applications. In: Proc. of the Int'l RoboCup Symposium 2008 (RoboCup 2008). Volume 5399 of LNCS. Springer (2008)
5. Schiffer, S., Baumgartner, T., Lakemeyer, G.: A modular approach to gesture recognition for interaction with a domestic service robot. In: Proc. Int'l Conf. on Intelligent Robotics and Applications. LNCS, Springer (2011)
6. Niemueller, T., Ferrein, A., Beck, D., Lakemeyer, G.: Design Principles of the Component-Based Robot Software Framework Fawkes. In: Proc. Int'l Conf. on Simulation, Modeling, and Programming for Autonomous Robots. Springer (2010)
7. Niemueller, T., Ferrein, A., Lakemeyer, G.: A Lua-based Behavior Engine for Controlling the Humanoid Robot Nao. In: Proc. Int'l RoboCup Symposium 2009 (RoboCup 2009). Volume 5949 of LNCS. Springer (2009)
8. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press (2001)
9. Boutilier, C., Reiter, R., Soutchanski, M., Thrun, S.: Decision-theoretic, high-level agent programming in the situation calculus. In: Proc. of the 17th Nat'l Conf. on Artificial Intelligence (AAAI 2000), Menlo Park, CA (2000)

Using a Discourse and Dialogue Infrastructure for Collaborative Radiology

Daniel Sonntag and Christian Schulz

German Research Center for AI (DFKI), Stuhlsatzenhausweg 3, Saarbruecken, Germany

Abstract. We provide an infrastructure for the disseminated industrial prototype Radspeech, a semantic speech dialogue system for radiologists. The major contribution of this demo paper is the description of a new speech-based interaction scenario for Radspeech in which two radiologists use two independent but related mobile speech devices (iPad and iPhone) and collaborate via a connected large-screen installation using related speech commands. With traditional user interfaces, users may browse or explore patient data, but little to no help is given when it comes to structuring the collaborative user input (into attribute/value pairs, for example) and annotating radiology images in real time with ontology-based medical annotations. A distinctive feature is that the interaction design includes the screens of the mobile devices for touchscreen interaction for more complex tasks, rather than a mere remote control of the widgets on the large screen.

1 Introduction

Over the last several years, the market for speech technology has seen significant developments [6] and powerful commercial off-the-shelf solutions for speech recognition (ASR) or speech synthesis (TTS). For industrial application tasks such as medical radiology, we implemented a discourse and dialogue infrastructure for semantic access to structured and unstructured information repositories [12]. In this paper, we provide two new contributions. First, we present the distinctive features of our new dialogue infrastructure for radiology. Second, we discuss the radiology interaction system in greater detail and explain the implemented dialogue sequences which constitute a running demo system. Thereby we also focus on the special technical components and implementation aspects that are needed to meet the requirements of dialogical interaction in a medical application domain.

With traditional user interfaces in the radiology domain (most of which are desktop-based monomodal keyboard input systems), users may browse or explore patient data, but little to no help is given when it comes to structuring the collaborative user input and annotating radiology images in real time with ontology-based medical annotations. To meet these objectives, we implemented a distributed, ontology-based dialogue system architecture where every major component can be run on a different host, and the graphical interface and microphone on mobile devices, increasing the scalability of the overall system.

In earlier projects [13, 7] we integrated different sub-components into multimodal interaction systems. Thereby, hub-and-spoke dialogue frameworks played a major role [8]. We also learned some lessons which we use as guidelines in the development of semantic dialogue systems [4, 10]; the whole architecture can be found in [9]. Thereby, the dialogue system acts as the middleware between the clients and the backend services that hide complexity from the user by presenting aggregated ontological data. One of the resulting speech systems, RadSpeech, is the implementation of a multimodal dialogue system for structured radiology reports.

2 Task Requirements and Implementation

Recently, structured reporting was introduced in radiology, which allows radiologists to use predefined standardised forms for a limited but growing number of specific examinations. However, radiologists feel restricted by these standardised forms and fear a decrease in focus and eye dwell time on the images [1, 14]. As a result, the acceptance of structured reporting is still low among radiologists, while referring physicians and hospital administrative staff are generally supportive of structured standardised reporting, since it eases the communication with the radiologists and can be used more easily for further processing. With our technology developed over the last 5 years, we implemented the first mobile dialogue system on the iPad and iPhone which is tuned for the standardised radiology reporting process. Our solution not only provides more robustness compared to speech-to-text systems (we use a rather small, dedicated, and context-based speech grammar which is also very robust to background noise), it also fits very well into the new radiology reporting processes which will be established in Germany and the U.S. over the next several years: in structured reporting you directly have to create database entries of a special vocabulary (according to a medical ontology) instead of text.

The semantic dialogue system presented in this demo should be used to ask questions about the image annotations while engaging the clinician in a natural speech dialogue. Different semantic views of the same medical images (such as structural, functional, and disease aspects) can be explicitly stated, integrated, and asked for. This is the essential part of the knowledge acquisition process during the speech dialogue: the grammar of the ASR system only accepts the annotations of a specific radiology sub-grammar which stems from the used medical ontologies; this allows us to reject arbitrary annotations and recognitions with low probability, which makes the system very reliable. Upon touching a region on the interaction device, the ASR is activated. After recognition, the speech and gesture modalities are fused into a complex annotation using a combination of medical ontologies. For disease annotations, for example, the complete RadLex terminology can be used, but we also use a Web Ontology Language (OWL) version of the International Classification of Diseases (ICD-10) [3] and the Foundational Model of Anatomy (FMA) [2].
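The ontology-restricted acceptance policy just described can be pictured as a simple filter over ASR hypotheses: only terms present in the medical vocabulary pass, and low-confidence recognitions are rejected. The vocabulary excerpt and threshold below are illustrative assumptions, not the system's actual grammar.

    # hypothetical excerpt of an ontology-derived annotation vocabulary
    RADIOLOGY_TERMS = {"heart", "heart chamber", "liver", "lung", "spleen"}

    def accept_annotation(hypotheses, vocabulary=RADIOLOGY_TERMS, min_conf=0.6):
        """Filter ASR n-best hypotheses against the ontology vocabulary.

        hypotheses: list of (term, confidence) pairs from the recognizer.
        Returns the best in-vocabulary term, or None to trigger a rejection
        prompt - arbitrary out-of-grammar input never becomes an annotation.
        """
        in_vocab = [(c, t) for t, c in hypotheses
                    if t.lower() in vocabulary and c >= min_conf]
        return max(in_vocab)[1] if in_vocab else None

    print(accept_annotation([("heart", 0.82), ("art", 0.40)]))  # -> "heart"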

Fig. 1. Multimodal Speech Dialogue Scenario with Multiple Input/Output Devices (two users with touch devices and a large collaboration screen).

In addition to ASR, dialogue tasks include the interpretation of the speech signal and other input modalities, the context-based generation of multimedia presentations, and the modelling of discourse structures. According to the utility issues and medical user requirements we identified (system robustness/usability and processing transparency play the major roles), we provide a special rule-based fusion engine for different input modalities such as speech and pointing gestures. We use a production-rules-based fusion and discourse engine which follows the implementation in [5]. Within the dialogue infrastructure, this component plays a major role, since it provides basic and configurable dialogue processing capabilities that can be adapted to specific industrial application scenarios (e.g., the co-ordination of pointing gestures and ASR activation on the medical images). More processing robustness is achieved through the application of a special robust parsing feature on the RDF graphs that result from the input parsing process.

The domain-specific dialogue system RadSpeech is able to process the following medical multiuser dialogue on multiple devices:

1 U1: "Show me the CTs, last examination, patient XY."
2 S: Shows corresponding patient CT studies as DICOM picture series and MR videos.
3 U1: "Show me the internal organs: lungs, liver, then spleen and colon."
4 S: Shows corresponding patient image data according to the referral record on the iPad.
5 U1: "Annotate this picture with Heart." (+ pointing gesture on the iPad)
6 S: "Picture has been annotated with Heart."
7 U1: "Show it on screen."
8 S: Shows patient XY on the large screen, automatically rendering the picture with the heart annotation in the foreground.
9 U2: "... and Heart chamber." (+ pointing gesture on the iPhone)
10 S: Adds the second annotation on screen.
11 U1: "Synchronise annotations with my iPad."
12 S: Shows the new annotation on the iPad.
13 U2: "Search for similar patients."
14 S: "The search obtained this list of patients with similar annotations, including Heart and Heart chamber."
15 U1: "Okay."
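A minimal sketch of the timestamp-based fusion of a deictic speech event ("there", "this picture") with a pointing gesture, as in turns 5 and 9 above, might look as follows; the event format and the time window are assumptions, not the production-rule engine's actual rules [5].

    def fuse(speech_events, gesture_events, deictic_tokens=("there", "this"),
             window=1.5):
        """Pair deictic speech tokens with the temporally closest gesture.

        speech_events: list of (t, token, payload), e.g. (12.3, "this", "Heart");
        gesture_events: list of (t, screen_position). Returns fused annotation
        commands (payload, position) when speech and gesture co-occur within
        `window` seconds - the rule-based engine generalizes this idea.
        """
        fused = []
        for ts, token, payload in speech_events:
            if token not in deictic_tokens:
                continue
            candidates = [(abs(tg - ts), pos) for tg, pos in gesture_events
                          if abs(tg - ts) <= window]
            if candidates:
                fused.append((payload, min(candidates)[1]))
        return fused

    print(fuse([(12.3, "this", "Heart")], [(12.1, (340, 220))]))
    # -> [('Heart', (340, 220))]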


More information

Program Visualization for Programming Education Case of Jeliot 3

Program Visualization for Programming Education Case of Jeliot 3 Program Visualization for Programming Education Case of Jeliot 3 Roman Bednarik, Andrés Moreno, Niko Myller Department of Computer Science University of Joensuu firstname.lastname@cs.joensuu.fi Abstract:

More information

Masters in Human Computer Interaction

Masters in Human Computer Interaction Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from

More information

ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION

ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION 1 ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION B. Mikó PhD, Z-Form Tool Manufacturing and Application Ltd H-1082. Budapest, Asztalos S. u 4. Tel: (1) 477 1016, e-mail: miko@manuf.bme.hu

More information

Masters in Information Technology

Masters in Information Technology Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101

More information

Masters in Artificial Intelligence

Masters in Artificial Intelligence Masters in Artificial Intelligence Programme Requirements Taught Element, and PG Diploma in Artificial Intelligence: 120 credits: IS5101 CS5001 CS5010 CS5011 CS4402 or CS5012 in total, up to 30 credits

More information

Static Environment Recognition Using Omni-camera from a Moving Vehicle

Static Environment Recognition Using Omni-camera from a Moving Vehicle Static Environment Recognition Using Omni-camera from a Moving Vehicle Teruko Yata, Chuck Thorpe Frank Dellaert The Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 USA College of Computing

More information

Sensory-motor control scheme based on Kohonen Maps and AVITE model

Sensory-motor control scheme based on Kohonen Maps and AVITE model Sensory-motor control scheme based on Kohonen Maps and AVITE model Juan L. Pedreño-Molina, Antonio Guerrero-González, Oscar A. Florez-Giraldo, J. Molina-Vilaplana Technical University of Cartagena Department

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

PeerEnergyCloud Trading Renewable Energies

PeerEnergyCloud Trading Renewable Energies PeerEnergyCloud Trading Renewable Energies Jochen Frey, Boris Brandherm, and Jörg Baus German Research Center for Artificial Intelligence GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany {frey,brandherm,baus}@dfki.de

More information

Reflection Report International Semester

Reflection Report International Semester Reflection Report International Semester Studying abroad at KTH Royal Institute of Technology Stockholm 18-01-2011 Chapter 1: Personal Information Name and surname: Arts, Rick G. B. E-mail address: Department:

More information

Masters in Advanced Computer Science

Masters in Advanced Computer Science Masters in Advanced Computer Science Programme Requirements Taught Element, and PG Diploma in Advanced Computer Science: 120 credits: IS5101 CS5001 up to 30 credits from CS4100 - CS4450, subject to appropriate

More information

KNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE

KNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 22/2013, ISSN 1642-6037 medical diagnosis, ontology, subjective intelligence, reasoning, fuzzy rules Hamido FUJITA 1 KNOWLEDGE-BASED IN MEDICAL DECISION

More information

3D Vision An enabling Technology for Advanced Driver Assistance and Autonomous Offroad Driving

3D Vision An enabling Technology for Advanced Driver Assistance and Autonomous Offroad Driving 3D Vision An enabling Technology for Advanced Driver Assistance and Autonomous Offroad Driving AIT Austrian Institute of Technology Safety & Security Department Christian Zinner Safe and Autonomous Systems

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information

Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

More information

Robotic Home Assistant Care-O-bot: Past Present Future

Robotic Home Assistant Care-O-bot: Past Present Future Robotic Home Assistant Care-O-bot: Past Present Future M. Hans, B. Graf, R.D. Schraft Fraunhofer Institute for Manufacturing Engineering and Automation (IPA) Nobelstr. 12, Stuttgart, Germany E-mail: {hans,

More information

Cognitive Interaction Toolkit

Cognitive Interaction Toolkit Making Research Visible Cross Platform Linking of Semantically Enriched Research Artifacts Cognitive Interaction Toolkit, Thilo Paul-Stueve, Frederic Siepmann and Sven Wachsmuth Central Lab Facilities

More information

Masters in Networks and Distributed Systems

Masters in Networks and Distributed Systems Masters in Networks and Distributed Systems Programme Requirements Taught Element, and PG Diploma in Networks and Distributed Systems: 120 credits: IS5101 CS5001 CS5021 CS4103 or CS5023 in total, up to

More information

Masters in Computing and Information Technology

Masters in Computing and Information Technology Masters in Computing and Information Technology Programme Requirements Taught Element, and PG Diploma in Computing and Information Technology: 120 credits: IS5101 CS5001 or CS5002 CS5003 up to 30 credits

More information

GLOVE-BASED GESTURE RECOGNITION SYSTEM

GLOVE-BASED GESTURE RECOGNITION SYSTEM CLAWAR 2012 Proceedings of the Fifteenth International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines, Baltimore, MD, USA, 23 26 July 2012 747 GLOVE-BASED GESTURE

More information

Context-aware Library Management System using Augmented Reality

Context-aware Library Management System using Augmented Reality International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 9 (2014), pp. 923-929 International Research Publication House http://www.irphouse.com Context-aware Library

More information

School of Computer Science

School of Computer Science School of Computer Science Computer Science - Honours Level - 2014/15 October 2014 General degree students wishing to enter 3000- level modules and non- graduating students wishing to enter 3000- level

More information

Designing an Adaptive Virtual Guide for Web Applications

Designing an Adaptive Virtual Guide for Web Applications 6th ERCIM Workshop "User Interfaces for All" Long Paper Designing an Adaptive Virtual Guide for Web Applications Luisa Marucci, Fabio Paternò CNUCE-C.N.R. Via V.Alfieri 1, 56010 Ghezzano - Pisa, Italy

More information

KNOWLEDGE ORGANIZATION

KNOWLEDGE ORGANIZATION KNOWLEDGE ORGANIZATION Gabi Reinmann Germany reinmann.gabi@googlemail.com Synonyms Information organization, information classification, knowledge representation, knowledge structuring Definition The term

More information

A General Framework for Tracking Objects in a Multi-Camera Environment

A General Framework for Tracking Objects in a Multi-Camera Environment A General Framework for Tracking Objects in a Multi-Camera Environment Karlene Nguyen, Gavin Yeung, Soheil Ghiasi, Majid Sarrafzadeh {karlene, gavin, soheil, majid}@cs.ucla.edu Abstract We present a framework

More information

Studying Multimodal Interaction at an Interactive Museum Exhibit on SteamPower Locomotive

Studying Multimodal Interaction at an Interactive Museum Exhibit on SteamPower Locomotive Studying Multimodal Interaction at an Interactive Museum Exhibit on SteamPower Locomotive Loraine Clarke University of Strathclyde. 26 Richmond St, Glasgow, G1 1XH, UK. loraine.clarke@strath.ac.uk Eva

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information

Control of affective content in music production

Control of affective content in music production International Symposium on Performance Science ISBN 978-90-9022484-8 The Author 2007, Published by the AEC All rights reserved Control of affective content in music production António Pedro Oliveira and

More information

Go to contents 18 3D Visualization of Building Services in Virtual Environment

Go to contents 18 3D Visualization of Building Services in Virtual Environment 3D Visualization of Building Services in Virtual Environment GRÖHN, Matti Gröhn; MANTERE, Markku; SAVIOJA, Lauri; TAKALA, Tapio Telecommunications Software and Multimedia Laboratory Department of Computer

More information

CHAPTER 1. Introduction to CAD/CAM/CAE Systems

CHAPTER 1. Introduction to CAD/CAM/CAE Systems CHAPTER 1 1.1 OVERVIEW Introduction to CAD/CAM/CAE Systems Today s industries cannot survive worldwide competition unless they introduce new products with better quality (quality, Q), at lower cost (cost,

More information

Big Data: Image & Video Analytics

Big Data: Image & Video Analytics Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Information Technology Career Field Pathways and Course Structure

Information Technology Career Field Pathways and Course Structure Information Technology Career Field Pathways and Course Structure Courses in Information Support and Services (N0) Computer Hardware 2 145025 Computer Software 145030 Networking 2 145035 Network Operating

More information

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Observing Human Behavior in Image Sequences: the Video Hermeneutics Challenge

Observing Human Behavior in Image Sequences: the Video Hermeneutics Challenge Observing Human Behavior in Image Sequences: the Video Hermeneutics Challenge Pau Baiget, Jordi Gonzàlez Computer Vision Center, Dept. de Ciències de la Computació, Edifici O, Campus UAB, 08193 Bellaterra,

More information

Robotics. Chapter 25. Chapter 25 1

Robotics. Chapter 25. Chapter 25 1 Robotics Chapter 25 Chapter 25 1 Outline Robots, Effectors, and Sensors Localization and Mapping Motion Planning Motor Control Chapter 25 2 Mobile Robots Chapter 25 3 Manipulators P R R R R R Configuration

More information

ICT Perspectives on Big Data: Well Sorted Materials

ICT Perspectives on Big Data: Well Sorted Materials ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in

More information

Robotics. Lecture 3: Sensors. See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information.

Robotics. Lecture 3: Sensors. See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Robotics Lecture 3: Sensors See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review: Locomotion Practical

More information

Tracking of Small Unmanned Aerial Vehicles

Tracking of Small Unmanned Aerial Vehicles Tracking of Small Unmanned Aerial Vehicles Steven Krukowski Adrien Perkins Aeronautics and Astronautics Stanford University Stanford, CA 94305 Email: spk170@stanford.edu Aeronautics and Astronautics Stanford

More information

CSC384 Intro to Artificial Intelligence

CSC384 Intro to Artificial Intelligence CSC384 Intro to Artificial Intelligence What is Artificial Intelligence? What is Intelligence? Are these Intelligent? CSC384, University of Toronto 3 What is Intelligence? Webster says: The capacity to

More information

Web Mining using Artificial Ant Colonies : A Survey

Web Mining using Artificial Ant Colonies : A Survey Web Mining using Artificial Ant Colonies : A Survey Richa Gupta Department of Computer Science University of Delhi ABSTRACT : Web mining has been very crucial to any organization as it provides useful

More information

Dementia Ambient Care: Multi-Sensing Monitoring for Intelligent Remote Management and Decision Support

Dementia Ambient Care: Multi-Sensing Monitoring for Intelligent Remote Management and Decision Support Dementia Ambient Care: Multi-Sensing Monitoring for Intelligent Remote Management and Decision Support Alexia Briassouli Informatics & Telematics Institute Introduction Instances of dementia increasing

More information

Journal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition

Journal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition IWNEST PUBLISHER Journal of Industrial Engineering Research (ISSN: 2077-4559) Journal home page: http://www.iwnest.com/aace/ Adaptive sequence of Key Pose Detection for Human Action Recognition 1 T. Sindhu

More information

Microcontrollers, Actuators and Sensors in Mobile Robots

Microcontrollers, Actuators and Sensors in Mobile Robots SISY 2006 4 th Serbian-Hungarian Joint Symposium on Intelligent Systems Microcontrollers, Actuators and Sensors in Mobile Robots István Matijevics Polytechnical Engineering College, Subotica, Serbia mistvan@vts.su.ac.yu

More information

Sensor Integration in the Security Domain

Sensor Integration in the Security Domain Sensor Integration in the Security Domain Bastian Köhler, Felix Opitz, Kaeye Dästner, Guy Kouemou Defence & Communications Systems Defence Electronics Integrated Systems / Air Dominance & Sensor Data Fusion

More information

Cybersecurity Analytics for a Smarter Planet

Cybersecurity Analytics for a Smarter Planet IBM Institute for Advanced Security December 2010 White Paper Cybersecurity Analytics for a Smarter Planet Enabling complex analytics with ultra-low latencies on cybersecurity data in motion 2 Cybersecurity

More information

Internet based manipulator telepresence

Internet based manipulator telepresence Internet based manipulator telepresence T ten Kate, P Zizola, B Driessen, K van Woerden TNO Institute of Applied Physics, Stieltjesweg 1, 2628 CK DELFT, The NETHERLANDS {tenkate, zizola, driessen, vwoerden}@tpd.tno.nl

More information

Understanding and Supporting Intersubjective Meaning Making in Socio-Technical Systems: A Cognitive Psychology Perspective

Understanding and Supporting Intersubjective Meaning Making in Socio-Technical Systems: A Cognitive Psychology Perspective Understanding and Supporting Intersubjective Meaning Making in Socio-Technical Systems: A Cognitive Psychology Perspective Sebastian Dennerlein Institute for Psychology, University of Graz, Universitätsplatz

More information

Analysis of Data Mining Concepts in Higher Education with Needs to Najran University

Analysis of Data Mining Concepts in Higher Education with Needs to Najran University 590 Analysis of Data Mining Concepts in Higher Education with Needs to Najran University Mohamed Hussain Tawarish 1, Farooqui Waseemuddin 2 Department of Computer Science, Najran Community College. Najran

More information

PERSONALIZED WEB MAP CUSTOMIZED SERVICE

PERSONALIZED WEB MAP CUSTOMIZED SERVICE CO-436 PERSONALIZED WEB MAP CUSTOMIZED SERVICE CHEN Y.(1), WU Z.(1), YE H.(2) (1) Zhengzhou Institute of Surveying and Mapping, ZHENGZHOU, CHINA ; (2) North China Institute of Water Conservancy and Hydroelectric

More information

DSLRob-15, Hamburg, Germany

DSLRob-15, Hamburg, Germany 6th International Workshop on Domain- Specific Languages and models for ROBotic systems (DSLRob-15) Christian Schlegel, Ulm University of Applied Sciences, Germany Ulrik P. Schultz, University of Southern

More information

Interactive person re-identification in TV series

Interactive person re-identification in TV series Interactive person re-identification in TV series Mika Fischer Hazım Kemal Ekenel Rainer Stiefelhagen CV:HCI lab, Karlsruhe Institute of Technology Adenauerring 2, 76131 Karlsruhe, Germany E-mail: {mika.fischer,ekenel,rainer.stiefelhagen}@kit.edu

More information

A Reliability Point and Kalman Filter-based Vehicle Tracking Technique

A Reliability Point and Kalman Filter-based Vehicle Tracking Technique A Reliability Point and Kalman Filter-based Vehicle Tracing Technique Soo Siang Teoh and Thomas Bräunl Abstract This paper introduces a technique for tracing the movement of vehicles in consecutive video

More information

Robot Navigation. Johannes Maurer, Institute for Software Technology TEDUSAR Summerschool 2014. u www.tugraz.at

Robot Navigation. Johannes Maurer, Institute for Software Technology TEDUSAR Summerschool 2014. u www.tugraz.at 1 Robot Navigation u www.tugraz.at 2 Motivation challenges physical laws e.g. inertia, acceleration uncertainty e.g. maps, observations geometric constraints e.g. shape of a robot dynamic environment e.g.

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Colorado School of Mines Computer Vision Professor William Hoff

Colorado School of Mines Computer Vision Professor William Hoff Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Introduction to 2 What is? A process that produces from images of the external world a description

More information

Applications of Deep Learning to the GEOINT mission. June 2015

Applications of Deep Learning to the GEOINT mission. June 2015 Applications of Deep Learning to the GEOINT mission June 2015 Overview Motivation Deep Learning Recap GEOINT applications: Imagery exploitation OSINT exploitation Geospatial and activity based analytics

More information

How to Improve the Sound Quality of Your Microphone

How to Improve the Sound Quality of Your Microphone An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies Andreas Maier, Julian Exner, Stefan Steidl, Anton Batliner, Tino Haderlein, and Elmar Nöth Universität Erlangen-Nürnberg,

More information

Presentation of Visual Art in Interactive 3D Environments

Presentation of Visual Art in Interactive 3D Environments Presentation of Visual Art in Interactive 3D Environments Jeni Maleshkova Queen Mary University of London Media and Arts Technology DTC School of Electronic Engineering and Computer Science j.maleshkova@qmul.ac.uk

More information

Luis A. Pineda Héctor Avilés Paty Pérez Pavón Ivan Meza Wendy Aguilar Montserrat Alvarado External collaborators Students

Luis A. Pineda Héctor Avilés Paty Pérez Pavón Ivan Meza Wendy Aguilar Montserrat Alvarado External collaborators Students Dialogue model specification and interpretation for interacting with service robots Luis Pineda and The Golem group Luis@leibniz.iimas.unam.mx http://leibniz.iimas.unam.mx/~luis IIMAS, UNAM, México September,

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Robot Task-Level Programming Language and Simulation

Robot Task-Level Programming Language and Simulation Robot Task-Level Programming Language and Simulation M. Samaka Abstract This paper presents the development of a software application for Off-line robot task programming and simulation. Such application

More information

How do non-expert users exploit simultaneous inputs in multimodal interaction?

How do non-expert users exploit simultaneous inputs in multimodal interaction? How do non-expert users exploit simultaneous inputs in multimodal interaction? Knut Kvale, John Rugelbak and Ingunn Amdal 1 Telenor R&D, Norway knut.kvale@telenor.com, john.rugelbak@telenor.com, ingunn.amdal@tele.ntnu.no

More information

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 GENIVI is a registered trademark of the GENIVI Alliance in the USA and other countries Copyright GENIVI Alliance

More information

Emotion Recognition Using Blue Eyes Technology

Emotion Recognition Using Blue Eyes Technology Emotion Recognition Using Blue Eyes Technology Prof. Sudan Pawar Shubham Vibhute Ashish Patil Vikram More Gaurav Sane Abstract We cannot measure the world of science in terms of progress and fact of development.

More information

Laser Gesture Recognition for Human Machine Interaction

Laser Gesture Recognition for Human Machine Interaction International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-04, Issue-04 E-ISSN: 2347-2693 Laser Gesture Recognition for Human Machine Interaction Umang Keniya 1*, Sarthak

More information

The Role of Reactive Typography in the Design of Flexible Hypertext Documents

The Role of Reactive Typography in the Design of Flexible Hypertext Documents The Role of Reactive Typography in the Design of Flexible Hypertext Documents Rameshsharma Ramloll Collaborative Systems Engineering Group Computing Department Lancaster University Email: ramloll@comp.lancs.ac.uk

More information

Robot Perception Continued

Robot Perception Continued Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart

More information

Towards a Transparent Proactive User Interface for a Shopping Assistant

Towards a Transparent Proactive User Interface for a Shopping Assistant Towards a Transparent Proactive User Interface for a Shopping Assistant Michael Schneider Department of Computer Science, Saarland University, Stuhlsatzenhausweg, Bau 36.1, 66123 Saarbrücken, Germany mschneid@cs.uni-sb.de

More information

Force/position control of a robotic system for transcranial magnetic stimulation

Force/position control of a robotic system for transcranial magnetic stimulation Force/position control of a robotic system for transcranial magnetic stimulation W.N. Wan Zakaria School of Mechanical and System Engineering Newcastle University Abstract To develop a force control scheme

More information

An Agent-Based Concept for Problem Management Systems to Enhance Reliability

An Agent-Based Concept for Problem Management Systems to Enhance Reliability An Agent-Based Concept for Problem Management Systems to Enhance Reliability H. Wang, N. Jazdi, P. Goehner A defective component in an industrial automation system affects only a limited number of sub

More information

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is

More information

Template-based Eye and Mouth Detection for 3D Video Conferencing

Template-based Eye and Mouth Detection for 3D Video Conferencing Template-based Eye and Mouth Detection for 3D Video Conferencing Jürgen Rurainsky and Peter Eisert Fraunhofer Institute for Telecommunications - Heinrich-Hertz-Institute, Image Processing Department, Einsteinufer

More information

Automatic Detection of PCB Defects

Automatic Detection of PCB Defects IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 6 November 2014 ISSN (online): 2349-6010 Automatic Detection of PCB Defects Ashish Singh PG Student Vimal H.

More information