LIG Dossier de contractualisation 685 RAPPORT D'ACTIVITE et PROJET SCIENTIFIQUE The PRIMA Group Perception Recognition and Integration for Perception of Human Activity Responsable : James L. Crowley, Professeur, I.N.P. Grenoble
1. PERSONNEL / PERSONNEL
Academic Researchers / Enseignants-chercheurs : A. Lux, Professeur, I.N.P. Grenoble ; P. Reignier, MdC UJF ; D. Vaufreydaz, MdC UPMF
Current Contractual Engineers / Ingénieurs : D. Hall, CDD INPG-SA ; A. Ferrer-Biosca, CDD INRIA ; J. Vallet, CDD UJF
Doctoral Students / Doctorants : 11
Equivalent Full-time Researchers / Nombre d'équivalents chercheurs (NE) : 9.5
2. RESEARCH RESULTS FOR THE PERIOD 2002-2005 / BILAN DES ACTIVITES DE RECHERCHE 02-05
2.1 Scientific Themes and Objectives / Thématique scientifique et objectifs généraux
The mission of the PRIMA group is to develop the scientific foundations for a technology for observing humans and human activities using computer vision, acoustic perception and other sensor modalities. Such a foundation will provide a "mother technology", enabling new forms of man-machine interaction, context-aware information services, entertainment, collaborative work and meeting support, context-aware video communications, as well as services and security based on visual surveillance. The design and integration of systems for perception of humans and their actions raises a number of hard scientific and technological challenges. These include:
- Robust, view-invariant computer vision.
- Robust architectures for multi-modal perception.
PRIMA - 1
- Context-aware interactive environments.
- New forms of man-machine interaction based on perception.
Within PRIMA, research on these challenges is tightly integrated, with each area providing problems, experimental domains, and technologies for the other areas. For example, our work on context-aware environments enables new forms of man-machine interaction. Both the environments and the interaction technologies employ our robust real-time architecture. The robust architecture includes real-time implementations of robust, view-invariant image description. In the following text we develop some aspects of each of these challenges.
2.1.1. Thème 1: Robust view-invariant Computer Vision.
PRIMA members working on this problem / Participants à ce thème de recherche : James Crowley, Augustin Lux, Daniela Hall, Nicolas Gourier, Hai Tranh, Amaury Negre
Keywords / Mots-clés : Local Appearance, Affine Invariance, Receptive Fields
a) Summary of results
A long-term grand challenge in computer vision has been to develop a descriptor for image information that can be reliably used for a wide variety of computer vision tasks. Such a descriptor must capture the information in an image in a manner that is robust to changes in the relative position of the camera as well as the position, pattern and spectrum of illumination. Members of PRIMA have a long history of innovation in this area, with important results in the areas of multi-resolution pyramids, scale-invariant image description, appearance-based object recognition and receptive field histograms published during the period 1987 to 2000. During the period 2002-2004, the group has demonstrated several innovations in this area.
b) Detailed Description
The visual appearance of a neighbourhood can be described by a local Taylor series [Koe87]. The coefficients of this series constitute a feature vector that compactly represents the neighbourhood appearance for indexing and matching.
The set of possible local image neighbourhoods that project to the same feature vector is referred to as the "Local Jet". A key problem in computing the local jet is determining the scale at which to evaluate the image derivatives. Lindeberg [Lin98] has described scale-invariant features based on profiles of Gaussian derivatives across scales. In particular, the profile of the Laplacian, evaluated over a range of scales at an image point, provides a local description that is "equi-variant" to changes in scale. Equi-variance means that the feature vector translates exactly with scale and can thus be used to track, index, match and recognize structures in the presence of changes in scale.
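The selection of an intrinsic scale from the Laplacian profile can be sketched as follows. This is a minimal illustration, not the PRIMA implementation (which uses a binomial pyramid for speed, as described below); it assumes SciPy, and picks the scale maximizing the scale-normalized Laplacian magnitude.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def intrinsic_scale(image, y, x, scales):
    """Scale maximizing |sigma^2 * Laplacian(G_sigma * image)| at pixel (y, x).
    The sigma^2 factor is Lindeberg's scale normalization."""
    profile = [abs((s ** 2) * gaussian_laplace(image, s)[y, x]) for s in scales]
    return scales[int(np.argmax(profile))]

# Equi-variance check: a Gaussian blob of extent sigma0 yields an intrinsic
# scale near sigma0, and a blob twice as large yields roughly twice the scale.
yy, xx = np.mgrid[0:129, 0:129] - 64.0
blob = lambda s0: np.exp(-(xx ** 2 + yy ** 2) / (2 * s0 ** 2))
scales = np.arange(2.0, 20.0, 0.5)
s1 = intrinsic_scale(blob(5.0), 64, 64, scales)
s2 = intrinsic_scale(blob(10.0), 64, 64, scales)
```

For a 2D Gaussian blob the normalized Laplacian response at the center is proportional to sigma^2 * sigma0^2 / (sigma0^2 + sigma^2)^2, which peaks exactly at sigma = sigma0, which is why the maximum tracks the blob size.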
A receptive field is a local function G(s,x,y) defined over a region of an image [Sch00]. We employ a set of receptive fields based on derivatives of the Gaussian function as a basis for describing local appearance. These functions resemble the receptive fields observed in the visual cortex of mammals. These receptive fields are applied to color images in which we have separated the chrominance and luminance components. Such functions are easily normalized to an intrinsic scale using the maximum of the Laplacian [Lin98], and normalized in orientation using the direction of the first derivatives [Sch00]. A point at which the product of a Laplacian operator with the image is a local maximum in x, y and in the scale of the Laplacian provides a "natural interest point" [Low99]. Such natural interest points are salient points that may be robustly detected and used for matching. A problem with this approach is that the computational cost of determining intrinsic scale at each image position can potentially make real-time implementation unfeasible. We have recently achieved video-rate calculation of intrinsic scale by interpolating pixels within a Binomial Pyramid computed using an O(N) algorithm [Cro03]. A vector of scale- and orientation-normalized Gaussian derivatives provides a characteristic vector for matching and indexing. The oriented Gaussian derivatives can easily be synthesized using the "steerability property" [Fre91] of Gaussian derivatives. The problem is to determine the appropriate orientation. In earlier work, PRIMA members Colin de Verdiere [Co98], Schiele [Sch00] and Hall [Hal00] proposed normalising the local jet independently at each pixel to the direction of the first derivatives calculated at the intrinsic scale. This has provided promising results for many view-invariant image recognition tasks. In particular, this method has been used in the real-time BrandDetect system described below.
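The steerability property mentioned above can be sketched for the first derivative: the response at any orientation theta is a linear combination of the two basis filters Gx and Gy, so no filter need be recomputed per orientation. A minimal illustration, assuming SciPy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_first_derivative(image, sigma, theta):
    """Synthesize the Gaussian first derivative at orientation theta from the
    two basis responses Gx and Gy (steerability of Gaussian derivatives)."""
    gx = gaussian_filter(image, sigma, order=(0, 1))  # d/dx (last axis)
    gy = gaussian_filter(image, sigma, order=(1, 0))  # d/dy (first axis)
    return np.cos(theta) * gx + np.sin(theta) * gy

# For a unit ramp f(y, x) = x, the derivative in direction theta is cos(theta).
yy, xx = np.mgrid[0:64, 0:64].astype(float)
d30 = steered_first_derivative(xx, sigma=2.0, theta=np.pi / 6)
center = d30[20:44, 20:44]  # interior region, away from boundary effects
```

Normalizing the jet then amounts to steering the derivatives to the locally dominant gradient direction, theta = atan2(gy, gx).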
Lux and Hai have very recently developed an alternative method which provides a direct measurement of affine-invariant local features, based on extending natural interest points to "natural interest lines" [Tra05]. The orientation of natural interest lines provides a local orientation in the region of an image structure. Early results indicate an important gain in discrimination rates. Color is a powerful discriminator for object recognition. Color images are commonly acquired in the Cartesian color space, RGB. The RGB color space has certain advantages for image acquisition, but is not the most appropriate space for recognizing objects or describing their shape. An alternative is to compute a Cartesian representation for chrominance, using differences of R, G and B. Such differences yield color-opponent receptive fields resembling those found in biological visual systems. The components C1 and C2 encode the chromatic information in a Cartesian representation, while L is the luminance direction. Chromatic Gaussian receptive fields are computed by applying the Gaussian derivatives independently to each of the three components (L, C1, C2). Permutations of RGB lead to different opponent color spaces. The choice of the most appropriate space depends on the chromatic composition of the scene. An example of a second-order steerable chromatic basis is the set of color-opponent filters shown in figure 1.
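The (L, C1, C2) decomposition can be sketched as a linear transform of RGB. The exact weights are not fixed in the text above, so the orthonormal opponent transform below is an assumed, common choice for illustration only:

```python
import numpy as np

# Assumed opponent transform: luminance plus two chrominance differences of
# R, G and B (one common orthonormal choice; not necessarily PRIMA's weights).
OPPONENT = np.array([
    [1 / np.sqrt(3),  1 / np.sqrt(3),  1 / np.sqrt(3)],  # L  = (R + G + B) / sqrt(3)
    [1 / np.sqrt(2), -1 / np.sqrt(2),  0.0],             # C1 = (R - G) / sqrt(2)
    [1 / np.sqrt(6),  1 / np.sqrt(6), -2 / np.sqrt(6)],  # C2 = (R + G - 2B) / sqrt(6)
])

def rgb_to_opponent(rgb):
    """Map an (H, W, 3) RGB image to (L, C1, C2) channels."""
    return np.tensordot(rgb, OPPONENT.T, axes=1)

# A pure gray image carries no chrominance: C1 = C2 = 0.
gray = np.full((4, 4, 3), 0.5)
opp = rgb_to_opponent(gray)
```

Chromatic Gaussian receptive fields are then obtained by applying the Gaussian derivative filters of the previous section to each of the three channels independently.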
Figure 1. Chromatic Gaussian derivatives: G^c = (Gx^L, G^C1, G^C2, Gx^C1, Gx^C2, Gxx^L, Gxy^L, Gyy^L)
Key results in this area include:
1) Fast, video-rate calculation of scale and orientation for image description with normalized chromatic receptive fields [Cro03].
2) Real-time indexing and recognition using a novel indexing tree to represent multi-dimensional receptive field histograms [Pel03].
3) Robust visual features for face tracking [Hal04], [Gou04].
4) Affine-invariant detection and tracking using natural interest lines [Tra04].
5) Direct computation of time to collision over the entire visual field using the rate of change of intrinsic scale [Neg06].
2.1.2. Thème 2 - Robust architectures for multi-modal perception
PRIMA members working on this area / Participants à ce thème de recherche : James Crowley, Augustin Lux, Patrick Reignier, Daniela Hall, Sebastien Pesnel, Remi Emonet
Keywords: MultiModal Perception, Robust Perceptual Components, Autonomic Computing, Process Architectures.
a) Summary
Machine perception is notoriously unreliable. Even in controlled laboratory conditions, programs for speech recognition or computer vision generally require supervision by highly trained engineers. Practical real-world use of machine perception requires fundamental progress in the way perceptual components are designed and implemented. A theoretical foundation for robust design can dramatically reduce the cost of implementing new services, both by reducing the cost of building components, and more importantly, by reducing the obscure, unpredictable behaviour that unreliable components can create in highly complex systems. To meet this challenge, we propose to adapt recent progress in autonomic computing to the problem of producing reliable, robust perceptual components. Autonomic computing has emerged as an effort inspired by biological systems to render computing systems robust [Hor01].
Such systems monitor their environment and internal state in order to adapt to changes in resource availability and service requirements. Monitoring can have a variety of forms and raises a spectrum of problems.
An important form of monitoring relies on a description of the system architecture in terms of software components and their interconnection. Such a model provides the basis for collecting and integrating information from components about current reliability, in order to detect and respond to failure or degradation in a component or to changes in resource availability (auto-configuration). However, automatic configuration itself imposes constraints on the way components are designed, as well as requirements on the design of the overall system [Gar04]. Robust software design begins with the design of components. The PRIMA project has developed an autonomic software architecture as a foundation for robust perceptual components. This architecture allows experimental design with components exhibiting:
- Auto-criticism: Every computational result produced by a component is accompanied by an estimate of its reliability.
- Auto-regulation: The component regulates its internal parameters so as to satisfy a quality requirement such as reliability, precision, rapidity, or throughput.
- Auto-description: The component can provide a symbolic description of its own functionality, state, and parameters.
- Auto-monitoring: The component can provide a report on its internal state in the form of a set of quality metrics such as throughput and load.
- Auto-configuration: The component reconfigures its own modules so as to respond to changes in the operating environment or quality requirements [Pol04].
Maintenance of such autonomic properties can result in additional computing overhead within components, but can pay important dividends in system reliability.
b) Detailed Description
As a general model, we propose a layered architecture for context-aware services, as shown in figure 2. At the lowest level, the service's view of the world is provided by a collection of physical sensors and actuators. This corresponds to the sensor-actuator layer.
This layer, which is dependent on the technology, encapsulates the diversity of sensors and actuators that may provide similar functions. It provides logical interfaces, or standard APIs, that are function-centered and device-independent.
Figure 2. Conceptual layers for context-aware services: Ambient Services, Situation Model, Perception and Action, Sensors and Actuators.
Services interact with humans through sensors and actuators. For performance reasons, the services may skip the situation layer and exchange information directly with the perception-action or sensor-actuator layers. However, it is the situation layer that provides the semantics for interpretation. Hardwiring service behavior to sensor signals can provide only simplistic services that are hardware-dependent and of limited utility. Hardware independence and generality require abstract perception and abstract task specification. Perception interprets sensor signals by recognizing and observing entities. Abstract tasks are expressed in terms of a desired result rather than a blind action. Perception and action operate in terms of environmental state, while sensors and actuators operate on device-specific signals.
The PRIMA software architecture for supervised autonomic perceptual components is shown in figure 3. Components are constructed as a cyclic detection and tracking process monitored and controlled by an autonomic supervisor.
Figure 3. Architecture for an autonomic perceptual component.
The supervisory controller provides five fundamental functions: command interpretation, execution scheduling, event handling, parameter regulation, and reflexive description. The supervisor acts as a programmable interpreter, receiving snippets of code script that determine the composition and nature of the process execution cycle and the manner in which the process reacts to events. The supervisor acts as a scheduler, invoking execution of modules in a synchronous manner. The supervisor handles event dispatching to other processes, and reacts to events from other processes. The supervisor regulates module parameters based on the execution results. Auto-critical reports from modules permit the supervisor to dynamically adapt processing. Finally, the supervisor responds to external queries with a description of the current state and capabilities. Real-time visual processing for the perceptual component is provided by tracking. Tracking conserves information over time, and thus provides object constancy. Object constancy assures that a label applied to a blob at time T1 can be used at time T2. Tracking enables the system to focus attention, applying the appropriate detection processes only to the region of an image where a target is likely to be detected. The information about position and speed provided by tracking can also be very important for describing situations.
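The supervisor's parameter regulation can be sketched as a feedback loop that trades a quality parameter against processing time. The class below is purely illustrative: the names, the resolution parameter, and the linear cost model standing in for real auto-critical reports are all our assumptions, not the PRIMA implementation.

```python
class AutoRegulatedProcess:
    """Sketch of supervisory auto-regulation: hold the cycle time under a
    budget by adjusting a quality parameter (here, a processing resolution)."""
    def __init__(self, budget_ms, resolution=1.0):
        self.budget_ms = budget_ms
        self.resolution = resolution  # quality parameter in (0, 1]

    def measured_cycle_ms(self):
        # Hypothetical auto-critical report: cost grows with resolution.
        return 50.0 * self.resolution

    def regulate(self):
        t = self.measured_cycle_ms()
        if t > self.budget_ms:              # over budget: reduce precision
            self.resolution = max(0.1, self.resolution * 0.9)
        elif t < 0.8 * self.budget_ms:      # spare time: restore precision
            self.resolution = min(1.0, self.resolution * 1.05)

proc = AutoRegulatedProcess(budget_ms=40.0)
for _ in range(50):
    proc.regulate()
```

The dead band between 80% and 100% of the budget prevents the controller from oscillating between the two adjustments.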
Tracking is classically composed of four phases: predict, observe, detect, and update. The prediction phase updates the previously estimated attributes of a set of entities to values predicted for a specified time. The observation phase applies the prediction to the current data to update the state of each target. The detection phase detects new targets. The update phase updates the list of targets to account for new and lost targets. The ability to execute different image processing procedures within each target's region of interest is useful for simultaneously observing a variety of entities. The PRIMA perceptual component architecture adds additional phases for interpretation, auto-regulation, and communication. In the interpretation phase, the tracker executes procedures that have been downloaded to the process by a configuration tool. These are interpreted by a RAVI interpreter [Lux03] and may result in the generation of events or output to a stream. The auto-regulation phase determines a quality-of-service metric, such as total cycle time, and adapts the list of targets as well as the target parameters to maintain a
desired quality. During the communication phase, the supervisor responds to requests from other processes. These requests may ask for descriptions of process state or capabilities, or may provide specifications of new recognition methods. Homeostasis, or "autonomic regulation of internal state", is a fundamental property for robust operation in an uncontrolled environment. A process is auto-regulated when processing is monitored and controlled so as to maintain a certain quality of service. For example, processing time and precision are two important state variables for a tracking process. These two may be traded off against each other. The component supervisor maintains homeostasis by adapting module parameters using the auto-critical reports from modules. An auto-descriptive controller can provide a symbolic description of its capabilities and state. The description of the capabilities includes both the basic command set of the controller and a set of services that the controller may provide to a more abstract supervisor. Such descriptions are useful for both manual and automatic composition of federations of processes. In the context of recent national projects (RNTL ContAct) and European projects (FAME, CAVIAR, CHIL), the PRIMA perceptual component has been demonstrated with the construction of perceptual components for:
1) Tracking individuals and groups in large areas to provide services.
2) Monitoring a parking lot to assist in navigation for an autonomous vehicle.
3) Observing participants in a meeting environment to automatically orient cameras.
4) Observing faces of meeting participants to estimate gaze direction and interest.
5) Observing hands of meeting participants to detect 2D and 3D gestures.
2.1.3. Thème 3 - Context aware interactive environments.
PRIMA members working on this problem / Participants à ce thème de recherche : James Crowley, Patrick Reignier, Dominique Vaufreydaz, Jerome Maisonasse, Olivier Brdczska, Sophia Zaidenberg, Alba Ferrer
Keywords / Mots-clés : Situation Modeling, Context Aware Environments, Smart Environments
a) Summary
An environment is a connected volume of space. An environment is said to be "interactive" when it is capable of perceiving, acting, and communicating with its occupants. The construction of such environments offers a rich set of problems related to interpretation of sensor information, learning, machine understanding and man-machine interaction. Our goal is to make progress on a theoretical foundation for cognitive or "aware" systems by using interactive environments as a source of example problems, as well as to develop new forms of man-machine interaction.
An environment is said to be "perceptive" when it is capable of recognizing and describing things, people and activities within its volume. Simple forms of application-specific perception may be constructed using a single sensor. However, to be general-purpose and robust, perception must integrate information from multiple sensors and multiple modalities. The PRIMA group develops and employs machine perception techniques using acoustics, speech, computer vision and mechanical sensors. An environment is said to be "active" when it is capable of changing its internal state. Trivial forms of state change include regulating ambient temperature and illumination. Automatic presentation of information and communication constitutes a challenging new form of "action" with many applications. The use of multiple display surfaces coupled with location awareness of occupants offers the possibility of automatically adapting presentation to fit the current activity of groups. The use of activity recognition and acoustic topic spotting offers the possibility to provide relevant information without disruption. The use of steerable video projectors (with integrated visual sensing) offers the possibility of using any surface for presentation and interaction with information. An environment may be considered "interactive" when it is capable of responding to humans using tightly coupled perception and action. Simple forms of interaction may be based on sensing grasping and manipulation of sensor-enabled devices, or on visual sensing of fingers or objects placed into projected interaction widgets. Richer forms of interaction require perceiving and modeling the current task of users.
PRIMA explores multiple forms of interaction, including projected interaction widgets, observation of manipulation of objects, fusion of acoustic and visual information, and dynamic composition of systems that model interaction context in order to predict appropriate action by the environment. The experiments in project PRIMA are oriented towards context-aware observation of human activity. Over the last few years, the group has developed a technology for describing activity in terms of a network of situations. Such networks provide scripts of activities that tell a system what actions to expect from each individual and the appropriate behavior for the system. Current technology allows us to handcraft real-time systems for a specific service. The current hard challenge is to create a technology for automatically learning and adapting situation models with minimal or no disruption of users.
b) Detailed Description
We have developed situation models based on the notion of a script. A theatrical script provides more than dialog for actors. A script establishes abstract characters that provide actors with a space of activity for expression of emotion. It establishes a scene within which directors can lay out a stage and place characters. Situation models are based on the same principle. A script describes an activity in terms of a scene occupied by a set of actors and props. Each actor plays a role, thus defining a set of actions, including dialog, movement and emotional expressions. An audience understands the theatrical play by recognizing the roles played by characters. In a similar manner, a user service uses the situation model to understand the actions of users. However, a theatrical script is organised as a linear sequence of scenes, while human activity involves alternatives. In our approach, the situation model is not a linear sequence, but a network of possible situations, modeled as a directed graph.
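A situation network of this kind can be sketched as a small directed graph whose nodes are situations and whose roles are filled by entities passing an acceptance test. Everything concrete below (the lecture scenario, role names, predicates, and transition rule) is a hypothetical illustration, not the PRIMA situation models themselves.

```python
class Role:
    """An abstract agent or object; entities bind to it via an acceptance test."""
    def __init__(self, name, accepts):
        self.name, self.accepts = name, accepts

class Situation:
    def __init__(self, name, roles):
        self.name, self.roles, self.successors = name, roles, []

    def matches(self, entities):
        # The situation holds when every role is filled by some entity.
        return all(any(r.accepts(e) for e in entities) for r in self.roles)

# Hypothetical lecture scenario.
lecturer = Role("lecturer", lambda e: e.get("speaking") and e.get("at_podium"))
audience = Role("audience", lambda e: not e.get("at_podium"))

lecture = Situation("lecture", [lecturer, audience])
discussion = Situation("discussion", [audience])
lecture.successors = [discussion]
discussion.successors = [lecture]

def advance(current, entities):
    """Follow an edge of the situation graph when the current situation no
    longer holds and a successor matches."""
    for s in current.successors:
        if s.matches(entities) and not current.matches(entities):
            return s
    return current

people = [{"speaking": True, "at_podium": True}, {"at_podium": False}]
state = advance(lecture, people)   # lecturer still talking: stay in "lecture"
people[0]["speaking"] = False
people[0]["at_podium"] = False
state = advance(state, people)     # podium empty: move to "discussion"
```

A service then attaches its behavior to situations (and to transitions between them) rather than directly to sensor signals.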
Situation models are defined using roles and relations. A role is an abstract agent or object that enables an action or activity. Entities are bound to roles based on an acceptance test. This acceptance test can be seen as a form of discriminative recognition. Currently, situation models are constructed by hand. Our current challenge is to provide a technology by which situation models may be adapted and extended by explicit and implicit interaction with the user. An important aspect of taking services to the real world is the ability to adapt and extend service behaviour to accommodate individual preferences and interaction styles. Our approach is to adapt and extend an explicit model of user activity. While such adaptation requires feedback from users, it must avoid, or at least minimize, disruption. The PRIMA group has refined its approach to context-aware observation in the development of a process for real-time production of a synchronized audio-visual stream based on multiple cameras, microphones and other information sources to observe meetings and lectures. This "context-aware video acquisition system" is an automatic recording system that encompasses the roles of both the cameraman and the director. The system determines the target for each camera, and selects the most appropriate camera and microphone to record the current activity at each instant of time. Determining the most appropriate camera and microphone requires a model of the activities of the actors, and an understanding of video composition rules. The model of the activities of the actors is provided by a "situation model" as described above. Version 1.0 of the video acquisition system was used to record 8 three-hour lectures in Barcelona in July 2004.
Since that time, successive versions of the system have been used for recording testimonials at the FAME demo at the IST conference, at the Festival of Science in Grenoble in October 2004, and as part of the final integrated system for the national RNTL ContAct project. In addition to these public demonstrations, the system has been in frequent demand for recording local lectures and seminars. In most cases, these installations made use of a limited number of video sources, primarily switching between a lecturer, his slides and the audience based on speech activity and slide changes. Such actual use has allowed us to gradually improve system reliability. Version 2.0, released in December 2004, incorporated a number of innovations, including 3D tracking of the lecturer and detection of face orientation and pointing gestures. This version is currently used to record the InTech lecture series at the INRIA amphitheater. Discussions are underway to commercialise this technology with the creation of a start-up company.
2.1.4. Thème 4: New forms of man-machine interaction based on perception
PRIMA members working on this problem / Participants à ce thème de recherche : James Crowley, Stan Borkowski, Dominique Vaufreydaz, Julien Letessier
Keywords / Mots-clés : Steerable Camera Projector, Projected Interaction Widgets, Augmented Reality
Surfaces are pervasive and play a predominant role in human perception of the environment. Augmenting surfaces with projected information provides an easy-to-use interaction modality that can easily be adopted for a variety of tasks. Projection is an ecological (non-intrusive)
way of augmenting the environment. Ordinary objects such as walls, shelves, and cups may become physical supports for virtual functionalities [Pin01]. The original functionality of an object does not change, only its appearance. An example of object enhancement is presented in [Ber03], where users can interact with both physical and virtual ink on a projection-augmented whiteboard. Combinations of a camera and a video projector on a steerable assembly [Bor03] are increasingly used in augmented-environment systems [Pin03, Ras98] as an inexpensive means of making projected images interactive. Steerable projectors [Bor03, Pin01] provide an attractive solution that overcomes the limited flexibility of standard rigid video projectors in creating interaction spaces (e.g. by moving sub-windows within the cone of projection in a small projection area [Ver02]). The PRIMA group has recently constructed a new form of interaction device based on a Steerable Camera-Projector (SCP) assembly. This device allows experiments with multiple interactive surfaces in both meeting and office environments. The SCP pair, shown in figure 4, is a device with two mechanical degrees of freedom, pan and tilt, mounted in such a way that the projected beam overlaps with the camera view. This creates a powerful actuator-sensor pair enabling observation of user actions within the camera field of view. This approach has been validated by a number of research projects, such as the DigitalDesk [Wel91], the Magic Table [Ber03] and the Tele-Graffiti application [Tak03].
Figure 4. Steerable camera-projector pair (left) and surfaces defined to detect touch-like gestures over a widget (right).
For user interaction, we are experimenting with interaction widgets that detect fingers dwelling over button-style UI elements, as shown to the right in figure 4.
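The dwell-based button behavior can be sketched as a small state machine: trigger once the observed finger position stays inside the widget rectangle for a dwell time. The class, names and 0.5 s threshold below are illustrative assumptions, not the PRIMA implementation.

```python
class DwellButton:
    """Projected interaction widget sketch: a button that fires when a finger
    dwells inside its rectangle for dwell_s seconds."""
    def __init__(self, x0, y0, x1, y1, dwell_s=0.5):
        self.rect = (x0, y0, x1, y1)
        self.dwell_s = dwell_s
        self.enter_time = None  # time the finger entered, or None

    def update(self, finger, t):
        """finger: (x, y) from the camera, or None; t: time in seconds.
        Returns True exactly when the dwell threshold is crossed."""
        x0, y0, x1, y1 = self.rect
        inside = finger is not None and x0 <= finger[0] <= x1 and y0 <= finger[1] <= y1
        if not inside:
            self.enter_time = None
            return False
        if self.enter_time is None:
            self.enter_time = t
            return False
        if t - self.enter_time >= self.dwell_s:
            self.enter_time = None  # re-arm after triggering
            return True
        return False

button = DwellButton(10, 10, 50, 30)
events = [button.update((20, 20), t / 10.0) for t in range(8)]  # held 0.8 s
```

Dwell detection avoids spurious clicks from fingers merely passing through the projected widget.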
2.2 Major Results / Résultats majeurs
Recent major results from the PRIMA group include:
1) Robust real-time detection and tracking system.
2) Creation of the company Blue Eye Video.
3) BrandDetect: software for monitoring broadcast video.
4) ImaLab: a programming environment for real-time vision systems.
5) PRIMA automatic audio-visual recording system.
2.2.1 Robust Real-Time Detection and Tracking System
Tracking is a basic enabling technology for observing and recognizing human actions. Project PRIMA has implemented an industrial version of its robust real-time detection and tracking system under the name CAR. This system is designed for observing the actions of individuals in a commercial or public environment, and is designed to be general so as to be easily integrated into other applications. This system has been filed with the APP "Agence pour la
Protection des Programmes" and has the Interdeposit Digital number IDDN.FR.001.350009.000.R.P.2002.0000.00000. The CAR system is robust in the sense that it uses multiple, complementary detection methods to ensure reliable detection. Targets are detected by pixel-level detection processes based on background subtraction, motion patterns and color statistics. The module architecture permits additional detection modes to be integrated into the process. A process supervisor adapts the parameters of tracking so as to minimize lost targets and to maintain real-time response. Individuals are tracked using a recursive estimation process. Predicted position and spatial extent are used to recalculate estimates for position and size using the first and second moments. Detection confidence is based on the detection energy. Tracking confidence is based on a confidence factor maintained for each target. The CAR system uses techniques based on statistical estimation theory and robust statistics to predict, locate and track multiple targets. The location of a target is determined by calculating the center of gravity of detected regions. The spatial extent of a target is estimated by computing the second moment (covariance) of detected regions. A form of recursive estimator (or Kalman filter) is used to integrate information from the multiple detection modes. All targets and all detections are labeled with a confidence factor. The confidence factor is used to control the tracking process and the selection of detection mode.
2.2.2 Creation of the company Blue Eye Video
In 2003, with assistance from INRIA Transfert and the GRAIN, the PRIMA group founded a small enterprise, Blue Eye Video, to develop commercial applications based on the CAR system. Blue Eye Video has been awarded an exclusive license for commercial application of the CAR tracker.
In June 2003, Blue Eye Video was named Laureat of the national competition for the creation of enterprises. Since mid-2004, the number of systems installed by Blue Eye Video has been doubling roughly every 6 months. As of September 2005, Blue Eye Video has 9 employees, over 100 systems installed and a capital reserve of roughly 400 000 Euros.
2.2.3 BrandDetect: Software for Monitoring Broadcast Video
In the period 2002 to 2004, in the context of the European IST project DETECT, PRIMA developed software for detection, tracking and recognition of publicity in broadcast video of sports events. The result is a commercial software system named BrandDetect. BrandDetect is a system for detection, tracking and recognition of corporate logos, commercial trademarks and other publicity panels in broadcast television video streams. BrandDetect collects statistics on the frequency of occurrence, size, appearance and duration of presentation of the publicity. It is especially designed for use in the production of broadcast video of sports events such as football matches and Formula One racing. The BrandDetect software can permanently monitor streaming video input from pre-recorded media (MPEG, AVI and other formats) as well as from real-time video. BrandDetect looks for occurrences of a predefined set of publicity panels in a manner that is independent of size, rotation and position. Once detected, a publicity panel is tracked in order to collect statistics on duration, size, image quality, and position relative to the center of the screen. These
statistics are used to produce an objective report that may be used to establish the potential impact and commercial value of publicity. BrandDetect was filed with the APP (Agence pour la Protection des Programmes) on 7 November 2003 (IDDN.FR.450046.000.S.P.2003.000.21000). A license for commercial exploitation has been negotiated with the Austrian company HSArt.

2.2.4 ImaLab - Software Environment for Robust Machine Perception
The Imalab system represents a longstanding effort within the PRIMA team to (1) capitalize on the work of successive generations of students, (2) provide a coherent software framework for the development of new research, and (3) supply a powerful toolbox for sophisticated applications. In its current form, Imalab serves as a development environment for researchers in the PRIMA team, and represents a considerable amount of effort (probably well over 10 man-years). There are two major elements of the Imalab system: the PrimaVision library, a C++ class library covering the fundamental requirements of research in computer vision; and the Ravi system, an extensible system kernel providing an interactive programming-language shell. With respect to other well-known computer vision systems, e.g. KHOROS [Ras94], the most prominent features of Imalab are:
Data structures and algorithms for the implementation of new algorithms.
A subset of C++ statements as interaction language.
Extensibility through dynamic loading.
A multi-language facility including C++, Scheme, Clips and Prolog.
The combination of these facilities is instrumental in achieving both efficiency and generality in a large Artificial Intelligence system: efficiency is obtained through the use of C++ for all critical pieces of code; this code is seamlessly integrated with declarative programs that strive for generality. Imalab's system kernel is built on the Ravi system first described in Bruno Zoppis's thesis [Zop97]. 
The particular strength of this kernel comes from a combination of dynamic loading and automatic program generation within an interactive shell, in order to integrate new code, even new libraries, in a completely automatic way. The Imalab system has, in particular, been used for the development of the BrandDetect software described above. It has proven to be an extremely efficient tool for the development of systems, such as BrandDetect, that require extensive performance evaluation as well as incremental design of a complex user interface. We are currently in the process of registering ImaLab with the APP (Agence pour la Protection des Programmes). Imalab has been distributed as shareware to several research laboratories around Europe. Imalab has been installed and is in use at:
XRCE - Xerox European Research Centre, Meylan, France
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Austria
HS-ART Digital Service GmbH, Austria
VIDEOCATION Fernseh-Systeme GmbH, Germany
Univ. of Edinburgh, Edinburgh, UK
Instituto Superior Tecnico, Lisbon, Portugal
Neural Networks Research Centre, Helsinki University of Technology (HUT), Finland
Jaakko Pöyry Consulting, Helsinki, Finland
Université de Liège, Belgium
France Télécom R&D, Meylan, France

2.2.6 PRIMA Automatic Audio-Visual Recording System
The PRIMA automatic audio-visual recording system controls a battery of cameras and microphones to record and transmit the most relevant audio and video events in a meeting or lecture. The system can employ both steerable and fixed cameras, as well as a variety of microphones, to record synchronized audio-video streams. Steerable cameras are automatically oriented and zoomed to record faces, gestures or documents. At each moment, the most appropriate camera and microphone are automatically selected for recording. System behaviour is specified by a context model. This model, and the resulting system behaviour, can be easily edited using a graphical user interface. In video-conferencing mode, the system can be used to support collaborative interaction of geographically distributed groups of individuals. In this mode, the system records a streaming video, selecting the most appropriate camera and microphone to record speaking individuals, workspaces, documents, or an entire group. In meeting-minute mode, the system records an audio-visual record of "who" said "what". The system is appropriate for business, academic and governmental organizations in which geographically remote groups must collaborate, or in which important meetings are to be recorded for future reference. The primary innovative features are: 1) dynamic real-time detection and tracking of individuals and workspace tools, 2) dynamic real-time 3D modeling of the scene layout, and 3) dynamic recognition and modeling of human activity using stereotypical graphs of situations. This system can dramatically improve the effectiveness of group video conferencing. It can eliminate the need for human camera operators and editors for recording public meetings. 
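The context model driving camera and microphone selection can be pictured, in a deliberately simplified form, as a table from situations to recording sources. All situation and device names below are hypothetical; the actual system uses a richer, graphically editable model:

```python
# Hypothetical situation -> (camera, microphone) table standing in for the
# context model; the real model is a situation graph edited through a GUI.
CONTEXT_MODEL = {
    "lecturer_speaking": ("steerable_cam_face", "lapel_mic"),
    "audience_question": ("wide_cam", "ambient_mic"),
    "document_shown": ("document_cam", "lapel_mic"),
}

def select_sources(situation, default=("wide_cam", "ambient_mic")):
    """Select the most appropriate camera and microphone for recording."""
    return CONTEXT_MODEL.get(situation, default)
```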
The product reduces energy consumption by allowing video conferencing to serve as an effective alternative to airline travel. The market for this system is determined by 1) the number of business meetings between remote collaborators and 2) the number of important meetings for which an audio-visual record should be maintained. Rights to this system are jointly owned by INP Grenoble, UJF and INRIA. The system is currently being deposited at the APP and will shortly be available for licensing. We are investigating plans to create a start-up to commercially exploit this system. 
3. RESEARCH TRAINING / ACTIVITES D'ENCADREMENT

3.1 Doctoral Theses and Habilitations / Thèses et HDR soutenues
1) Fabien PELISSON, "Indexation Basée sur l'apparances", Thèse doctorale de l'I.N.P.G., 29 janvier 2003.
2) Christophe LE GAL, "Intégration et contrôle de processus de vision répartis pour les environnements intelligents", Thèse doctorale de l'I.N.P.G., 14 janvier 2003.
3) Daniela HALL, "Viewpoint Independent Recognition of Objects from Local Appearance", Thèse doctorale de l'I.N.P.G., 10 octobre 2001.

3.2 Doctoral Theses and Habilitations Underway / Thèses et HDR en cours
1) BORKOWSKI, Stan, "Capteurs et Actuateurs pour l'interaction", ED MSTI, INPG, Bourse EGIDE, première inscription oct. 2003
2) ANNE, Matthieu, "Intégration et Contrôle de Processus de Perception", ED MSTI, INPG, Bourse CIFRE, première inscription déc. 2003
3) GOURIER, Nicolas, "Perception Visuelle Robustes des Visages", ED MSTI, INPG, Bourse INRIA, première inscription oct. 2004
4) LETESSIER, Julien, "Perception pour l'interaction Homme-Machine", ED MSTI, INPG, Bourse INRIA, première inscription oct. 2004
5) BRDICZKA, Oliver, "Learning Models of Human Activity", ED MSTI, INPG, Bourse INRIA, première inscription 2004
6) MAISONNASSE, Jérôme, "Automatic Observation of Human Activity", ED MSTI, INPG, CDD INPG-SA, première inscription janv. 2004
7) ZAIDENBERG, Sophia, "Learning Models of Human Activity", ED MSTI, INPG, BDI CNRS, première inscription oct. 2005
8) EMONET, Rémi, "Autonomic Computing Techniques for Machine Perception", ED MSTI, INPG, Bourse MENRT, première inscription oct. 2005
9) NEGRE, Amaury, "View Invariant Vision for Navigation", ED MSTI, INPG, Bourse MENRT, première inscription oct. 2005
10) TRAN, Thi Thanh, "Reconnaissance d'objets mixte symbolique-numérique", ED MSTI, INPG, cotutelle avec l'Institut Polytechnique de Hanoi (Vietnam), Bourse Egide, première inscription oct. 
2002
11) CHUNWIPHAT, Suphot, "Reconnaissance de contexte par réseaux de Petri flous", ED MSTI, INPG, bourse thaïlandaise, première inscription oct. 2002
12) BERTRAND, Olivier, "Raisonnement Bayésien pour la reconnaissance d'objets multi-échelle", ED MSTI, INPG, Bourse Normalien, première inscription septembre 2003 (interrompue en 2004/05 pour agrégation)
4. INDUSTRIAL RELATIONS / COLLABORATION ET VALORISATION

4.1 Principal Collaborations Other than Contracts / Principales relations scientifiques hors contrats
All of our collaborations are covered by research contracts, in order to protect intellectual property.
Invited Professors:
P.C. Yuen, Hong Kong Baptist University, collaboration on Perception of Human Activity.
Walter Tichy, Univ. of Karlsruhe, collaboration on Autonomic Computing Architectures.
Theses in Co-direction:
Matthieu Anne, co-directed with G. Privat of FT R&D.

4.2 Contrats institutionnels
Project HARP (financed by FT R&D), starting 1 June 2005, duration 18 months. Project HARP addresses the problem of automatically acquiring models to describe human activity in domestic and office environments.

4.3 Research Contracts / Contrats (co)financés par un industriel
Projet Européen IST IP CHIL - Computers in the Human Interaction Loop (January 2004 - December 2006, with Fraunhofer Institut Karlsruhe, Universität Karlsruhe (TH), Daimler Chrysler AG, Stuttgart, ELDA, Paris, IBM Czech Republic, Resit, Athens, INPG, ITC Trento, KTH Stockholm, CNRS, TU Eindhoven, UPC Barcelona, Stanford University, Carnegie Mellon University.)
The theme of project IP CHIL is to put computers in the loop of humans interacting with humans. To achieve this goal of Computers in the Human Interaction Loop (CHIL), the computer must engage and act on perceived human needs, intruding as little as possible, with only relevant information or on explicit request. The computer must also learn from its interaction with the environment and people. Finally, the computing devices must allow for a dynamically networked and self-healing hardware and software infrastructure. The CHIL consortium will build prototypical, integrated environments providing:
Perceptually Aware Interfaces
Cognitive Infrastructure
Context Aware Computing Devices
Novel Services
New Measures of Performance
Franco-Finnish ProAct project: ContAct, with XRCE and Univ. of Helsinki.
The aim of project RNTL CONTACT is to explore novel approaches, based mainly on neural nets, to the detection and manipulation of contextual information to support proactive computing applications, and to make the results available as part of a more extensive toolkit for ubiquitous and proactive computing. The project will investigate the effectiveness of neural networks as part of a "contextual middleware". In particular, the project will address two levels of context manipulation: support for developing and adapting neural network classifiers to compute context variables. Potentially, an end user will be able to specify the learning phase of the network and associate it with a specific new context attribute to be used in the application. To develop these functions, a specific scenario will be used: people planning, coordinating and reflecting on their activities in an organization in an augmented smart building (equipped with large screens, wireless networks, video cameras, identification sensors, and mobile devices). The attributes will be articulated at two different levels of abstraction: sensor-oriented and application/user-oriented. To achieve these results, project CONTACT will cover four major activities:
Definition of an ontology that describes context variables both at the user and at the sensor level.
Definition of a platform providing a formalism and an appropriate architecture to learn and combine context attributes.
Definition of a library of context attributes, general enough to be reusable in support of scenarios other than the one used in the project.
Validation of the contextual middleware on a pilot case.
The chosen application of personal time management will help to guide the development of the middleware and to conduct an evaluation of our technology on a real-world problem. 
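A sensor-level context classifier of the kind CONTACT envisions can be illustrated with a toy sketch. Here a nearest-centroid classifier stands in for the neural network, and the features (sound level, number of tracked persons) and context labels are invented for illustration:

```python
import numpy as np

def train_centroids(X, y):
    """One prototype per context value (a stand-in for a trained network)."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(centroids, x):
    """Assign the context value whose prototype is nearest to the features."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# Invented training data: [sound level, number of tracked persons].
X = np.array([[0.9, 4], [0.8, 5], [0.1, 1], [0.2, 0]], dtype=float)
y = np.array(["meeting", "meeting", "empty", "empty"])
model = train_centroids(X, y)
```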
Project Contact is one of three RNTL projects that have been included in the French-Finnish scientific programme ProAct.

Projet Européen IST CAVIAR (IST 2001-37540) (Nov. 2002 - Oct. 2005) (with Univ. of Edinburgh and Univ. of Lisbon)
The main objective of CAVIAR is to address the scientific question: can rich local image descriptions from foveal and other image sensors, selected by a hierarchical visual attention process and guided and processed using task, scene, function and object contextual knowledge, improve image-based recognition processes? This clearly addresses issues central to the cognitive vision approach. The applications that the project has addressed are scenarios drawn from city centre surveillance: many large cities have night-time crime and antisocial behaviour problems, such as drunkenness, fights, vandalism, breaking and entering shop windows, etc. Often these cities have video cameras already installed, but what is lacking is a semi-automatic analysis of the video stream. Such analysis could detect unusual events, such as patterns of running people, converging people, or stationary people, and then alert human security staff.

European Project IST FAME (IST-2000-28323) Facilitating Agent for Multi-cultural Communication
PRIMA participated in project FAME from Oct. 2001 to Feb. 2005, with Sony, Univ. Karlsruhe, IRST (Trento, Italy), UPC (Barcelona, Spain) and Laboratoire CLIPS-IMAG. The goal of IST Project FAME has been to construct an intelligent agent to facilitate communication among people from different cultures who collaborate on solving a common problem. This agent provides three services: 1) facilitating human-to-human communication through multimodal interaction including vision, speech and object manipulation, 2) providing the appropriate information relevant to the context, and 3) making possible the production and manipulation of information blending both electronic and physical representations. The agent serves as an information butler to aid multicultural communication in a transparent way. The agent does not intervene in the conversation, but remains in the background to provide the appropriate support. A two-week public demonstration was given at the Barcelona Cultural Fair in July 2004.

European Project IST DETECT, IST-2001-32157, "Real Time Detection of Motion Picture Content in Live Broadcasts" (November 2001 to February 2004), with Joanneum Research (A), Taurus Media Technik (D), HS-ART Digital Services (A), Laboratoire GRAVIR (UMR CNRS 5527), Duvideo (P) and Videocation (D).
The goal of the DETECT project has been to implement a general platform for real-time detection of semantic blocks and regions within digital video streams. These detected regions are then subjected to further analysis and processing. The project focused on the application problem of detecting and delimiting predefined static and dynamic objects, a capability for which there is currently very large demand, for both cultural and economic reasons. DETECT was an industrially driven project, although R&D in nature. The project produced a prototype system, which has since been converted into a product for monitoring commercial broadcast video to collect statistics on publicity. 
4.4 Start-ups Created / Création d'entreprise
Blue Eye Video, founded May 2003, 9 employees, systems for observing human activity, over 100 systems installed.

4.5 Patents and Licences / Brevets et licences
Patent / Brevet A3350-US-NP, "IDENTIFYING OBJECTS TRACKED IN IMAGE USING ACTIVE DEVICE", filed on 17 December 2004 in the United States by XRCE. Not yet published.
Software / Logiciels :
1) "CAR - Robust Tracker", deposited with the APP, with an exploitation license to Blue Eye Video.
2) "BrandDetect", deposited with the APP, with an exploitation license to the company HSArt (Austria).
5. SELECTED PUBLICATIONS
Publications in International Journals with Review Committees / Articles dans des revues avec comité de lecture internationales et nationales (ACL)
As instructed by the Ministry of Research, "major" publications are in bold, PRIMA group members are underlined, and authors are listed "family name, initial".
International Journals / Revues internationales :
Year 2001 / Année 2001
1) Crowley J. L. and Pourraz F., "Continuity properties of the appearance manifold for mobile robot position estimation", Image and Vision Computing, Vol 19, pp 741-752, 2001.
2) Le Gal C., Martin J., Lux A., and Crowley J. L., "Smart Office: An Intelligent Interactive Environment", IEEE Intelligent Systems, 6(4), pp 60-66, 2001.
Year 2004 / Année 2004
3) Lux A., "The Imalab Method for Vision Systems", Machine Vision and Applications, Vol 16, No. 1, pp 21-26, 2004.
4) Hall D., Pelisson F., Riff O., and Crowley J. L., "Brand Identification Using Gaussian Derivative Histograms", Machine Vision and Applications, Vol 16, No. 1, pp 41-46, 2004.
Year 2005 / Année 2005
5) Coutaz J., Crowley J. L., Dobson S., and Garlan D., "Context is Key", Communications of the ACM, March 2005.

Invited Keynote Presentations / Conférences invitées (INV)
International Conferences / Conférences d'audience internationale
Year 2001 / Année 2001
1) Crowley J. L., "Things that See: Perception for Man Machine Interaction", keynote presentation, Active Media Technology, Hong Kong, 18-20 December 2001.
Year 2002 / Année 2002
2) Crowley J. L., "Things that See: Perception for Man Machine Interaction", keynote presentation, UIST 2002, User Interface Science and Technology, Paris, 28 October 2002.
Year 2003 / Année 2003
3) Crowley J. L., "Context Driven Observation of Human Activity", keynote presentation, EUSAI 03, European Symposium on Ambient Intelligence, Eindhoven, 3-5 November 2003.
4) Crowley J. 
L., "A Research Roadmap for Cognitive Vision", Dagstuhl Workshop on Cognitive Vision, October 2003.
5) Crowley J. L., "Dynamic Composition of Process Federations for Context Aware Perception of Human Activity", keynote presentation, KIMAS '03, Boston, October 2003.
Year 2004 / Année 2004
6) Crowley J. L., "Context Driven Observation of Human Activity", invited presentation at the France-Taiwan Colloquium on Information Technology, Ecole Polytechnique, April 2004.
National Conferences / Conférences d'audience nationale
Year 2004 / Année 2004
7) Crowley J. L., "What Does it Mean to See? Recent Progress in Computer Vision", plenary presentation at the Polish National Robotics Conference, Poland, June 2004.
8) Crowley J. L., "Context Driven Observation of Human Activity", presented at PSIPS, Processing Sensory Information for Proactive Systems, Oulu, Finland, June 2004.
Year 2005 / Année 2005
9) Crowley J. L., "Context Aware Observation of Human Activity", keynote speaker at the ASCI Symposium on Image Analysis, Boxmeer, Netherlands, June 2005.
10) Crowley J. L., "Context Aware Observation of Human Activity", invited presentation at the A*STAR Cognitive Science Symposium, Agency for Science, Technology and Research (A*STAR) of Singapore, Sept. 2005.

Selected Papers at Scientific Conferences with Published Proceedings / Communications avec actes internationales et nationales (ACT)
International Conferences / Conférences d'audience internationale
Year 2002 / Année 2002
1) Crowley J. L., Coutaz J., Rey G. and Reignier P., "Perceptual Components for Context Aware Computing", UBICOMP 2002, International Conference on Ubiquitous Computing, Goteborg, Sweden, September 2002.
2) Crowley J. L., "Context Aware Observation of Human Activities", IEEE International Conference on Multimedia and Expo, ICME-2002, Lausanne, August 2002.
Year 2003 / Année 2003
3) Crowley J. L. and Riff O., "Fast Computation of Characteristic Scale Using a Half Octave Pyramid", Scale Space 03, 4th International Conference on Scale-Space Theories in Computer Vision, 10-12 June 2003, Isle of Skye, Scotland, UK.
4) Pelisson F., Riff O., Hall D., and Crowley J. L., "Brand Identification Using Gaussian Derivative Histograms", International Conference on Vision Systems, ICVS-03, Graz, April 2003.
5) Lux A., "The Imalab Method for Vision Systems", ICVS 2003, International Conference on Vision Systems, LNCS 2626, pp 314-322.
Year 2004 / Année 2004
6) Gourier N., Hall D., Crowley J. 
L., "Facial Features Detection Robust to Pose, Illumination and Identity", International Conference on Systems, Man and Cybernetics, The Hague, October 2004.
7) Tran H. and Lux A., "A Method for Ridge Extraction", ACCV 04, Asian Conference on Computer Vision, 2004.
Year 2005 / Année 2005
8) Brdiczka O., Reignier P., and Crowley J. L., "Automatic Development of an Abstract Context Model for an Intelligent Environment", International Conference on Pervasive Computing, PerCom 05, Kauai, Hawaii, March 2005.
9) Brdiczka O., Maisonnasse J., and Reignier P., "Automatic Detection of Interaction Groups", International Conference on Multimodal Interfaces (ICMI 05), Trento, Italy, October 2005.
National Conferences / Conférences d'audience nationale
Year 2004 / Année 2004
10) Hall D. and Crowley J. L., "Détection de Visages par Caractéristiques Génériques Calculées à Partir des Images de Luminance", RFIA, Reconnaissance des Formes et Intelligence Artificielle, Toulouse, Jan. 2004.
Contribution to Books / Ouvrages scientifiques (ou chapitres) (OS)
Year 2005 / Année 2005
1) Crowley J. L., "Things that See: Context-Aware Multi-modal Interaction", in Cognitive Vision Systems: Sampling the Spectrum of Approaches, H. H. Nagel and H. Christensen, Eds., Springer Verlag, to appear, 2005.
2) Crowley J. L. and Reignier P., "Designing Context Aware Services for Ambient Informatics", in True Vision in Ambient Intelligence, Ed. Emile Aarts, Springer Verlag, to appear, 2005.

Edition of Books or Journals / Directions d'ouvrages (DO), Edition d'actes de colloques, de numéros spéciaux de revues
Year 2004 / Année 2004
1) Crowley J. L., Editor, Special Issue of Machine Vision and Applications, Vol 16, No. 1, Nov. 2004.

Thèses et habilitations
Year 2001 / Année 2001
1) Hall D., "Viewpoint Independent Recognition of Objects from Local Appearance", Thèse doctorale de l'I.N.P.G., 10 octobre 2001.
Year 2003 / Année 2003
2) Le Gal C., "Intégration et contrôle de processus de vision répartis pour les environnements intelligents", Thèse doctorale de l'I.N.P.G., 14 janvier 2003.
3) Pelisson F., "Indexation Basée sur l'apparances", Thèse doctorale de l'I.N.P.G., 29 janvier 2003.

6. PRINCIPAL SCIENTIFIC AND ADMINISTRATIVE RESPONSIBILITIES / PRINCIPALES RESPONSABILITES SCIENTIFIQUES ET ADMINISTRATIVES
James L. Crowley
Conference Co-President for:
1) Smart Objects and Ambient Intelligence, SOC-EUSAI, Grenoble, October 2005.
President of Program Committee for:
2) International Conference on Multimodal Interaction, ICMI 2005, Trento, October 2005.
3) European Symposium on Ambient Intelligence, EUSAI 2004, Eindhoven, October 2004.
4) International Conference on Vision Systems, ICVS 2003, Graz, Austria, April 2003.
Local Responsibilities:
5) Director of the Laboratoire GRAVIR-IMAG (UMR CNRS 5527).
6) President of the INRIA Rhône-Alpes thesis committee.
7) Member of the ENSIMAG Board of Directors (elected 2002). 
8) Member of the INPG Contracts Commission.
9) Member of the Reform Committee for the new LOLF, INPG.
10) Member of the ENSIMAG Habilitation Commission.
11) Head of the INRIA project PRIMA.
12) Comité de Spécialistes, ENSIMAG, 2000-2004.
13) Scientific Council of the IMAG Federation, 2000-2004.
Augustin Lux
1) Responsible for the Doctoral Formation "Image, Vision, Robotique" in the Math-Info Doctoral School.
2) Responsible for 3rd-year end-of-studies projects at ENSIMAG.

7. PERSPECTIVES DE RECHERCHE 06-10

Thème 1: Robust View-Invariant Computer Vision
Natural interest points are local extremal points in a Laplacian scale space, as described by Crowley [Cro87] and Lindeberg [Lin97]. Recent papers by Lowe [Low99] and Schmid [Sch02] have led to a rapid growth of interest in the use of such points as scale-invariant keypoints on the part of the computer vision community. Schmid has recently extended the detection of such points to correct for affine approximations to local projective transformations using an iterative search method [Sch02]. Nonetheless, interest points suffer from a serious problem in the case of elongated structures. In such a case, many local peaks occur along a ridge in the Laplacian scale space, and the number and position of keypoints become highly unstable. This phenomenon is very common in real-world scenes. In doctoral work directed by Augustin Lux, Hai Tran has recently revisited the problem of the description of ridges in a Laplacian scale space. Tran and Lux have developed a simple, robust detection technique based on differential invariants using the Hessian matrix of the Laplacian. They have shown that such ridges, referred to as "natural interest lines", provide a natural complement to natural interest points. Natural interest lines provide a scale- and rotation-invariant skeleton to organize detection and recognition [Tra04]. In his master's research project this year, Amaury Negre has extended the work of Tran and Lux to provide a simple and robust algorithm for detecting and tracking natural interest lines across scale [Neg06]. 
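The notion of natural interest points as local extrema in a Laplacian scale space can be illustrated with a short sketch, approximating the scale-normalized Laplacian by differences of Gaussians. This is an illustration of the principle only, not the group's implementation, and it detects bright-blob responses only:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_scale_space(img, sigmas):
    """Difference-of-Gaussians approximation of the Laplacian scale space
    (smaller-minus-larger blur gives a positive response on bright blobs)."""
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    return np.stack([blurred[i] - blurred[i + 1] for i in range(len(sigmas) - 1)])

def interest_points(img, sigmas, thresh=1e-3):
    """Local maxima of the response over both position and scale;
    returns rows of (scale_index, y, x)."""
    response = dog_scale_space(img, sigmas)
    peaks = (response == maximum_filter(response, size=3)) & (response > thresh)
    return np.argwhere(peaks)
```

On an elongated structure, such maxima multiply along the ridge; this is precisely the instability that natural interest lines are intended to address.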
The PRIMA group is building on this approach to demonstrate methods for the use of natural interest lines and points for: 1) detecting and segmenting printed text under projective transformations, 2) estimating parameters for an analytic expression of optical flow in motion sequences, 3) direct, real-time calculation of "time to collision" measures over the entire visual field, 4) real-time detection and description of human posture from low-resolution surveillance images, and 5) real-time detection and description of face orientation in video sequences.

Thème 2: Robust architectures for multi-modal perception
In order to exploit autonomic components, user services must be designed with autonomic capabilities. Most important is the ability to launch and configure components, and to automatically reconfigure communications between components. This ability makes it possible to efficiently incorporate redundant components in the system, with the ability to respond to failure or degradation by shutting down the failed component and launching an alternate. Thus the user service must incorporate a form of monitor in order to assure detection of and response to degradation. Such a configuration monitoring agent may be constructed using agent technologies currently used to provide services. 
It is conceivable to program a configuration monitoring agent with hard-wired procedures for reconfiguration. However, a proper, robust design can be obtained using an ontological model. Modern tools for software ontologies can be used as a component registry to record and index available sensors and components according to their capabilities. Such a tool can also provide a registry for the data types and events consumed and produced by components. An important current effort concerns dynamic service composition, which will add flexibility and robustness to services. Dynamic service composition enables a system to create new services based on their semantic description, by discovering appropriate services and combining them [Paa03]. Service composition should be semantics-based, so that a service is requested and composed not by its syntax but by its semantics. In order to enable semantics-based dynamic service composition, both the modeling of components and the service composition mechanism must support semantics. Auto-criticism will be addressed by applying statistical learning techniques to learn a likelihood function for output reliability. Such a function would provide a numerical measure of the likelihood that the current component output is reliable. Auto-regulation can be provided by a separate monitoring process that detects when the output of a component has become unreliable, reprocesses the most recent data with variations of the processing parameters, and proposes the parameter set that produces the most reliable result. Rather than exhaustively exploring the parameter space, such a monitoring process can be designed to provide a small incremental correction for each data record, thus gradually hill-climbing to a local maximum. Auto-description requires that the component be programmed to respond with a description of its capabilities. 
Auto-monitoring requires that the component continually estimate a set of internal performance metrics that characterize its internal state, as well as observe the availability of resources. Auto-configuration within a perceptual component requires that the component be built with the possibility of varying its service levels or of enabling and disabling modules. We propose to equip the component with a scheduling mechanism driven by an explicit list of modules and parameters. Configuration will involve modifications to the execution schedule and service levels. For most human activities, there are a potentially infinite number of entities that could be detected and an infinite number of possible relations for any set of entities. The appropriate entities and relations must be determined with respect to a task or service to be provided. This is the role of the situation model. Situation models allow focusing attention and computing resources on determining the information required to provide services. For a given state of a service, the situation model acts as a filter for events and streams from perceptual components. In certain well-defined cases, the arrival of an event or the interpretation of a stream can result in an event being sent to the service components. In this sense, the situation model acts as a bottom-up filter for events from perceptual components. Services specify a context. The context determines the appropriate entities, roles and relations to observe with perceptual components. Information flow is inverted when a service changes state. In reaction to a user command, or to a change in situation, the service may send events to the situation model, forcing a change in the situation graph, and possibly forcing a change in the configuration of perceptual components. In this case, the situation model acts in a top-down manner, elaborating and expanding the service event into commands to the situation model as well as to the perceptual components. 
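The dual role of the situation model, bottom-up event filter and top-down trigger for reconfiguration, can be sketched with a toy situation graph; all situation and event names below are hypothetical:

```python
class SituationModel:
    """A situation graph: each situation declares which perceptual events
    are relevant, and which situation each relevant event leads to."""

    def __init__(self, graph, start):
        self.graph = graph      # situation -> {event: next situation}
        self.current = start

    def filter_event(self, event):
        """Bottom-up: forward an event to the service only if the current
        situation declares it relevant. Relevant events also advance the
        graph, changing (top-down) which events matter next."""
        transitions = self.graph[self.current]
        if event in transitions:
            self.current = transitions[event]
            return event        # passed up to the service
        return None             # filtered out

graph = {
    "idle": {"person_enters": "occupied"},
    "occupied": {"person_leaves": "idle", "speech_starts": "meeting"},
    "meeting": {"speech_stops": "occupied"},
}
model = SituationModel(graph, "idle")
```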
Thème 3: Context aware interactive environments
Our current methods use hand-crafted networks of situations to represent the situation model. An important challenge for the coming years is to create methods to automatically acquire and develop situation models, as well as methods for such models to adapt in order to provide robust, invariant services under variations in operating conditions. Adaptation allows a system to maintain consistent behaviour across variations in operating environments. The environment denotes the physical world (e.g., in the street, lighting conditions), the user (identification, location, goals and activities), social settings, and computational, communicational and interactive resources. Development refers to the automatic acquisition of situation and context, and ultimately the acquisition of the entities, roles and relations from which situations and contexts emerge. Adaptation and development are fundamental to providing useful and usable services to a variety of users in the presence of large variations in resources and activities. Context is too complex to be pre-programmed as a fixed set of stable variables; worse, the contract itself, which defines correct behaviour, is not always precisely specifiable in advance. Thus, the context model, contract and adaptation process must develop through observation of and interaction with the environment. At the same time, context models must not be disruptive. This creates a dilemma: how can context models evolve and develop without introducing disruption? The challenge is to find the appropriate balance between implicit and explicit interaction for providing the feedback required for development. We must determine the appropriate degree of autonomy, and this problem can impact every level of abstraction. 
One attractive solution to these constraints is to observe that the correct selection of elements (adaptation strategies, interfaces, devices, information, etc.) can only be made in the infrastructure. This suggests replacing explicitly-coded responses to situations and contexts, which can only accommodate a fixed range of predicted compositions, with a higher-level, more knowledge-intensive use of machine-readable strategies coupled with reasoning and learning. This goes beyond the usual distinction between closed- and open-adaptive systems. Current learning technologies require large sets of training data, something that is difficult to obtain for an extensible environment. Non-disruptive development of context models will require new ways of looking at learning, and may ultimately require a new class of minimally-supervised learning algorithms that will need to be studied explicitly as part of semi-autonomous systems.

Context is key to the development of new services that will promote social inclusion in the emerging information society. For this to come true, we need to find the right balance between contradictory features. If context is redefined continually and ubiquitously, then how can users form an accurate model of a constantly evolving digital world? If system adaptation is negotiated, then how can disruption of human activities be avoided? We believe that a clear architecture, and a well-founded, explicit relationship between environment and adaptation, are the critical factors that will unlock context-aware computing at a global scale.

Thème 4: New forms of man-machine interaction based on perception

In the coming years, PRIMA will continue parallel exploration of two complementary paradigms for man-machine interaction:
1) The machine as tool, and
2) The machine as collaborator.

The machine as tool. The SCP (the steerable camera-projector pair described above) is an example of the "machine as tool" paradigm. With this approach, perception of human action and projection of information are coupled to augment ordinary objects with information technology. We will continue development of this approach, using steerable projectors and dynamic service configuration to augment the human environment and to use any surface as a focus for man-machine interaction. Notably, we are currently completing a tool-kit for visual interaction. This tool-kit is designed to provide an easy-to-use foundation for experimenting with all forms of projective widgets based on observing human activity.

The machine as collaborator. A new challenge for PRIMA involves developing the foundations of a technology for social interaction. Current information technology is autistic. The creation of a technology for artificial systems that are socially aware constitutes a grand challenge. Meeting this challenge will provide a new class of technology with profound social and economic impact. The emergence of socially aware systems can potentially mark a rupture point in the evolution of informatics and the start of exponential growth in new applications across a broad spectrum.

We believe that social interaction between man and machine requires two core abilities:
1) the ability for machines to learn to sense and to evoke human emotions, and
2) the ability for machines to adopt social roles that correspond to the implicit expectations of users.

In the area of sensing and evoking emotions, we propose to collaborate with colleagues in the CHIL project, who have developed artificial-agent technologies and speech understanding, in order to develop the ability to perceive and evoke interest and pleasure in human partners.
Perception of the affect of a human partner can provide feedback for learning, both to refine the ability to evoke attention and pleasure, and to develop other cognitive abilities. We propose to use these abilities to explore the scripting of behaviours for virtual agents in the form of roles. For this work we propose to extend the situation models and roles that we have developed in the FAME and CHIL projects. Note that we are not proposing, within PRIMA, to develop virtual agents or speech recognition. We will use virtual-agent tool-kits developed by our partners IRST and KTH to provide a graphical display technology. We propose to adapt acoustic perception and speech recognition technologies developed by our partners IBM, UKA, KTH and the University of Erlangen to complement our work in computer vision.

Support Requested by PRIMA

Thanks to a combination of sustained success with national and European project proposals and a highly visible profile in the European and international scientific community, PRIMA offers a particularly fertile environment for young researchers to launch their scientific careers. Unfortunately, over the last 8 years a variety of circumstances have prevented PRIMA from recruiting and integrating young researchers or academic staff. Numerous very high-quality French and European candidates seeking to join PRIMA have failed to obtain an academic post, primarily because of a lack of support from the local
scientific hierarchy in selecting targeted research groups and defining scientific profiles for recruiting. The reorganization of the research structure associated with the new four-year contract, coupled with the recent reorganization of the CNRS, offers an opportunity to rectify this.

We would like to request that the CNRS recognize the sustained academic excellence of the PRIMA group with the publication of a post for a young researcher (CR2) in the area of perception for multi-modal interaction. We know of several potential candidates from France, Europe and North America who could be attracted by such a position if it offered the possibility of joining PRIMA.

We also request that the Université Joseph Fourier create a position of Professeur 2ème classe, with a research profile of "Context Aware Multi-Modal Interaction", with recruiting directed to the LIG laboratory. We know of several candidates who would apply for such a post.
Annexe 1 - Liste détaillée des membres de l'équipe / Composition of the Group (1 Oct 2005)

Enseignants-chercheurs (NOM Prénom | Section | Etab. | Grade | %Recherche | Quotité) :
CROWLEY, James | 27 | INPG | PR | 50% | 1
LUX, Augustin | 27 | INPG | PR | 50% | 1
REIGNIER, Patrick | 27 | UJF | MCF | 50% | 1
VAUFREYDAZ, Dominique | 27 | UPMF | MCF | 50% | 1

Personnels sous contrat :
HALL, Daniela | INPG-SA | Post-doc | 100% | 1
FERRER, Alba | INRIA | Ingénieur | 100% | 1
VALLET, Jean-Marie | UJF | Ingénieur | 100% | 1

Visiting Professors :
P. C. Yuen | U. Hong Kong | 100% | 0.5
W. Tichy | Univ. Karlsruhe | 0.25

Liste des doctorants au 01/10/2005 (NOM Prénom | Etablissement | Financement | Partenaire) :
Stanislas Borkowski | INPG | EGIDE
Suphot Chunwiphat | INPG | Bourse gvt étranger | Thaïlande
Matthieu Anne | INPG | CIFRE | FT R&D
Thi-Thanh-Hai Tran | INPG | EGIDE | Viet Nam
Olivier Bertrand | INPG | Allocataire | Normalien
Nicolas Gourier | INPG | Contrat équipe | Europe
Julien Letessier | INPG | Contrat équipe | Europe
Jerome Maisonnasse | INPG | Contrat équipe | FT R&D
Oliver Brdiczka | INPG | Contrat équipe | Europe
Sophia Zaidenberg | INPG | BDI CNRS cofinancée | INRIA
Remi Emonet | INPG | Allocataire MENRT
Annexe 2 : Liste des thèses soutenues (du 01/10/2001 au 01/10/2005)

Total : 3, dont thèses avec publications ou brevets avant le 01/10/2005.

PELISSON, Fabien | Directeur : J. L. Crowley | Soutenance : 01/2003 | Publications et brevets : (BrandDetect) [Hal03c], [Hal04] | Financement : Bourse Région | Inscription : INPG | ED : MathInfo | DEA/master d'origine : IVR | Situation professionnelle : Technical Director, Blue Eye Video
LEGAL, Christophe | Directeur : A. Lux | Soutenance : 01/2003 | Publications : [Leg00], [Leg01] | Financement : Bourse MENRT | Inscription : INPG | ED : MathInfo | DEA/master d'origine : IVR | Situation professionnelle : MdC, U. Brest
HALL, Daniela | Directeur : J. L. Crowley | Soutenance : 10/2001 | Publications : [Cap04], [Cap04b], [Cho00], [Hal 04b], [Hal00], [Hal00b], [Hal00d], [Hal01a], [Hal03], [Hal03c], [Hal04], [Hal05a], [Hal05b], [Hal05c], [Hal99b] | Financement : Bourse MENRT | Inscription : INPG | ED : MathInfo | DEA/master d'origine : IVR | Situation professionnelle : Post-doc, INRIA

Références citées :

[Cap04] A. Caporossi, D. Hall, P. Reignier and J. L. Crowley, "Visual Tracking in Video Sequences", Pointing 04: International Workshop on Deictic Gestures, Cambridge, UK, Aug. 2004.
[Cap04b] A. Caporossi, D. Hall, P. Reignier and J. L. Crowley, "Robust Visual Tracking from Dynamic Control of Processing", PETS04, Workshop on Performance Evaluation of Tracking and Surveillance, ECCV04, Prague, May 2004.
[Cho00] O. Chomat, V. Colin de Verdière, D. Hall and J. L. Crowley, "Local Scale Selection for Gaussian Based Description Techniques", ECCV 2000, 6th European Conference on Computer Vision, Springer Verlag, Dublin, June 2000.
[Hal 04b] D. Hall and J. L. Crowley, "Détection de visages par caractéristiques génériques calculées à partir des images de luminance", RFIA 04, Reconnaissance des Formes et Intelligence Artificielle, Toulouse, Jan. 2004.
[Hal00] D. Hall and J. L. Crowley, "Tracking Fingers and Hands with a Rigid Contour Model in an Augmented Reality", MANSE 99: Managing Interactions in Smart Environments, ISBN 1-85233-228-X, Springer Verlag, pp 34-43, Dublin, Dec. 1999.
[Hal00] D. Hall, J. L. Crowley and V. Colin de Verdière, "View Invariant Object Recognition using Coloured Receptive Fields", Machine Graphics and Vision, Vol 9, No. 2, pp 341-352, June 2000.
[Hal00] D. Hall, V. Colin de Verdière and J. L. Crowley, "Object Recognition using Coloured Receptive Fields", ECCV 2000, 6th European Conference on Computer Vision, Springer Verlag, Dublin, June 2000.
[Hal00b] K. Schwerdt, D. Hall and J. L. Crowley, "Visual Recognition of Emotional States", Third International Conference on Multimodal Interfaces, ICMI-2000, Beijing, China, 14-16 October 2000.
[Hal00d] D. Hall and J. L. Crowley, "Estimating the Pose of Phicons for Human Computer Interaction", Third International Conference on Multimodal Interfaces, ICMI-2000, Beijing, China, 14-16 October 2000.
[Hal01a] D. Hall, C. Le Gal, J. Martin, O. Chomat and J. L. Crowley, "MagicBoard: A Contribution to an Intelligent Office Environment", Robotics and Autonomous Systems, Vol 35, No. 3-4, June 2001.
[Hal03] D. Hall and J. L. Crowley, "Computation of Generic Features for Object Classification", Scale Space 03, 4th International Conference on Scale-Space Theories in Computer Vision, Isle of Skye, Scotland, UK, 10-12 June 2003.
[Hal03c] F. Pelisson, O. Riff, D. Hall and J. L. Crowley, "Brand Identification Using Gaussian Derivative Histograms", International Conference on Vision Systems, ICVS-03, Graz, April 2003.
[Hal04] D. Hall, F. Pelisson, O. Riff and J. L. Crowley, "Brand Identification Using Gaussian Derivative Histograms", Machine Vision and Applications, Vol 16, No. 1, pp 41-46, 2004.
[Hal05a] D. Hall, "Automatic Parameter Regulation for a Tracking System with an Autocritical Function", International Workshop on Computer Architecture for Machine Perception, pp 39-45, Palermo, Italy, July 2005.
[Hal05b] D. Hall, "A System for Object Class Detection", in H.-H. Nagel and H. I. Christensen (eds.), Cognitive Vision Systems, chapter Recognition Systems, Springer Verlag, Heidelberg, 2005, to appear.
[Hal05c] D. Hall, J. Nascimento, P. Ribeiro, E. Andrade, P. Moreno, S. Pesnel, T. List, R. Emonet, R. B. Fisher, J. Santos-Victor and J. L. Crowley, "Comparison of Target Detection Algorithms Using Adaptive Background Models", International Workshop on Performance Evaluation of Tracking and Surveillance, 2005, to appear.
[Hal99b] D. Hall, C. Le Gal, J. Martin, O. Chomat, T. Kapuscinski and J. L. Crowley, "MagicBoard: A Contribution to an Intelligent Office Environment", Symposium on Intelligent Robotics Systems, SIRS-99, Coimbra, July 1999.
[Leg00] C. Le Gal, A. E. Ozcan, K. Schwerdt and J. L. Crowley, "A Sound MagicBoard", Third International Conference on Multimodal Interfaces, ICMI-2000, Beijing, China, 14-16 October 2000.
[Leg01] C. Le Gal, J. Martin, A. Lux and J. L. Crowley, "Smart Office: An Intelligent Interactive Environment", IEEE Intelligent Systems, July/August 2001.
Annexe 3 : indicateurs de production scientifique (2001-2005)

Chapitres d'ouvrages, d'audience internationale : 2
Editions d'ouvrage, d'audience internationale : 1
Revues avec comité de lecture, d'audience internationale : 2, 3, 1
Conférences avec actes publiés et comité de lecture, d'audience internationale : 2, 4
Conférences invitées, d'audience internationale : 1, 1, 3
Thèses soutenues : 2, 1
Habilitations soutenues : 1
Autres valeurs du tableau original, non rattachables à une année ou catégorie : 9, 2, 14, 2, 1, 1, 8, 2, 1
Annexe 4 : Contrats institutionnels

(Contexte | Catégorie | Thème | Coordonnateur | Nb de partenaires | Collaborations spécifiques (1) | Début - Fin | Soutien financier propre)

CHIL | 6ème PCRD IP | Multi Modal Interaction | Fraunhofer (Karlsruhe) | 16 | Univ. Karlsruhe, IBM, IRST, UPC | 01/04 - 12/06 | 380 000
CAVIAR Project | 5ème PCRD | Context Aware Vision | Univ. Edinburgh | 3 | Univ. Edinburgh, IST Lisbon | 11/03 - 10/05 | 444 357
FAME Project | 5ème PCRD | Multi Modal Interaction | Univ. Karlsruhe | 9 | Univ. Karlsruhe, UPC, Sony | 11/01 - 02/05 | 526 000
DETECT Project | 5ème PCRD | Context Aware Vision | Joanneum Research | 6 | Joanneum Research, HSArt | 11/02 - 10/04 | 296 500
ECVision Thematic Network | 5ème PCRD | Context Aware Vision | CapTec | 13 | KTH, CapTec | 03/02 - 02/05 | 55 000
FGNet Thematic Network | 5ème PCRD | Multi Modal Interaction | Univ. Manchester | 6 | Univ. Manchester | 10/02 - 03/05 | 124 487
GLOSS | 5ème PCRD | Disappearing Computer | Univ. Strathclyde | 6 | Univ. Strathclyde | 01/01 - 12/03 | 52 000
Visor Base | 5ème PCRD | Real Time Visual Systems | Visual Tools | 5 | Visual Tools | 11/99 - 10/02 | 350 541
ContAct | RNTL/Proact | Context Aware Systems | XRCE | 2 | XRCE | 12/03 - 11/05 | 155 000
ACTIV (I & II) | Région | Indexation | Univ. St. Etienne | 8 | Univ. St. Etienne | 10/99 - 10/04 | 60 000

Contrats à financement industriel (Partenaire(s) | Thème | Début - Fin | Montant) :
FT R&D, Projet HARP | Context Aware Vision | 06/05 - 12/06 | 128 000

(1) Noms des partenaires avec lesquels sont menées plus particulièrement des recherches conjointes.
Annexe 5 : Liste des brevets prioritaires français ou européens (OEB : Office européen des brevets)

Brevet : Type de dépôt : US-NB | N° de dépôt : A3350 | Date de dépôt : 17/12/2004 | Titre : "Identifying Objects Tracked in Image Using Active Device" | N° et date de publication : pas encore publié | Déposant : XRCE

Logiciels (Date de dépôt | Déposant | Co-déposant | Finalité du logiciel) :
Oct. 2002 | INRIA | INPG | Robust Tracking
Nov. 2003 | INRIA | INPG | Monitoring Broadcast Video

Prepared by James L. Crowley with contributions from the PRIMA group. Time spent preparing this document: 18 man-hours.
Bibliography

[Ber03] F. Bérard, "The Magic Table: Computer-Vision Based Augmentation of a Whiteboard for Creative Meetings", Proc. ICCV Workshop on Projector-Camera Systems, 2003.
[Bor03] S. Borkowski, O. Riff and J. L. Crowley, "Projecting Rectified Images in an Augmented Environment", Proc. ICCV Workshop on Projector-Camera Systems, 2003.
[Bor04] S. Borkowski, J. Letessier and J. L. Crowley, "Spatial Control of Interactive Surfaces in an Augmented Environment", Proc. European Conference on Human Computer Interaction, July 2004.
[Cad94] C. Cadoz, Les Réalités Virtuelles, Dominos, Flammarion, 1994.
[Col98] V. Colin de Verdière and J. L. Crowley, "Visual Recognition Using Local Appearance", ECCV '98, Freiburg, June 1998.
[Cro03] J. L. Crowley and O. Riff, "Fast Computation of Characteristic Scale Using a Half Octave Pyramid", Scale Space 03, 4th International Conference on Scale-Space Theories in Computer Vision, Isle of Skye, Scotland, UK, 10-12 June 2003.
[Cro81] J. L. Crowley, "A Representation for Visual Information", Doctoral Dissertation, Carnegie-Mellon University, 1981.
[Cro87] J. L. Crowley and A. Sanderson, "Multiple Resolution Representation and Probabilistic Matching of 2-D Gray-Scale Shape", IEEE Transactions on PAMI, Vol 9, No. 1, January 1987.
[Fre91] W. T. Freeman and E. H. Adelson, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol 13, No. 9, pp 891-906, September 1991.
[Gar04] D. Garlan et al., "Rainbow: Architecture-Based Self-Adaptation with Reusable Infrastructure", IEEE Computer, pp 46-54, Oct. 2004.
[Gou04] N. Gourier, D. Hall and J. L. Crowley, "Facial Features Detection Robust to Pose, Illumination and Identity", International Conference on Systems, Man and Cybernetics, The Hague, Oct. 2004.
[Hal00] D. Hall, V. Colin de Verdière and J. L. Crowley, "Object Recognition using Coloured Receptive Fields", 6th European Conference on Computer Vision, Springer Verlag, Dublin, pp 164-178, June 2000.
[Hal04] D. Hall and J. L. Crowley, "Détection du visage par caractéristiques génériques calculées à partir des images de luminance", RFIA 04, Reconnaissance des formes et intelligence artificielle, Toulouse, Jan. 2004.
[Hor01] P. Horn, "Autonomic Computing: IBM's Perspective on the State of Information Technology", http://researchweb.watson.ibm.com/autonomic, October 15, 2001.
[Koe87] J. J. Koenderink and A. J. van Doorn, "Representation of Local Geometry in the Visual System", Biological Cybernetics, 55:367-375, 1987.
[Lin98] T. Lindeberg, "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, IJCV 30(2):77-116, 1998.
[Low99] D. G. Lowe, "Object Recognition from Local Scale-Invariant Features", 1999 International Conference on Computer Vision (ICCV-99), Corfu, Greece, pp 1150-1157, Sept. 1999.
[Lux03] A. Lux, "The Imalab Method for Vision Systems", International Conference on Vision Systems, ICVS-03, Graz, April 2003.
[McK 98] S. McKenna and S. Gong, "Real-Time Face Pose Estimation", International Journal on Real Time Imaging, Special Issue on Real-Time Visual Monitoring and Inspection, Vol 4, pp 333-347, 1998.
[Neg06] A. Negre, C. Braillon and J. L. Crowley, "Visual Navigation from Variation of Intrinsic Scale", submitted to ICRA 2006.
[Paa03] A. Paar and W. F. Tichy, "Semantic Software Engineering Approaches for Automatic Service Lookup and Integration", Proc. 5th Workshop on Active Middleware Services, pp 103-111, Seattle, June 2003.
[Pel03] F. Pelisson, "Indexation basée sur l'apparence", Thèse doctorale de l'I.N.P.G., 29 janvier 2003.
[Pel04] D. Hall, F. Pelisson, O. Riff and J. L. Crowley, "Brand Identification Using Gaussian Derivative Histograms", Machine Vision and Applications, Vol 16, No. 1, pp 41-46, 2004.
[Pic 97] R. W. Picard, Affective Computing, MIT Press, Cambridge, MA, 1997.
[Pin01] C. Pinhanez, "The Everywhere Displays Projector: A Device to Create Ubiquitous Graphical Interfaces", Proc. Ubiquitous Computing, Sep. 2001.
[Pin03] G. Pingali, C. Pinhanez, A. Levas, R. Kjeldsen et al., "Steerable Interfaces for Pervasive Computing Spaces", Proc. IEEE PerCom, Mar. 2003.
[Pol04] V. Poladian et al., "Dynamic Configuration of Resource-Aware Services", Proc. 24th International Conference on Software Engineering, IEEE, pp 604-613, May 2004.
[Ras94] J. Rasure and S. Kubica, "The Khoros Application Development Environment", in Experimental Environments for Computer Vision and Image Processing, H. I. Christensen and J. L. Crowley (eds.), World Scientific Press, pp 1-32, 1994.
[Ras98] R. Raskar, G. Welch, M. Cutts, A. Lake et al., "The Office of the Future: A Unified Approach to Image-Based Modeling and Spatially Immersive Displays", Proc. ACM SIGGRAPH, 1998.
[Roe 03] H. Roeber, J. Bacus and C. Tomasi, "Typing in Thin Air: The Canesta Projection Keyboard - A New Method of Interaction with Electronic Devices", Proc. CHI, pp 712-713, 2003.
[Sch00] B. Schiele and J. L. Crowley, "Recognition without Correspondence Using Multidimensional Receptive Field Histograms", International Journal of Computer Vision, 36(1), pp 31-50, Jan. 2000.
[Sch02] K. Mikolajczyk and C. Schmid, "An Affine Invariant Interest Point Detector", ECCV 02, European Conference on Computer Vision, Copenhagen, June 2002.
[Sch05] S. Lazebnik, C. Schmid and J. Ponce, "A Sparse Texture Representation Using Local Affine Regions", IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear in 2005.
[Sch97] C. Schmid and R. Mohr, "Local Greyvalue Invariants for Image Retrieval", IEEE Transactions on PAMI, Vol 19, No. 5, pp 530-534, 1997.
[Tak 03] N. Takao, J. Shi and S. Baker, "Tele-Graffiti: A Camera-Projector Based Remote Sketching System with Hand-Based User Interface and Automatic Session Summarization", International Journal of Computer Vision, Vol 53, pp 115-133, July 2003.
[Tra04] T. T. H. Tran, A. Lux, H. L. Nguyen and A. Boucher, "A Method for Ridge Extraction", ACCV 04, Asian Conference on Computer Vision, pp 960-966, Jeju, Korea, Feb. 2004.
[Vern02] F. Vernier, N. Lesh and C. Shen, "Visualization Techniques for Circular Tabletop Interfaces", Advanced Visual Interfaces, 2002.
[Wel91] P. Wellner, "The DigitalDesk Calculator: Tactile Manipulation on a Desktop Display", Proc. ACM Symposium on User Interface Software and Technology, pp 27-33, 1991.
[Zop97] B. Zoppis, "Outils pour l'intégration et le contrôle en vision et robotique mobile", Thèse doctorale de l'I.N.P.G., 22 juin 1997.