Quarterly Progress Scientific Report

Research Program in Digital Art Technologies - QPSR Vol. III, No. 3
Quarterly Progress Scientific Report, Vol. 3, No. 3, September 2010
T. Dutoit, B. Macq (Editors)

Published online by:
Université de Mons (UMons), Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS)
Université Catholique de Louvain (UCLouvain), Laboratoire de Télécommunications et Télédétection (TELE)

ISSN: (printed) X (online)

Credits:
Editors: Thierry Dutoit (UMons/TCTS), Benoît Macq (UCL/TELE)
Cover photo: CC from flickr
LaTeX editor: Christian Frisson (UMons/TCTS), using LaTeX's confproc class (by V. Verfaille)

All copyrights remain with the authors.
numediart homepage: Contact:

Preface

numediart is a long-term research program centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N ). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists. numediart is organized around three major R&D themes:

HyFORGE - Hypermedia Navigation: Information indexing and retrieval classically rely on constrained languages to describe contents automatically and to formulate queries. This approach is hardly applicable to multimedia contents such as music or video, because of the disparity between computable low-level descriptors and the desired high-level semantics - the so-called semantic gap. HyFORGE therefore investigates human-in-the-loop approaches and innovative tools for structuring and searching multimedia contents. Along with audio and image processing, HyFORGE builds on self-organizing models to derive enhanced views of multimedia collections and to provide users with efficient browsing interfaces.

COMEDIA - Body & Media: COMEDIA is named from a French contraction between body and media, or stage director and media, which nicely sums up the main objective of this axis: giving bodies the means to be their own artistic directors! Based on position on stage or choreography between multiple artists for the inter-relationship, and on gestures or voice for the intra-relationship, COMEDIA aims at creating interactivity between performing artists and the surrounding multimedia context. Event description, low-level feature analysis, pattern recognition, heterogeneous sensor fusion, robustness against lighting, and real-time operation are our keywords in 1D, 2D, and 3D signal processing to reach these goals.
COPI - Digital Instruments Design: COPI aims at developing a software/hardware toolbox for creating innovative digital musical instruments, from scratch or by augmenting existing instruments with new interactive channels. The main challenge for this R&D axis is to produce expressive instruments which maintain a close, embodied relationship with the musician. Our approach is to produce new sound design architectures using a large database of pre-recorded signals while maintaining real-time control of the design process. Our scientific work therefore involves three main axes: the development of expressive production models (audio signal processing), followed by the design of gestural control systems for their synthesis parameters, coupled with statistical modeling of this dynamic control.

numediart is the result of a collaboration between Polytech Mons (Information Technology R&D pole) and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the MULTITEL research center on multimedia and telecommunications. As such, it is the R&D component of MONS 2015, a broader effort towards making Mons the cultural capital of Europe in 2015.

On Tuesday, July 6th, the newly created numediart Institute for New Media Art Technology was inaugurated at the University of Mons (UMONS), as an extension of this research program funded by Région Wallonne from 1997 to 2012 (grant ). Its mission is to organize scientific training and research at the international level in the area of new media art technology, capitalizing on the dynamics of MONS2015 (Mons, European capital of culture in 2015). Its investigations cover audio, image, video, gesture, and biosignal processing, for applications in which human-machine interaction aims at creating emotion. They are performed in the framework of the numediart Program and coordinated by the numediart Consortium.

The numediart Consortium was inaugurated on July 7th. It extends the previous board and is now composed of 15 organizations from the worlds of research, arts, entertainment, and industry. The Consortium meets every three months and ensures an optimal fit between the research projects carried out by the Institute and regional needs, so as to foster scientific, economic, and cultural development.

This tenth session of numediart projects was held from July to September. One project, CoMediAnnotate, spent its first month at the eNTERFACE'10 Summer Workshop on Multimodal Interfaces, at the University of Amsterdam, Netherlands, from July 12 to August 6. The session ended with a public presentation of the results (with demonstrations) in the newly inaugurated numediart Room, on Monday, October 4th.


Projects Session #11 (Jul-Sep 2010)

37  Project #11.1: Developing Social Controllers
    John Anderson Mills III, Loïc Rebousière, Ricardo Chessini Bose

45  Project #11.2: CoMediAnnotate: towards more usable multimedia content annotation by adapting the user interface
    Christian Frisson, Sema Alaçam, Emirhan Coşkun, Dominik Ertl, Ceren Kayalar, Lionel Lawson, Florian Lingenfelser, Johannes Wagner


DEVELOPING SOCIAL CONTROLLERS

John Anderson Mills III (1), Loïc Rebousière (1), Ricardo Chessini Bose (2)
(1) Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS), Université de Mons (UMons), Belgique
(2) Service Electronique et Microélectronique (SEMI), Université de Mons (UMons), Belgique

ABSTRACT

Many forms of technology have been oriented at connecting people. We were interested in how cheap, handheld, electronic devices might be used to promote social interactions, thereby creating social controllers. This goal presents many technological, artistic, and interface challenges, though other projects point towards some of the solutions. Through brainstorming, we decided to use electronic music and sound creation as the medium for creating interaction. We also decided on a fixed set of sensors (two buttons, one slider, and a light detector) to control the sound synthesizing algorithms. We used Max/MSP, Processing with Minim, the Mini-Simius Card, and analog circuitry to create prototypes. A group of testers evaluated the prototypes for interest and enjoyment. After deciding to use a dsPIC microcontroller, we also decided that each device would contain the five highest-rated prototypes, since the nature of the device could be changed by simply changing the sound module. At all points, enjoyment was the main goal, and other considerations were secondary. We fabricated six social controllers, using a computer mouse as the form factor and replacing the internal components with our own PCB and components. The final social controller, called the Lara, is a responsive, enjoyable, multimodal sound device which promotes social interaction through LEDs and light sensors, with surprising and entertaining results.

KEYWORDS

microcontrollers, do-it-yourself, sound, toys, instrument design, sensors, electronic music

1. INTRODUCTION

Technology in its many forms has often been used to create connections between people, for example, telephones, radio, television, and the internet. Given our current state of technology, the question arises as to what small, cheap, technological devices can be created which promote social interaction. Our team proposed to create these social controllers. At the beginning of this project, the exact nature of the social interactions was unknown, but it was a goal to make them enjoyable. The determination of these interactions was part of the intended research, and this type of interaction design is among the interests of numediart. Technology has certainly become more and more of a social driving force, and these devices are aimed at helping it also become a social connecting force.

1.1. Motivation

This project presents many interesting challenges: Can decisions about the mapping of sensors to feedback components promote social interaction? Which prototyping environments can provide a means for testing these types of interactions? Can an enjoyable social controller be designed cheaply enough to be given away? Given the limited number of sensors in a low-cost device, can interesting and meaningful interactions be provided with limited control?

1.2. Background

This style of do-it-yourself electronic device is a currently emerging form of technology art, as can be seen in many of the following references. There have been several projects in this realm which, taken as a group, demonstrate the feasibility of this project: Perich [8] created cheap 1-bit music players. Weinberg [12] created group musical experiences using similar devices. Feldmeier [4] created hand-out motion sensors for use in group dance gatherings. Bleep Labs created a product called the Thingamagoop [6] (now in version 2) which is similar in concept, though more expensive. The CrackleBox [10], created at STEIM Amsterdam, is a device which creates complex and responsive sound from simple sensors.
Collins's Handmade Electronic Music [2] details many ways in which human interaction can be built into music and sound making devices.

2. SOCIAL CONTROLLERS

The development of the social controllers took place in several stages. The first stage was research and brainstorming about how cheap electronic devices could promote social interaction. Following the brainstorming sessions, we made several design decisions in line with the overall goals of the project. Given those decisions, we then moved forward by creating prototypes in several prototyping environments. A testing session then helped to determine which of the eighteen prototypes were worth pursuing. Several more design decisions and refinements were made before the final fabrication of the devices. The details of the stages are described in the following sections.

2.1. Brainstorming and Design Goals

After some initial research, several brainstorming meetings, which included people outside of our team and institution, were held concerning the device design. As is true of many brainstorming sessions, the topics were discussed in a somewhat random and intermixed fashion.

One topic of discussion was the type of device. Several different types were proposed, for example, puzzles, games, musical instruments, projected art, LED art, etc. Many implementations of each of these types were also suggested; for example, a hot potato game was suggested as a game device. Another topic of discussion was the direction of information exchange for the devices. This refers to the flow of communication between the devices, for example, completely self-contained, individual to individual, many to one, one to many, many to a central device mounted at a fixed location, etc. Some of the discussion also focused on the details of implementation. Many types of sensors were proposed based on research and expert knowledge, for example, light, sound, motion, alcohol, etc. When implementations of the types of device were discussed, many specific decisions about types of sensors and feedback components were considered. The feedback components proposed included speakers, LEDs, LCD displays, projections, motors, etc.

After the brainstorming sessions, our team collected the ideas put forth and made some design decisions about the devices. They would promote interaction through group sound and music creation. The devices would each have a slider, a light detector, and two buttons as sensors, and each controller would have a speaker, an LED, and an optional audio jack as feedback components. The communication between devices would be through LEDs and light detectors, which can provide several of the directions of information exchange listed in this section. Several design decisions were delayed until after the devices had been designed in a prototyping environment. These decisions included implementation as analog or digital circuitry and whether each device would have singular or multiple behaviors.
The mapping between the sensors and the behavior of the device would also need to be determined for each device.

2.2. Prototyping Environments

Instead of directly fabricating the final hardware, we decided to prototype the devices. Many different prototyping environments were considered, such as the iPhone, LEGO Mindstorms, Processing [9] using the Minim [7] sound library, Max/MSP [3], PD, the Mini-Simius Card [11], Arduino, and analog circuitry. We decided on three different categories of prototyping environment: software programming (Processing/Minim and Max/MSP), analog circuitry, and microcontroller (the Mini-Simius Card). We chose to work with several prototyping environments in parallel so as to use the team's varied areas of expertise. All of these prototyping environments have their benefits and drawbacks, but each helped us to validate different parts of the design goals with respect to the interactions with the generated sound.

2.2.1. Software Programming

The software packages we used as prototyping environments were Max/MSP and Processing with the Minim sound library. These two packages are, respectively, graphical and textual programming environments. Both are well known in the art community and contain all the tools needed for synthesizing sound and designing an interface with all the controllers that we had chosen.

Figure 1: A graphical interface for one of the prototypes built in Max/MSP.

There were many benefits of using these software packages as prototyping environments:

rapid prototyping - Once a graphical interface was built, one could easily modify and change the mapping of the sensors in order to improve the interaction between the sensors and the controlled parameters of the sound.

unrestricted abilities - Both Max/MSP and Processing/Minim already had a large number of audio tools, but because both are programming environments, they allowed incredible flexibility.
simple interface to sensors - Because the final devices were intended to be handheld, the physical sensors were part of the prototyping as well. Both of these software packages had relatively simple interfaces to most common communication protocols such as MIDI, OSC, UDP, or serial. We used Interface-Z [5] light detectors and a Behringer BCF2000 [1] MIDI controller to simulate the sensors.

All these benefits enabled us to quickly work on the most important parts of the device design: the sound synthesis and the mapping of the sensors for the benefit of the interactions. A graphical interface for the prototypes using Max/MSP is shown in figure 1. A graphical interface for the prototypes using Processing and Minim is shown in figure 2. The main drawback of this prototyping environment was that every prototype selected for final implementation then had to be reimplemented in order to function on the final social controller hardware. Our choice for that hardware would impose restrictions on how the prototype could be implemented; for example, choosing a microcontroller for final implementation would place a limit on computing power.

2.2.2. Analog Circuitry

As opposed to software programming, analog circuitry was a slow and difficult-to-debug prototyping environment. Its main benefit was the infinite controllability it gave to the user. Interactions with the sound were very precise, as the signal is never sampled, and the interaction with the sound was direct and responsive as well. One analog circuitry prototype is shown in figure 3. Though we did not yet know the final social controller hardware during the prototyping phase, any choice for that hardware other than analog circuitry would involve reimplementation. The most difficult part of this conversion would be to find an algorithm which approximates the behavior of the analog circuitry.
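As a toy illustration of such an approximation (hypothetical, and not one of the actual project prototypes), a light-controlled square-wave circuit could be mimicked digitally by mapping the light detector's ADC reading onto an oscillator period using only integer arithmetic; all names and ranges here are assumptions:

```c
#include <stdint.h>

/* Hypothetical digital approximation of a light-controlled square-wave
   circuit: a 10-bit light reading (0..1023) selects the oscillator period,
   and a counter toggles the output level, as a sampled stand-in for the
   continuous analog behavior. Ranges are illustrative. */

#define PERIOD_MIN 8u     /* brightest light -> shortest period */
#define PERIOD_MAX 512u   /* darkness -> longest period */

/* Map the ADC value linearly onto [PERIOD_MIN, PERIOD_MAX], inverted so
   that more light gives a higher pitch. */
static uint16_t light_to_period(uint16_t adc) {
    return (uint16_t)(PERIOD_MAX -
        ((uint32_t)adc * (PERIOD_MAX - PERIOD_MIN)) / 1023u);
}

typedef struct { uint16_t count; uint8_t level; } sqosc_t;

/* One sample of the square wave: toggle the level every half period. */
static uint8_t sqosc_next(sqosc_t *o, uint16_t period) {
    if (++o->count >= period / 2u) {
        o->count = 0;
        o->level = (uint8_t)(255u - o->level);  /* 0 <-> 255 */
    }
    return o->level;
}
```

A sampled sketch like this only approximates the continuous circuit, which is exactly the conversion difficulty described above.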

Figure 4: A prototype device built using the Mini-Simius Card development board.

Figure 2: A graphical interface for one of the prototypes built in Processing and Minim.

2.2.3. Microcontroller

The Mini-Simius Card was developed by the SEMI lab of the Université de Mons. The goal of this card was to provide universal hardware which would easily connect to many kinds of sensors using the I2C (Inter-Integrated Circuit) or SPI (Serial Peripheral Interface) protocol. The microprocessor used on this card is a PIC (Peripheral Interface Controller) 18F876. A prototype using the Mini-Simius Card is shown in figure 4. Even though this development board is meant to be simple and somewhat universal, prototyping with it was still microcontroller development and remained slower than prototyping with software programming. Conversely, because of our decisions about the final hardware for the social controllers described in section 2.4, of the three prototyping environments discussed here, this microcontroller prototyping required little to no modification when ported to the final hardware.

2.3. Instrument Design and Testing

Figure 3: An example of a prototype, the soxillator, built in analog circuitry under the direct tutelage of Collins [2].

At this point in the process, we delineated the division between the concept of an instrument and a device. An instrument was a sound generating algorithm and its associated sensor mappings, while a device was a piece of hardware which might be capable of giving physical form to one or more instruments. With these conceptual definitions, we were able to move forward with instrument design, even though our device design was not yet fixed. One of our main goals when designing the instruments was to make them enjoyable to play. This goal meant that an instrument must be easy to learn, responsive, and complex enough to hold interest.
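This instrument/device split can be sketched as a small data structure, purely as an illustration; the type and function names here are hypothetical and not taken from the actual Lara firmware:

```c
#include <stdint.h>

/* Illustrative sketch only: an "instrument" couples a synthesis algorithm
   with a sensor mapping, while a "device" is hardware hosting several
   instruments. Names are hypothetical. */

typedef struct {
    uint16_t slider;    /* 10-bit ADC reading, 0..1023 */
    uint16_t light;     /* 10-bit ADC reading, 0..1023 */
    uint8_t  button_a, button_b;
} sensors_t;

typedef struct {
    const char *name;
    /* map the current sensor state to the next 8-bit audio sample */
    uint8_t (*next_sample)(const sensors_t *s);
} instrument_t;

typedef struct {
    const instrument_t *bank;  /* e.g. the five instruments on a device */
    uint8_t count;
    uint8_t active;            /* currently selected instrument */
} device_t;

/* One audio tick of the device: delegate to the active instrument. */
static uint8_t device_tick(const device_t *d, const sensors_t *s) {
    return d->bank[d->active].next_sample(s);
}

/* A trivial demo instrument: the slider position becomes the sample. */
static uint8_t slider_voice(const sensors_t *s) {
    return (uint8_t)(s->slider >> 2);   /* 10 bits -> 8 bits */
}
```

Changing `active` changes the nature of the device without touching the hardware, which mirrors the interchangeable sound-module idea described in the abstract.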
Since the number and type of sensors was already fixed, and the prototypes were made without regard to the final physical form, instrument design dealt with the sound synthesis and the mapping of sensors to sound synthesis parameters. The design of the final physical form is discussed in section 2.5. We also wanted the listener to be able to distinguish the different instruments' sounds from one another; therefore, we worked on a wide range of sound synthesis algorithms. Eighteen instruments were designed, which could be grouped into the following four general categories:

tonal - The user can produce and control notes, as with a traditional instrument.

distortion - Given a sound of some sort, the user controls parameters which distort this sound in some way. FM (Frequency Modulation) synthesis is included in this category.

rhythmic - The user controls the tempo and elements of a rhythmic beat.

filtering - The user controls the manner in which a noisy soundscape is filtered using the sensors.

After the instruments had been designed, we needed to test and evaluate each one. A group of eight people tested the instruments and gave their remarks and ratings. These ratings were then collated, and the five highest-rated instrument designs were chosen for the final social controller device. These five instrument designs represented the distortion, rhythmic, and filtering categories.

2.4. Microcontroller Choice and Programming

Until this point in the design process, no decision about the final hardware implementation had been made except for the sensors. In the interest of flexibility, and given the expertise of our team, we chose to implement the instruments as sound modules for embedded software running on a microcontroller. We began by implementing on the Mini-Simius Card discussed in section 2.2.3 and further described in section 2.4.1. For the final implementation it was necessary to choose a second microprocessor, for several reasons which will be discussed in section 2.4.3.

2.4.1. First Microcontroller Choice

The initial requirements for the processing platform were a specific number of inputs and outputs, cost, power consumption, and a small footprint. The hardware platform could be split into input, processing, and output blocks to ease specification. The input block had to contain two digital inputs (2 buttons) and two analog inputs (slider and light detector), as discussed in section 2.1.
The processing block was planned around a cheap and simple 8-bit microprocessor, because of the need to keep the cost low. The output block contains one analog output (audio) and one digital output (LED). The audio synthesized inside the microcontroller and passed to the audio amplifier was generated using a PWM (Pulse-Width Modulated) output instead of a DAC (Digital-to-Analog Converter). This simplified solution combined a lower component count, reduced cost, a minor design requirement, and sufficient signal quality.

A first prototype, shown in figure 4, was built using the Mini-Simius Card and is based on the Microchip PIC16F887 processor. The specifications of this processor are a 20 MHz clock speed, 35 digital I/Os, a PWM output capable of several modes, a 10-bit resolution A/D converter, and 368 bytes of RAM. The CCS (Custom Computer Services) compiler for the PIC family, the MPLAB IDE (Integrated Development Environment), and the PICkit 2 debugger/programmer from Microchip formed the programming environment used for software development.

2.4.2. Programming Efficiency

In order to extract the maximum processing power from the microcontroller, some efficiency choices were built into the software during the development process. All computation was done using integer rather than floating-point arithmetic to save computation time. The oscillators were based on straight-line waves (triangle, saw, and square) rather than sinusoids; the processing cost of using sinusoids was considered too high in computation or in memory (for a lookup-table solution). The straight-line waves used a simple counter to find exact values at runtime, and the audio quality they provided was sufficient and often beneficial for our purposes. The microcontroller provides a PWM output as an independent hardware block developed for output; the PWM could be used at an 82 kHz sampling rate and 8-bit resolution.
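A counter-based straight-line oscillator of this kind can be sketched as a phase accumulator in C. This is an illustrative reconstruction, not the actual firmware: the 16-bit phase format and the names are assumptions, and the 3.2 kHz rate is the reduced rate mentioned later in the text.

```c
#include <stdint.h>

/* Integer-only triangle oscillator: a free-running 16-bit counter (phase)
   advances by a fixed step each sample, and the top phase bits are folded
   into a rising/falling ramp. No floating point, no lookup table. */

#define SAMPLE_RATE 3200u   /* reduced sample rate used on the device */

typedef struct {
    uint16_t phase;   /* wraps naturally at 65536 */
    uint16_t step;    /* increment per sample; sets the frequency */
} tri_osc_t;

/* step = freq * 65536 / SAMPLE_RATE, computed in 32 bits to avoid overflow */
static void tri_set_freq(tri_osc_t *o, uint16_t freq_hz) {
    o->step = (uint16_t)(((uint32_t)freq_hz << 16) / SAMPLE_RATE);
}

/* Next 8-bit sample: ramp up over the first half of the phase range,
   down over the second half. */
static uint8_t tri_next(tri_osc_t *o) {
    o->phase += o->step;
    uint8_t p = (uint8_t)(o->phase >> 8);
    return (p < 128u) ? (uint8_t)(p << 1)
                      : (uint8_t)((255u - p) << 1);
}
```

A saw or square variant only changes how `p` is folded, which is one reason straight-line waves are cheap on this class of processor.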
The audio signal was stored as an 8-bit value, so no special processing was required to export it. Audio was processed as single samples instead of blocks. This approach was chosen for the ease of implementing chained audio effects, and it did not incur a greater computational load or require block storage; a drawback was that the audio was unbuffered. The output audio sample rate was given by the frequency of the main loop, so the loop had to be timing-controlled. A microcontroller timer ran independently of the audio processing loop and was checked at the end of the loop. The audio processing loop always needed to finish processing before the timer expired, and then wait for the timer to complete before providing a new sample value. In order to increase the number of oscillators available, and to increase the resolution of frequency control using integer arithmetic, the sample rate was greatly reduced, to 3.2 kHz. This reduction creates some issues with sound quality, but high-fidelity sound was not a goal of this project.

2.4.3. Second Microcontroller Choice

Even with the many compromises made for efficiency, the first choice of microcontroller did not have the processing power necessary to implement several of the selected instruments, so a second (and final) microcontroller was chosen. The new processor had to respect the same primary requirements, especially cost. A member of the dsPIC family of microcontrollers, also from Microchip, was selected in order to maintain as much compatibility as possible with the source code already written, while increasing the capabilities of the microprocessor in terms of clock speed and bit resolution, and maintaining almost exactly the same cost. The migration was facilitated because both microcontrollers use the same architectural concepts and programming environment. The final version of the social controllers was assembled using a dsPIC33FJ12GP 16-bit Microchip processor.
Its specifications are a 120 MHz clock speed, 16 digital I/Os, a PWM output with several output modes, 10/12-bit resolution A/D converters, two 40-bit accumulators, 16x16-bit multiply operations, and 1024 bytes of RAM. The programming environment uses the Microchip compiler for the dsPIC family, the MPLAB IDE, and the PICkit 2 debugger/programmer from Microchip.
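The timing-controlled loop described in section 2.4.2 can be sketched as follows. On the real device a hardware timer paces the loop and the sample goes to the PWM; here a software stand-in counter is used so the structure is runnable on a desktop, and all names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the hardware timer: "expires" every CYCLES_PER_SAMPLE
   calls, fixing the output rate independently of the synthesis cost. */
#define CYCLES_PER_SAMPLE 100u
static uint32_t cycles = 0;

static bool timer_expired(void) { return ++cycles >= CYCLES_PER_SAMPLE; }
static void timer_restart(void) { cycles = 0; }

/* Render n samples: synthesize first, then busy-wait until the timer
   period ends, then emit. The synthesis must always finish before the
   timer does, exactly as described in the text. */
static size_t render(uint8_t *out, size_t n, uint8_t (*synth)(void)) {
    size_t i;
    for (i = 0; i < n; i++) {
        uint8_t s = synth();        /* must fit within one period */
        while (!timer_expired())    /* wait out the rest of the period */
            ;
        timer_restart();
        out[i] = s;                 /* on-chip: write to the PWM duty */
    }
    return i;
}

/* A trivial stand-in "instrument": an 8-bit ramp. */
static uint8_t ramp(void) {
    static uint8_t v = 0;
    return v = (uint8_t)(v + 8);
}
```

Waiting on the timer rather than running free is what makes the sample rate equal to the timer rate instead of varying with the per-sample computation.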

Figure 5: The final design for the small footprint PCB.

The two microcontrollers are compared in table 1. The table shows the effective speed-up of the algorithms due to clock speed when using the new microcontroller, which proved more than sufficient during the subsequent development period. The change from an 8-bit microcontroller to a 16-bit microcontroller also sped up some calculations.

Table 1: Speed increase due to clock speed from our choice of the PIC family to the dsPIC family.

functions                                    | PIC887  | dsPIC33
ADC data read                                | 4100 ns | 3300 ns
triangle wave oscillator                     | ns      | 1640 ns
PWM data write                               | 1300 ns | 440 ns
loop structure                               | 6000 ns | 440 ns
maximum oscillators at 22 kHz sampling rate  |         |

2.4.4. Final Circuit and PCB Features

The design was based on the previous prototypes. As stated, the final circuit needed to read data from two buttons (digital), one slide potentiometer (analog), and one light detector (analog) in order to control the behavior of the instruments. The circuit also controls one LED (digital) and one audio output (analog, through the PWM output). All inputs and outputs were designed with appropriate circuitry and auxiliary components for proper signal conditioning. A small footprint PCB (Printed Circuit Board) was designed using Altium Designer to fit into a small physical form. Note that all connectors are external to the board to accommodate the external sensors. See figure 5 for the PCB design. The circuit was designed to use 3 AAA batteries, or around 4.5 V. The electronic components were chosen to allow the circuit to keep functioning with as little as 3.0 V, in order to give the controllers independence from other power supplies. The audio amplifier power source is unregulated, allowing the amplifier to deliver whatever power is available from the batteries.

2.5. Fabrication

The final form of our social controllers was now the most important question we needed to answer. When one is dealing with physically controlling electronic sounds, musical interactions must be considered. We wanted our social controllers to be easily understandable and usable. Part of this goal was contained in the mapping and the ease of understanding the interactions between the sensors and the reactions of the sound algorithm. The other part was contained in the physical interface design.

Figure 6: Fabrication of the social controllers.

We decided to use a computer mouse as the physical form of our social controllers for several reasons. The first reason was that the form and use of the computer mouse is ubiquitous as a physical input device; no instruction would be needed for the users, and the ergonomic, familiar shape of the mouse reduces the time spent learning how to control the device as an instrument. A second reason was that old computer mice were readily available at no cost, which helped to keep the cost low. A set of 6 mice was fabricated in approximately five days. In order to use computer mice, the mouse bodies were completely customized. Most elements inside the original mouse were cut and trimmed so that the PCB and the battery pack could fit inside. The original mouse buttons were used as the button sensors on the PCB. Holes were drilled and cut to place the slider (left side), the light detector (bottom), the speaker (bottom), and a power switch (back). The original connector cable was cut, and part of the connector was used to house the feedback LED. Aside from the PCB, which required reflow soldering, all parts and components were readily available. The fabrication process is shown in figure 6.

3. RESULTS

When the design and fabrication of the controllers were finished, we christened the final design the Lara (both as a concatenation of the initials of the team, and in honor of the wife of a team member who suffered her husband's absence too often); it can be seen in figure 7.
The final choice for the hardware implementation was the dsPIC processor, as discussed in section 2.4.3. Each Lara has 2 buttons, a slider, and a light detector as sensors, and a speaker, an LED, and an optional audio jack as feedback components, as discussed in section 2.1. Each Lara contains the five best instruments from the testing discussed in section 2.3. The instrument can be selected by the user by holding down the left mouse button while turning on a Lara, or can be selected randomly by simply turning a Lara on.

Figure 7: The Lara, our social controller.

Each instrument provides a Lara with a new sound generating algorithm and a unique mapping of the sensors. For example, in one rhythmic instrument the slider controls the tempo, while in one of the distortion instruments the slider controls the frequency of a tone. Each Lara can behave in a self-contained fashion; a user can create sound with one Lara by moving the slider, pushing buttons, and directing the LED or other lights at the light detector. Depending on which instrument is active, pointing the LED directly at the light detector can create a chaotic feedback loop for extremely complex self-contained sound generation. However, the Laras are more interesting when several are linked together by directing the LED of one Lara at the light detector of another. The light of the LED is associated with the sound being created on the first Lara, and the light sensor of the second Lara receives the light emitted by the LED of the first. This creates communication in which the sound emitted by the second Lara is modulated and affected by the light received from the first. To form even more complex patterns, the LED of the second Lara can be directed at the light detector of the first, a third Lara can be introduced, or an external light source can be used to control both Laras simultaneously.

4. CONCLUSIONS

There were many lessons learned in the process of creating the Lara. The most important is the difficulty of creating an engaging, controllable instrument that is simple enough to learn quickly yet complex enough to hold interest. Balancing controllability, learnability, and complexity simultaneously was probably the most challenging aspect of this project. Microprocessor programming is often fraught with processor-specific complications and long debugging cycles.
We feel that simulating most of the instruments in a prototyping environment was an excellent choice, which allowed us to focus on the difficult instrument design before being concerned with the microprocessor development cycle. In both the analog circuitry prototypes and the development boards for the microprocessors, hardware bugs were extremely difficult to track down. These problems took the most time to resolve and once again pointed to the usefulness of the prototyping environments. One of the main benefits of doing the final development on the microprocessors was that the instrument design was exactly as it would appear in the final device, the Lara. Developing on the device did lead to some interesting bugs which produced sounds of a completely different category than had originally been developed in section 2.3. These unexpected behaviors were often complex and could be reproduced after analysis, which led to some enhancements in the designs of the final sound modules.

Our original goal had been to produce the social controllers cheaply enough to be able to give them away; originally this meant a price level of one or two euros. The final cost of the parts to produce 100 to 200 Laras would be about 6.40 euros apiece. Given contracted fabrication in large quantities, this cost could be reduced. Our team would, at some point, like to have enough Laras fabricated to deploy a large quantity of them at an art event. We imagine that large group interactions with a set of people open to playing and interacting with the Laras would be enjoyable to be a part of. In that vein, these controllers could be useful in a marketing campaign, especially where the interaction design could include some aspect of a marketed product. Given the opportunity to continue this work, the first improvement we would like to make would be wireless communication.
One of the ideas we had been interested in implementing with wireless communication was knowledge of neighbors. In a situation where several Laras were in the same space, they would know how many other Laras were nearby and which sound module was currently active on the others. This would allow the Laras to automatically form a diverse group of sound modules, or to use completely new sound modules only available in the presence of other Laras.

5. ACKNOWLEDGMENTS

numediart is a long-term research program centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N ). Our team would like to thank the researchers at numediart and the TCTS lab who participated in the brainstorming and testing sessions. We would like to thank Alexander Deweppe at the IPEM lab at Universiteit Gent for his invaluable guidance on user feedback requirements, and Arnaud Very at Université de Mons for his programming work during his internship at TCTS. We would like to thank Lara Cristiane Corá for lending us her husband for the duration of the project.

6. REFERENCES

6.1. Software and technologies

[1] Behringer. BCF2000. URL: http://www.behringer.com.
[2] Nicolas Collins. Handmade Electronic Music: The Art of Hardware Hacking. 2nd ed. New York, NY, USA: Routledge.
[3] Cycling '74. Max/MSP. URL: http://www.cycling74.com.
[4] Mark Christopher Feldmeier. Large Group Musical Interaction using Disposable Wireless Motion Sensors. MA thesis. Massachusetts Institute of Technology.

[5] Interface-Z. Directive LDR sensors. URL: interface-z.com.
[6] Bleep Labs. Thingamagoop. URL: http://bleeplabs.com/.
[7] John Anderson Mills III, Damien Di Fede, and Nicolas Brix. "Music Programming in Minim". In: Proceedings of the 2010 Conference on New Interfaces for Musical Expression (NIME 2010). Sydney, Australia.
[8] Tristan Perich. "1-Bit music". In: Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME 2007). New York, New York: ACM.
[9] Casey Reas and Benjamin Fry. "Processing: a learning environment for creating interactive Web graphics". In: SIGGRAPH '03: ACM SIGGRAPH 2003 Web Graphics. San Diego, California: ACM.
[10] STEIM. Cracklebox.
[11] SEMI of UMONS. Simius Card. URL: simius.be.
[12] Gil Weinberg. Interconnected musical networks: bringing expression and thoughtfulness to collaborative group playing. PhD thesis, supervised by Tod Machover.


COMEDIANNOTATE: TOWARDS MORE USABLE MULTIMEDIA CONTENT ANNOTATION BY ADAPTING THE USER INTERFACE

Christian Frisson (1,2), Sema Alaçam (3), Emirhan Coşkun (3), Dominik Ertl (4), Ceren Kayalar (5), Lionel Lawson (1), Florian Lingenfelser (6), Johannes Wagner (6)

(1) Communications and Remote Sensing (TELE) Lab, Université catholique de Louvain (UCLouvain), Belgium; (2) Circuit Theory and Signal Processing (TCTS) Lab, Université de Mons (UMons), Belgium; (3) Architectural Design Computing, Institute of Science and Technology, Istanbul Technical University (ITU), Turkey; (4) Institute of Computer Technology, Vienna University of Technology, Vienna, Austria; (5) Computer Graphics Lab (CGLab), Sabancı University, Istanbul, Turkey; (6) Lehrstuhl für Multimedia-Konzepte und Anwendungen (MM), Universität Augsburg, Germany

ABSTRACT

This project aims at improving the user experience of multimedia content annotation. We evaluated and compared current timeline-based annotation tools so as to elicit user requirements. We address two issues: 1) adapting the user interface, by supporting more input modalities through a rapid prototyping tool and by offering alternative visualization techniques for temporal signals; and 2) covering more steps of the annotation workflow besides the annotation task itself, notably the recording of multimodal signals. We developed input device components for the OpenInterface (OI) platform for rapid prototyping of multimodal interfaces: multitouch screens, jog wheels and pen-based solutions. We modified an annotation tool created with the Smart Sensor Integration (SSI) toolkit and componentized it in OI so as to bind its controls to different input devices. We produced mockup sketches towards a new design of an improved user interface for multimedia content annotation, and started developing a rough prototype using the Processing Development Environment.
Our solution allows us to produce several prototypes by varying the interaction pipeline: changing input modalities and using either the initial GUI of the annotation tool or the newly-designed one. We target usability testing to validate our solution and determine which combination of input modalities best suits given use cases.

KEYWORDS

Multimodal annotation, rapid prototyping, information visualization, gestural interaction

1. INTRODUCTION

This project attempts to provide a tentative toolbox aimed at improving the user interface of current tools for multimedia content annotation. More precisely, it combines efforts from fields such as rapid prototyping, information visualization and gestural interaction, by adding all the necessary and remaining components to a rapid prototyping tool that allows the application workflow to be programmed visually, in order to refine the user experience, first of one chosen annotation tool. This toolbox is a first milestone in our research, a necessary step towards usability tests on specific scenarios and use cases after this workshop.

This report is structured as follows. In Section 2 we define the context and scope of the project, i.e. multimedia content (Section 2.1) annotation (Section 2.2), and list possible use cases (Section 2.3) and testbeds (Section 2.4). In Section 3, we summarize the current problems of timeline-based multimedia content annotation tools, based on previous comparisons (Section 3.1) and on an evaluation we undertook during the workshop (Section 3.2); then we explain why we chose to adapt the SSI annotation tool (Section 3.3).
In Section 4, we describe the method we opted for: through a user-centered approach (Section 4.1), we restricted our design to two modalities (Section 4.2), visualization (Section 4.2.1) and gestural input (Section 4.2.2), among other possible modalities (Section 4.2.3); we thus used a rapid prototyping (Section 4.3) visual programming tool (Section 4.3.1) for the user interface (Section 4.3.2), the OpenInterface platform (Section ), and a rapid prototyping tool for visualization (Section 4.3.3), the Processing Development Environment (Section ). In Section 5, we summarize our results: we proposed a new tentative design for an improved user interface (Section 5.1), illustrated with mockups (Section 5.1.2) and an early prototype (Section 5.1.3); and we developed components for the OpenInterface platform (Section 5.2) for gestural input modalities (Section 5.2.1) and control of the SSI annotation tool (Section 5.2.2). In Section 6, we outline future work: a more robust prototype integrated into the MediaCycle framework for multimedia content navigation by similarity (Section 6.1) and subsequent usability tests to validate our designs (Section 6.2). Finally, we conclude in Section 7.

2. CONTEXT: ANNOTATION OF MULTIMEDIA CONTENT

2.1. What do we mean by multimedia

Multimedia data commonly refers to content (audio, images, video, text...) recorded by sensors and manipulated by all sorts of end-users. In contrast, the term multimodal data describes signals that act as means of communication between humans and machines. Multimodal data can be considered a subset of multimedia data, since the former are produced by human beings, while multimedia data encompasses a wider range of content (natural phenomena, objects, etc.).
Annotation tools help analyze multimedia data, but also make use of multimodal signals within their user interfaces.

2.2. What do we mean by annotation

The following questions illustrate the issues we faced while agreeing on a generic definition of the term annotation:

Who is doing it? Human(s) and/or machine(s)? Automatic annotation consists in extracting metadata using signal processing algorithms with no (or limited) parameter tweaking required from the user; manual annotation is performed by humans adding metadata to data using various user interaction techniques; semi-automatic annotation combines both approaches, sequenced in time. For instance: once data is loaded in the annotation tool, feature extraction algorithms run in the background on a subset, the user is then asked to correct these automated annotations, and a process then propagates the corrections to the whole dataset. In the case of humans, what about standard versus expert users? Is annotation performed collaboratively by multiple users?

What kind of data is annotated? Multimedia content and/or multimodal signals?

When is it performed? Online and/or offline?

For what purpose? Which use cases and scenarios?

Semiotically, from the user's perspective, two types of annotation can be discriminated: semantic annotations (words, concepts... that can be organized in domain ontologies) and graphic annotations (baselines, peaks... with a tight relation to the gestural input required to produce them). Additionally, Kipp proposes a spatio-temporal taxonomy in [29]: shape (or geometric representation), number (of occurrences), order (chronological or not), rigidity and interpolation (discrete, linear or spline). We opted for the following definition: annotation consists in adding metadata to data in order to extract information. This contradicts [42], which opposes annotation and metadata, the first term being considered time-dependent by the author while the second is not.

2.3. Possible use cases

We had in mind to propose a toolbox with which users can adapt the annotation tool to their needs, instead of having to use a different tool for each domain of use, for instance: corpora archival, multimedia library sorting, sensor recording analysis, etc.
There are numerous possible uses of multimedia content annotation; here follows a subset applied to multimedia arts: annotation of motion capture [23], for instance with online error notification while recording for offline reconstruction of missing data; analysis of dancers' performances [56] requiring diverse types of sensors; training mappings of gesture-based dancers' interfaces using performance recordings [16, 19]; preparation of material for live cinema performances [37]; multimedia archival and restoration [49].

2.4. Possible testbeds

We tried to adapt the project scope to better fit the topics of some MSc/PhD participants, by considering two more testbeds besides timeline-based annotation tools.

2.4.1. Timeline-based Annotation Tools

These tools focus on the analysis of temporal signals or time series and pose a great challenge regarding the handling of time for navigation and annotation purposes. It has to be noted that most participants already had some experience with multimedia editing tools, which require similar navigation methods and offer a subset of the variety of possible annotations.

2.4.2. Multimedia Collection Managers

These tools, such as iTunes for music libraries, share design questions with Emirhan's MSc (2D visualization and representation of massive datasets, in his case in the context of social networks) and Christian's PhD (similar, applied to multimedia content). We discarded this testbed because we have not found any existing open-source tool offering flexible annotation beyond basic metadata management (ID3 tags for music, movie "credits" information, etc.) for audio and video media types (we did, however, find some for image or text).

2.4.3. Panoramic Image-based 3D Viewers and VR World Viewers

These tools, such as Google Earth, HDView Gigapixel Panoramas [32] and Photosynth [55], were particularly interesting regarding Ceren's PhD work [27].
We discarded this testbed because developing a simple 3D viewer with annotation support, or even integrating navigation and annotation through the Google Earth API, would have taken too much time, leaving little time to deal with real research issues (for example: occlusion-free 3D tag positioning considering a variable user viewpoint).

3. TIMELINE-BASED MULTIMEDIA CONTENT ANNOTATION TOOLS: FROM PROBLEMS TOWARDS USER REQUIREMENTS

3.1. Summary of current problems

Plenty of pre-existing works have compared annotation tools and elicited emerging requirements throughout the last decade [6, 8, 12, 48, 50]. Based on these observations and readings, we summarize the following issues regarding how annotation tools could be improved (checked boxes emphasize the ones we planned to address during the workshop):

multimedia: better file format support [6, 50], time-based media other than audio and video [6];

scale: number and/or length of media elements in the database;

reusability: toolboxes/frameworks rather than isolated tools specific to a given context of use [12, 48], portability across operating systems [8, 50];

accessibility: client/server applications rather than desktop applications working with local media databases;

interactivity: a multimodal user interface could enhance the pleasurability and efficiency of these generally WIMP-based tools [6, 12, 48], so as to provide a single user interface that allows:

1. to monitor signal feeds while recording datasets,
2. optionally to add annotations while recording,

3. to edit or correct annotations;
4. a more natural, usable, pleasurable user interface (pen and touch);

workflow: supporting the full annotation workflow [12, 18]:

1. one administrator prepares (design of a template and choice of coders);
2. several coders record;
3. several coders annotate;
4. the administrator analyses the results (coder agreement...).

Figure 1: Screenshots of our selection among the available annotation tools: Advene [43, 1], AmiGram [4], Anvil [29, 28], ELAN [57], Lignes de Temps [41], On The Mark [64], SSI [61, 60] and VCode/VData [18, 58]. Image copyrights remain with their authors.

3.2. Evaluation and testing during eNTERFACE'10

We tested 8 open-source or free tools, with screenshots in Fig. 1, with at least one participant assigned to each (in practice, two participants tested each), alphabetically: Advene [43, 1], AmiGram, Anvil [29, 28], ELAN [57], Lignes de Temps [41], On The Mark [64], Smart Sensor Integration (SSI) [61, 60] and VCode/VData [18, 58], so as to better understand the concerns with a hands-on approach. We produced detailed comparisons in three tables, available online on the eNTERFACE'10 wiki 1, focusing on:

1. development criteria (quantitative): OS, licence, development languages, supported formats...;
2. context and usage (quantitative): media types, scope, field of use...;
3. eNTERFACE participants' feedback (qualitative): subjective comments on usability and pleasurability raised by the participants while testing these tools.

1 php/comediannotate:framework:annotation:tools

A first round of selection based on development considerations (operating system, development language and licenses) narrowed the choice down to 3 candidates out of the 8 tested: AmiGram, ELAN and SSI.
implementation: in C++, Java, C# or Python, supported by the rapid prototyping platform for multimodal interfaces we chose (as explained in Section );

license: necessarily open-source so that we could modify the source code;

compatibility: running on as many operating systems as possible, the common-denominator operating system among participants being Windows.

3.3. Chosen tool for adaptation: Smart Sensor Integration (SSI)

Description

The SSI toolkit [61, 62], developed within the CALLAS EU project by two of the participants, Johannes and Florian, is a framework for multimodal signal processing in real-time. It allows the recording and processing of human-generated signals in pipelines based on filter and feature extraction blocks. By connecting a pipeline to a classifier it becomes possible to set up an online recognition system. Training a recognition model requires the collection of a sufficient number of samples. This is usually accomplished in two steps: 1) setting up an experiment to induce the desired user behavior, 2) reviewing the recorded signals and adding annotations to describe the observed behavior. For this purpose SSI offers an annotation tool for multimedia signals. Signals recorded with SSI can be reviewed and annotated within this tool (see Fig. 2). Depending on the length of the recordings (usually several hours), annotation can turn out to be an extremely time-consuming task. Currently the tool is controlled via simple mouse and keyboard commands. This is not always the fastest way and, after a while of continuous use, can become inconvenient for the user.
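As a rough illustration of SSI's pipeline idea (filter and feature-extraction blocks chained before a classifier), the following sketch is a minimal stand-in of our own, not the actual SSI API; all function names are hypothetical:

```python
import numpy as np

def moving_average(x, k=5):
    """Filter block: smooth the raw signal (stand-in for an SSI filter)."""
    return np.convolve(x, np.ones(k) / k, mode="same")

def frame_energy(x, frame=100):
    """Feature block: short-time energy, one value per frame."""
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    return (frames ** 2).mean(axis=1)

def pipeline(x):
    """Chain filter -> feature, as SSI chains its blocks; a classifier
    could then consume the feature vector for online recognition."""
    return frame_energy(moving_average(np.asarray(x, dtype=float)))
```

A classifier trained on annotated segments would consume the output of such a pipeline, which is why the quality and speed of the annotation step matter.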

Figure 2: In SSI, recorded sessions are visualized together with annotation tracks that describe the observed user behavior. The screenshot shows four signals (top-down: eye gaze, head tracking, audio and blood volume pulse) and two annotation tracks (here: the transcription of the dialog between the virtual character and the user). On the left side, videos of the application and the user are displayed. Screenshot from [62].

Hence, the tool would greatly benefit from alternative ways of interaction, such as Nintendo's WiiRemote control or a gamepad.

Reasons for the choice

We are in close contact with its developers, who participated in the project during the first week. The core is separated from the UI. The simple annotation GUI is lightweight, hence simple to understand, and easy to replace. The toolkit not only proposes a simple annotation tool, but also feature extraction algorithms for automatic annotation, and could bridge the gap between multimedia content and multimodal signal annotation. This is of interest for some participants, like Dominik, for future work on adaptive multimodal interfaces by training [51]. The development languages are compatible with the chosen rapid prototyping platform (see Section ).

4. METHOD

4.1. User-centered approach

We opted for a user-centered approach [12] to conduct our research: in addition to gathering scientific documentation, we undertook a small contextual inquiry with eNTERFACE participants who had already had to use an annotation tool; before diving into software development, we cycled through and brainstormed on different design propositions using paper mockups; and we produced a fast software and hardware prototype with off-the-shelf devices using rapid prototyping tools, as a first proof-of-concept, before rethinking the prototype with a more dedicated but slower-to-implement solution.

4.2. Two modalities of interest

Currently, we target standard expert users (i.e. not disabled users such as blind people), yet such cases could be addressed since we are making use of a rapid prototyping tool for multimodal user interfaces. Ever since before computerized systems, two modalities have been deeply rooted in the task of annotation: visualization and gestural input.

4.2.1. Visualization

The earliest visualization techniques for annotation were often offered by the recording device itself: sensor plots, video films, audio tapes, and so on. The closest task to multimedia content annotation is multimedia editing, notably with audio and video sequencers that can record signals, segment them, apply effects on them and realign them along the timeline. Many techniques dedicated to time series have been proposed so far [3, 33]. Less standard information visualization techniques considering user perception [63] might improve the task of multimodal annotation, during the monitoring of recording processes and post-recording analysis. For a more in-depth analysis, different types of plots can help reduce the complexity of multidimensional data spaces and allow visual data mining. Animations between visualization techniques switched during the task may arouse cognitive effects and improve the user's comprehension of the underlying information present in the displayed data [22, 5]. We follow this overview with specificities of the media types we chose to investigate: audio and video.

Audio: waveforms...

A survey of waveform visualization techniques is proposed in [17], using visual variables to display more information than envelope or amplitude: segments, frequency and timbral content, etc. Some advice is offered on how to visualize waveforms under small-scale constraints, particularly by neglecting the negative part of the waveform, or subtracting it from the positive part so as to overlap both, similar to a half-rectified signal.
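The folding trick just described can be sketched as follows; this is our own minimal interpretation (names hypothetical), not code from the cited survey [17]:

```python
import numpy as np

def folded_envelope(signal, n_bins):
    """Reduce a waveform to one value per display bin by taking the
    per-bin positive and negative peaks, then folding the negative
    part onto the positive one (half-rectified-style overlap),
    halving the height needed to draw the track."""
    bins = np.array_split(np.asarray(signal, dtype=float), n_bins)
    pos = np.array([max(b.max(), 0.0) for b in bins])    # upper envelope
    neg = np.array([max(-b.min(), 0.0) for b in bins])   # mirrored lower envelope
    return np.maximum(pos, neg)                          # overlapped, single-sided
```

Drawing only this single-sided envelope keeps waveforms readable when many tracks compete for vertical screen space.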
A regressive variation on these mirrored graphs, called n-band horizon graphs [21], effectively reduces the height of time series while keeping information readable at high zoom factors, and seems particularly useful for multitrack timeline representations.

Video: keyframes...

Video content is often represented by its frames or keyframes in various ways: all frames aligned in time horizontally; a subset of these sequenced in time and overlapped in location (such as animated GIF files serving as thumbnails on video hosting portals like Archive.org); or a standard video player where all frames are displayed at the same location, overlapped in time. Other spatio-temporal content-specific techniques have been proposed for video signals, for instance MotionGrams [26] or slit/video scanning [40], particularly suited to videos featuring movement of recurring elements in the scene (again, for instance, dancers' videos, among other examples in interactive arts [40]). Lee et al. proposed several keyframe browsing prototypes [36] characterized along a 3D design space: layeredness (single/multiple layers with/without links), temporal orientation (relative/absolute/none) and spatial vs. temporal visualization.

4.2.2. Gestural input

Figure 3: A selection of devices sorted on a 2D space, indicating: horizontally, whether each device seems suitable for navigation and/or annotation tasks; vertically, whether the tied gestural-input vs. visual-output relation is direct or indirect (indirect: 3D mice, jog wheels, mice, keyboards; direct: multitouch surfaces, pen, Tablet PC).

Keyboard and mouse interaction is still standard for most desktop applications [39]; key bindings appear to be the fastest way of triggering momentary or ranged annotations when navigating signals at a constant playback speed [18]. Pens have been used to annotate graphics and plots long before their recent computerized versions, now free-form [30] with styluses [2]. Jog wheels for navigating audio and video signals have been widely used by audio and video editing experts since before multimodal annotation. Multitouch interfaces allow the combination of both navigation and annotation modes using one single gestural input modality. The direct or indirect gestural vs. visual relation of the user interface can affect the spatial accuracy and speed of annotation tasks [52]. We have illustrated these concepts in Fig. 3 by representing gestural input modalities together with common associated low-cost controllers.

4.2.3. Other possible modalities

As raised in Section 2.1, similar sensors can be used to record both the multimedia signals being annotated with the annotation tool and the multimodal signals used in the tool's user interface, such as the modalities used for multimodal emotion recognition [61]. For instance, eye gaze could be used to improve the location of annotations and predict regions of interest for the user so as to better lay out notifications, while voice input with speech recognition could help produce instant user-defined tags or accurate dubbing of meeting recordings.

4.3. Rapid Prototyping

4.3.1. Scripted/textual versus visual programming

Signal processing and engineering specialists often use scripted/textual programming for their prototypes (for instance using Matlab) and optionally switch to visual programming dataflow environments when real-time prototyping is of concern (LabVIEW, Simulink, etc.). We believe that blending both approaches is convenient for designing and prototyping the multimodal user interface of our adapted tool: visual programming gives by itself a visual representation of the underlying interaction pipeline, quite practical for exchanging design cues, while textual programming is quicker for designing simple and fast procedural loops, among other advantages.

4.3.2. Visual Programming Environments for Multimodal Interfaces

Existing visual programming tools

The number of multimodal prototyping tools and frameworks, dedicated to gestural input or generic towards most multimodal interfaces, has been increasing over the last two decades, yet none of them has been accepted so far as an industry standard.
Among the vast number available, we would like to cite some that are still accessible, alphabetically: HephaisTK [10], ICon [9] and the post-WIMP graphical toolkit MaggLite [24] built on top of it, OpenInterface [38] (with its OIDE [54] and Skemmi [35] visual programming editors), and Squidy Lib [31]. Dataflow environments such as EyesWeb [25], PureData [45] and Max/MSP [7] benefit from their anteriority compared with these multimodal prototyping tools, as they often provide more usable visual programming development environments. Some of the authors of this report have successfully used PureData as a platform for rapid prototyping of gestural interfaces [14]. A notable feature of these environments that could be repurposed in the ones targeting multimodal user interfaces is the multi-fidelity patch/pipeline representation of Cycling '74 Max/MSP:

1. in "edit" or patch mode, the dataflow representation of the pipeline: widgets of processing blocks are editable and the interconnections between them are apparent;
2. in running or normal mode, widgets from the pipeline are interactive, but interconnections are hidden;
3. in presentation mode, widgets are positioned as would be expected from a control GUI and connections are hidden as well.

OpenInterface/Skemmi addresses this issue with designer/developer modes and a non-linear zoom slider, while Squidy Lib offers a zoomable user interface.

Chosen platform: OpenInterface (OI)

The OpenInterface platform [34], developed by one of the participants, Lionel Lawson, facilitates the rapid prototyping of multimodal interfaces in a visual programming environment. It also technically eases the communication between components written in different development languages (currently: C++, Java, Python,

.NET) on Windows and Linux OSes. It already features several input device components (WiiMote, webcams for computer vision, 3D mice) and some gesture recognition components, but misses a few important ones for the scope of our project (multitouch screens/tablets, pen tablets). We decided to keep using this platform and to implement the missing components.

Environments for GUI and visualization

Existing tools

Regarding visualization, mostly libraries are available rather than rapid prototyping tools, particularly Prefuse [20] for information visualization, or VTK and Visualization Library for 3D computer-aided design or medical visualization. The Processing Development Environment (PDE) [15] simplifies development in Java and goes further than visualization by providing other libraries, for gestural input for instance. Emerging libraries such as MT4j [13] in Java and PyMT [46] in Python offer high-level multimedia widgets with multitouch support, yet customizing widgets still requires some effort. The more recent VisTrails / VisMashup [53] allows visual programming of workflows for data exploration and visualization.

Chosen platform: the Processing Development Environment (PDE)

We chose the Processing Development Environment (PDE) [15] since it was already mastered by the participants of the team working on new design proposals for the graphical user interface of the annotation tool. Additionally, this solution offers more scalability for the prototyping: since PDE is written in Java, it is compatible with our chosen rapid prototyping platform, OpenInterface (see Section ), using proclipsing [44], a bridge to the Eclipse IDE on top of which the OpenInterface Skemmi editor is built; but it can also be re-integrated into a standalone MT4j application if only a multitouch interface is chosen for gestural input, hence removing the dependency on OpenInterface.

5. RESULTS

5.1. A tentative design towards an improved User Interface

Design considerations

While testing annotation tools (see Section 3), we noticed that the user experience with most of the tools was hindered by the lack of seamless navigation techniques in lengthy signals; for instance, changing the playback speed was awkward, both in terms of user input and visualization, and the related audio feedback was improperly rendered. The first task inherent to annotation is navigation in the multimedia content.

Mockups

We believe that a single user interface could be used for both the recording of multimodal signals and the navigation into the recorded multimedia content. Figure 4 illustrates a design proposal that would allow this combination: a standard multi-track view of audio, video, and sensor signal tracks stacked vertically is augmented with a sliding vertical zone, extending the proposals of [17] and [59], in which are visualized the current frame being played in video tracks (thus behaving like a video player) and a fisheye view of the audio waveform and sensor signals for audio and sensor tracks; the width of the zone corresponds to the same time frame for all tracks. When recording, the zone could be located on the right, the remaining space being left for visualizing past events. When navigating at a given playback speed, the zone could be located in the middle, leaving evenly proportionate space for future and past events, and restricting head movements from the user, who gazes towards the center of the screen (as opposed to visually following the play head from left to right cyclically, as in standard multitrack editors), the peripheral view optionally stimulated with highlighted past/future events. For a quick overview of the whole recording, the user could slide the zone from left to right, or to a desired position, as a magnifying tool.

Prototype

A fast prototype of the proposed design was developed using the Processing Development Environment (PDE) [15], as illustrated in Figure 5.

Figure 5: Screenshot of the improved user interface design proposal, prototyped in the Processing Development Environment.

5.2. Components for rapid prototyping with OpenInterface (OI)

Gestural input

Some device support components were previously available in the OpenInterface platform: the Wii Remote and 3D mice. For the integration of multitouch devices, two options were available: capturing WM_TOUCH high-level events from Windows 7 using frameworks such as MT4j [13], which however requires creating applications with the chosen framework; or accessing low-level events for devices using the Human Interface Device (HID) protocol (cross-platform in theory), reusing code from the GenericHID application for Windows and Linux. We chose the second option since the same code base also allowed us to integrate jog wheels (which also use the HID protocol).
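Low-level HID reports arrive as short byte arrays whose layout is device-specific. As a hedged sketch (the byte offset and report layout here are hypothetical, not those of any particular jog wheel or of the GenericHID code), decoding a signed relative jog step could look like:

```python
def parse_jog_delta(report, offset=1):
    """Decode a signed 8-bit relative jog-wheel step from a raw HID
    report (a sequence of ints in 0..255). Positive values: clockwise;
    negative: counter-clockwise. The offset is an assumption for
    illustration, not a standard report layout."""
    raw = report[offset]
    # Two's-complement interpretation of a single byte
    return raw - 256 if raw > 127 else raw
```

Reading the reports themselves would go through an HID library or the GenericHID code mentioned above; only the per-report decoding is shown here.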

Figure 4: Annotated paper mockup of our proposed user interface design, showing the movable play zone between past and future events, an audio track (waveform), a video track (keyframes) and sensor tracks, with frames sliding right-to-left.

Annotation tool core/engine bindings

In this workshop we decided to adapt the already existing SSI media annotation tool into a media annotation toolkit with a multimodal user interface. This tool consists of two parts: first, the SSI Media UI, a WIMP-GUI-based tool to add annotations to audio and/or video data, which can be operated with mouse clicks and a few keyboard commands; second, the SSI core component, responsible for the lower-level signal processing and used by the SSI Media UI. In the course of the workshop we adapted only the SSI Media UI. In principle, the given SSI Media toolkit was a prototypical implementation of a media annotation tool; it did not come with a special API that could be used from external programs. Therefore, we bundled concrete functionality of the SSI Media UI into a new interface component. The process of media annotation can be split into three subsequent steps:

1. create and select annotation tracks (for several annotation channels);
2. segment, and reorder segments in, one annotation track (includes selection of segments);
3. edit (annotate) segment metadata.

These steps can be performed with the extracted functionality, which includes, among others, starting/stopping playback of annotation segments, editing of annotations, selection of the next/previous segment, etc. We then needed a way to plug other input modalities into the tool through this extracted functionality. We created an OpenInterface component for the adapted SSI Media UI. This OI component allows other OI input components to use the provided SSI Media functionality.
Figure 6 showcases several improvements we applied to its GUI.

Testing multimodal pipelines with OpenInterface (OI)

First, we connected the following interaction devices to the SSI OI component in a new OI project: a Wii-mote, a 3Dconnexion SpaceNavigator mouse, and a Contour Design Xpress jogwheel. Moreover, we created a new speech input OI component for the Julius toolkit [julius.sourceforge.jp/en]. Additionally, we integrated mouse behavior not only through clicking but also through mouse gestures. Each modality (a modality being an interaction device with a dedicated interaction language) was then coupled with specific functionality of the SSI Media toolkit. Not all modalities fit all of the functionality well, but this relates to higher-level interaction design. For example, it might be a good idea to select annotation tracks via speech input (command: "next track"). Within such a track, one uses mouse gestures to select the next and previous segments. When a distinct segment is selected, one uses speech input again to start and stop playing the media (e.g., command: "play segment"). The practicability of the proposed modalities also varies: a Wii-mote, which fits arm-based gestures well, is not the first choice for the smaller gestures of media annotation; a jogwheel, on the other hand, was invented to improve video editing and thus fits our work better. Future work will include research on an improved interaction design, utilizing the right modalities for media annotation with the SSI_UI.
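The coupling of each modality to specific functionality can be sketched as a dispatch table, in the spirit of the OI pipeline connections; the command vocabulary and the `_Tool` stand-in below are illustrative, not the actual SSI API.

```python
# Sketch: coupling (modality, command) pairs to annotation functionality.
# _Tool is a hypothetical stand-in that records which call was invoked.

class _Tool:
    def __init__(self):
        self.log = []
    def next_track(self):       self.log.append("next_track")
    def next_segment(self):     self.log.append("next_segment")
    def previous_segment(self): self.log.append("previous_segment")
    def toggle_play(self):      self.log.append("toggle_play")

def make_bindings(tool):
    """One binding per (modality, command) pair, mirroring the example
    interaction design: speech for tracks/playback, gestures for segments."""
    return {
        ("speech", "next track"):    tool.next_track,
        ("speech", "play segment"):  tool.toggle_play,
        ("mouse gesture", "right"):  tool.next_segment,
        ("mouse gesture", "left"):   tool.previous_segment,
        ("jogwheel", "cw"):          tool.next_segment,
        ("jogwheel", "ccw"):         tool.previous_segment,
    }

def dispatch(bindings, modality, command):
    action = bindings.get((modality, command))
    if action is not None:   # unknown pairs are ignored
        action()

tool = _Tool()
bindings = make_bindings(tool)
dispatch(bindings, "speech", "next track")
dispatch(bindings, "mouse gesture", "right")
dispatch(bindings, "speech", "play segment")
# tool.log is now ["next_track", "next_segment", "toggle_play"]
```

Because the table is data rather than code, swapping a modality for a given function (e.g. jogwheel instead of mouse gestures for segment navigation) is a one-line change, which is convenient for the interaction-design experiments mentioned above.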

Figure 6: Screenshot of the improved GUI of the SSI annotation tool, integrated into the OpenInterface platform as a component.

6. FUTURE WORK

6.1. Integration into MediaCycle, a framework for multimedia content navigation by similarity

MediaCycle is a framework for multimedia content browsing by similarity, developed within the numediart Research Program in Digital Art Technologies, providing componentized algorithms for feature extraction, classification and visualization. The supported media types are audio (from loops to laughter) [11], video (particularly featuring dancers) [56], images, and more. This framework already solves some of the issues raised in Section by providing flexible audiovisual engines for the navigation of multimedia content (audio feedback with variable playback speed, and visual feedback with cost-effective zoom and animated transitions). Moreover, the interoperation of an annotation timeline (displaying a few elements of the recorded database) with a browser view (displaying the whole database at different levels of segmentation), such as the one already provided by MediaCycle, could help compare annotations between recordings and segments. Finally, the use of this framework could help reduce the number of video keyframes through content-based grouping of frames, possibly scaling with the user-defined zoom factor.

6.2. Usability testing

We received feedback from several eNTERFACE participants who had already had to use an annotation tool, regarding their satisfaction with the tool they used. After the setup of a detailed protocol, usability tests based on simple tasks will be performed with the prototype, trying to determine whether the user interface improves annotation efficiency and pleasurability.

7. CONCLUSIONS

We reached a first step towards more usable annotation tools for multimedia content: we raised the problems with current tools and proposed a new design to overcome these issues.
The prototype needs to be polished and tested with users to validate the design. Aiming to produce deliverables available to most people (low-cost, open-source, and so on), the eNTERFACE way, we developed: a free and open-source toolbox, mostly based on cross-platform tools and libraries; compatibility with low-cost input devices; and a starting point for usability testing that demonstrates the validity of the proposed solution.

8. ACKNOWLEDGEMENTS

Christian Frisson works for the numediart long-term research program centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N ). Ceren Kayalar's PhD research project is partly funded by TUBITAK Career Research Grant 105E087 of her advisor, Dr. Selim Balcısoy. Florian Lingenfelser and Johannes Wagner are funded by the EU in the CALLAS Integrated Project (IST-34800).
