Automated Recording of Lectures using the Microsoft Kinect

Daniel Sailer 1, Karin Weiß 2, Manuel Braun 3
Wilhelm Büchner Hochschule, Ostendstraße 3, 64319 Pfungstadt, Germany
1 info@daniel-sailer.de
2 weisswieschwarz@gmx.net
3 mb.braun2@gmx.de

Abstract: An automated camera system was developed to provide affordable recordings for large quantities of lectures. To achieve this, the requirements were analyzed, current technologies were reviewed with respect to their applicability, and a working hardware and software prototype was developed. The resulting system uses the Microsoft Kinect to detect the movements of a lecturer and follows him with a rotatable camera. It can be operated without additional personnel and gives the lecturer more freedom of movement than a stationary camera. Using this system enables universities to capture and share many of their lectures in an affordable manner. Thus students attending traditional courses as well as e-learning courses can access the lectures at all times.

1 The problem to solve

Capturing a good record of a lecture is useful for a variety of purposes. Video recordings can be used for the improvement of existing degree programs as well as for the development of new styles of teaching. Students can use the recordings to perform follow-up course work, catch up on missed lectures or watch additional lectures to deepen their understanding. Lecturers can use the videos to get an impression of how they are perceived by their students or to improve their presentations. The efficient use of recorded material is highly dependent on the amount and quality of the recorded lectures that are available. It is therefore crucial for a university to be able to produce recordings of many of its lectures.
There are two common ways to record a lecture. The first is to set up a stationary camera in the room, pointed at the area where the lecturer gives his talk. This approach is inexpensive, as the lecturer himself can start and end the recording and no additional personnel is needed. Its disadvantage is a very restricted area of movement for the lecturer if the camera is adjusted to capture only the lecturer: he cannot move around freely but has to be careful at all times not to step outside the recorded area. If the camera is adjusted to capture a larger area of the room, the lecturer will be depicted too small.

The other common approach is to have one or more assistants capture the lecture on camera. This gives the lecturer greater freedom of movement, since an assistant can follow him with the camera. However, the necessary expenses are quite high because at least one assistant has to be present during the whole lecture. In addition, lectures that take place at the same time can only be recorded with additional personnel.

2 The idea

To allow for an affordable way to capture lectures without forcing the lecturer to stay in a narrow area, we developed the following scenario: a stationary mounted camera is operated by a motion detection system to automatically follow the movements of the lecturer (see figure 1).

Figure 1: Setup for an automated recording system
Using such an automated system, a lecture can be recorded in an affordable way without restricting the lecturer's freedom of movement too much. That way the advantages of both scenarios can be combined (see figure 2).

Figure 2: Combination of the advantages from both approaches

3 Available methods for motion detection

Several methods are currently available to detect the motions of a person. Optical markers, audio analysis or picture analysis by software can be used to directly detect the current position of a person. Alternatively, detectors worn on the body can be used to obtain data about the person's movements and derive the current position from it.

Using markers, multiple tracker targets are positioned on a moving object and tracked by a camera. These tracker targets can be active (e.g. light-emitting) or passive (recognizable shapes or patterns).

With audio detection, an array of microphones uses the different delays of an audio signal to calculate the position of its source. Changes in the position are registered as movement.

Picture analysis by software detects movements by comparing several picture frames against each other. The software searches for changes between the pictures that can result from a moving object; these changes are used to detect moving people in the different frames.

Using a suit with built-in detectors, the movements of a person can be detected directly. These detectors may, for example, be gyroscopes that detect changes in orientation or sensors that measure the movements of the person's joints.
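To illustrate the principle behind picture analysis by software, the following minimal sketch compares two grayscale frames and reports the centroid of the pixels that changed. It is written in Python with NumPy purely for illustration; our prototype does not use this method, and all function and parameter names here are our own:

```python
import numpy as np

def detect_motion(prev_frame, curr_frame, threshold=30):
    """Frame differencing: flag pixels whose grayscale value changed
    by more than `threshold` between two consecutive frames and
    return the centroid (x, y) of the changed region."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    moving = diff > threshold
    if not moving.any():
        return None  # no motion detected
    ys, xs = np.nonzero(moving)
    # The centroid of the changed pixels approximates the position
    # of the moving object.
    return (xs.mean(), ys.mean())

# Example: a bright "person" block moves from column 10 to column 20.
prev = np.zeros((48, 64), dtype=np.uint8)
curr = np.zeros((48, 64), dtype=np.uint8)
prev[20:30, 10:15] = 255
curr[20:30, 20:25] = 255
print(detect_motion(prev, curr))  # centroid lies between old and new positions
```

Note that this simple scheme reacts to any brightness change, which is why, as discussed below, it degrades quickly under poor or changing lighting.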
In addition to these single-source methods, there are also hybrid systems available that combine multiple sources to detect motions.

4 Choosing an appropriate method

A suitable method for the automated recording of lectures has to fulfill certain requirements. The most important of these are:

- Reliable recognition of movement
- Low cost
- Easy utilisation
- Few limitations for the lecturer (freedom of movement, clothing, etc.)
- Good results in different settings (room size, light, etc.)
- Suitable output for easy and standardized post-processing

Markers and detectors share the disadvantages of being visible in the recording and requiring the lecturer to put on special equipment first. Detectors also restrict the movement of the lecturer through the detector suit and the necessary wiring. Picture analysis by software does not work well in scenarios with insufficient lighting; an implementation that provides good results in most lighting situations would be very difficult to achieve. Audio detection is not reliable for lectures in front of an audience, since sounds from the audience prevent the analysis of the audio delays from the lecturer. Hybrid systems achieve reliable motion detection in most scenarios through the combination of different detection technologies.

5 Implementation

For our implementation we chose the Microsoft Kinect because its hybrid technology 1 enables reliable motion detection in most settings, it is not very expensive to purchase, and it provides good libraries that enable the use of the sensor in customized software.

1 The Kinect is equipped with an infrared sensor, a microphone array and a camera. To recognize people within its sensor reach, the Kinect projects a field of infrared light spots into the room. These light spots, called structured light, are arranged in a certain pattern and get reflected by the scene in front of the camera. The reflection is detected by the infrared sensor and used as the main input for the motion detection. In addition to the infrared sensor, the Kinect can use the microphone array and an inbuilt camera as supplemental data sources. [cf. PRI11]

5.1 Components of the system

Our system consists of the following components (see figure 3): standard hardware, freely available software libraries 2 and our own software implementation. The Kinect sensor is used to capture the motion data; the Logitech webcam is rotated according to the movements of the lecturer and does the recording.

Figure 3: Components of the recording system

The system can be mounted on a tripod to capture lectures in different locations or installed stationary in a lecture room.

5.2 Functionality

The sensor, software and camera work together in the following way: the Kinect collects positioning data of the scene in front of it (see figure 4).

2 NUI Library, DirectX Library, Logitech Driver
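The core calculation in such a system is converting the tracked position of the lecturer into a pan angle for the camera, which reduces to simple trigonometry. The following sketch is illustrative only: it is written in Python for brevity, whereas the actual application is C#/.NET, and the function names, the coordinate convention and the dead-zone parameter are our own assumptions rather than part of the Kinect SDK:

```python
import math

def pan_angle(x, z):
    """Horizontal angle (in degrees) between the camera axis and the
    tracked person, assuming Kinect-style coordinates: x = lateral
    offset in meters, z = distance from the sensor in meters."""
    return math.degrees(math.atan2(x, z))

def update_camera(current_pan, x, z, dead_zone=2.0):
    """Return the new pan setting for the rotatable webcam.
    Changes smaller than `dead_zone` degrees are ignored so the
    camera does not jitter with every small movement."""
    target = pan_angle(x, z)
    if abs(target - current_pan) < dead_zone:
        return current_pan
    return target

# Lecturer standing 1 m to the right of the camera axis, 3 m away:
print(round(pan_angle(1.0, 3.0), 1))  # 18.4
```

In the real system, the resulting target angle would then be handed to the Logitech control library, which performs the physical rotation.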
Using this data, it extracts information about the position of people within its sensor reach and passes it on to the application. The application can access this information 3 using the software development kit provided for the Kinect. The software development kit and the application are both .NET-based and written in the C# programming language. The application calculates the necessary changes in the camera angle and adjusts the position of the camera using the Logitech control library.

Figure 4: Interaction between the components

As a result, the system is able to automatically react to changes in the position of the person in front of it. As long as the person stays within a certain range, the camera can reliably follow his movements.

3 The Kinect can detect the position of up to 6 people in its sensor range. It tracks two persons as active and up to four additional persons as passive. For active persons, the Kinect analyzes and provides positioning information for 20 reference points on the human body, thus providing information about their current movement as well as their posture. For passive persons, the Kinect provides much less information. Our system processes the positioning data of one (active) person.

6 Conclusions

The system fulfills the intended task. It provides motion detection and is able to adjust a camera to changes in the lecturer's position. It can be used without much preparation and without additional personnel. To capture a lecture, the system simply has to be put in front of the lecturer (see figure 5) and started.

Figure 5: The Kinect sensor and camera of the system mounted on a tripod

Limitations of the system are the tracking of more than one person, or of persons that are beyond the range of the sensor. The system also does not provide the ability to zoom and is restricted to the use of the Logitech webcam.

7 Next steps

For the near future we are planning the following enhancements to our system. To simplify further development of the application, we will modularize it into logical components. That way the system will be easier to extend, and parallel development by different developers will be possible. To become independent from the Logitech webcam, we will develop a hardware platform that can be steered directly. As a result, there will be no need for a communication interface between the system and the camera, and any camera can be put on the platform and used for the recording. We will also improve the camerawork by developing an artificial intelligence system that emulates the behavior of a human cameraman. This system will be tailored to allow for more advanced camera actions like zooming or choosing an optimal image section.

For the more distant future, other possibilities for the enhancement of the system can be thought of. The system could allow the lecturer to change the way the camera does the recording or enable him to manipulate the recording directly. Another possibility would be to include some post-processing in the system. That way it could deliver a complete, ready-to-use video file of a lecture without any additional steps like adding the lecture title or converting the video into a convenient format.

8 References

[BRE98] Breig, Marcus; Kohler, Markus (1998): Motion detection and tracking under constraint of pan tilt cameras for vision based human computer interaction. Faculty of Information Technology, University of Dortmund.

[KRI02] Krieger, Thomas P. U. (2002): Innovative Sensorkonzepte und Signalverarbeitungsstrategien zur Bewegungserkennung und Präsenzkontrolle von Personen. (In German)

[MS11] Microsoft: Documentation of the official Microsoft Kinect SDK Beta. http://research.microsoft.com/enus/um/redmond/projects/kinectsdk/docs/programmingguide_kinectsdk.pdf (12.09.11)

[PRI11] Background report on the company PrimeSense (hardware developer of the Kinect). http://www.joystiq.com/2010/06/19/kinect-how-it-works-from-thecompany-behind-the-tech/ (22.09.11)