Improving Bicycle Safety through Automated Real-Time Vehicle Detection

Stephen Smaldone, Chetan Tonde, Vancheswaran K. Ananthanarayanan, Ahmed Elgammal, and Liviu Iftode
{smaldone, cjtonde, vanchi, elgammal,
Technical Report DCS-TR-665
Department of Computer Science, Rutgers University
110 Frelinghuysen Rd, Piscataway, NJ
August 2010

Abstract

The manner in which people use bicycles has changed very little since their invention in 1817. In that time, though, roadways have become congested with a dramatically less environmentally friendly mode of transportation: automobiles. These vehicles and the motorists who drive them represent, at times, a serious threat to the safety of both road cycling enthusiasts and bicycle commuters. Since bikers typically ride with the flow of traffic, the most dangerous situation for them is when they are being passed by a motorist from behind. As a result, a biker must spend a substantial amount of her cognitive and physical ability periodically scanning for rear-approaching vehicles, reducing her capacity to handle the bicycle safely and to maintain continual awareness of both the forward and rearward situations. To improve road cycling safety, we present a system that augments a standard bicycle with audio and video sensing and computational capabilities. This Cyber-Physical bicycle system continuously senses the environment behind a biker, processes the sensed data using audio processing and computer vision techniques, automatically detects rear-approaching vehicles, and alerts the biker in real time prior to the encounter. In this paper we present (i) the design of our prototype Cyber-Physical bicycle system and (ii) the results of our evaluation using video and audio traces collected from bikers. These results demonstrate the feasibility of the system, which exhibits a high degree of detection accuracy while operating under the real-time and energy constraints of the problem scenario.
1 Introduction

Since their invention in 1817 [2], bicycles have proven to be a healthy and environmentally friendly mode of transportation for both enthusiasts and commuters alike. Although the bicycle has remained ubiquitous over time, the world has changed dramatically. Today, US roadways are dominated by automobiles: inefficient, aggressive modes of human transport. Unfortunately, bikers are treated as second-class citizens as they attempt to share roadways with motorists [30, 3]. In fact, this has been the situation for most of the lifetime of the bicycle. In 1896, the first automobile accident ever to occur in the U.S. took place in New York City between an automobile and a bicycle, and proved fatal for the cyclist [4]. According to a more recent report (2007) in the U.S., over 700 bicyclists die annually in accidents with automobiles, while over 44,000 injuries due to bicycle-automobile accidents are reported annually [5]. A key limiting factor for modern-day bikers is roadway safety. This is primarily due to the inherently unbalanced burden of risk that is placed on a biker during cyclist-motorist encounters. As will become evident in Section 2, existing laws and approaches toward biker safety (e.g., bike paths) are inadequate

solutions. At best, laws prescribe remedies for incidents after the fact, and bicycle paths are only as good as the limited coverage they provide. Due to the unbalanced nature of the risk during cyclist-motorist encounters, a biker is forced to devote a substantial portion of her cognitive and physical capabilities to the task of maintaining situational awareness, by continuously probing for rear-approaching vehicles. What is required is a preventative, biker-centric solution to the problem. In this paper, we present an approach that solves the problem by offloading the low-level cognitive requirements from a biker to her bicycle. To support this approach, we enhance a standard bicycle with sensing and computational capabilities to create a Cyber-Physical bicycle system. The core goal of this system is to provide accurate and timely detection of rear-approaching vehicles to alert the biker of the pending encounter, through the cross-cutting application of mobile sensing, computer vision, and audio processing techniques. Our goal is to allow a bicycle to maintain situational awareness for a biker and provide updates to her as relevant changes occur that potentially impact her safety. To the best of our knowledge, our system is the first to equip bicycles with sensors for the purpose of improving road cycling safety. We consider this an important problem domain, as roadway safety is a key limiting factor in the adoption of bicycling as a viable form of environmentally friendly transportation. The novel contributions of this work are:

- The design of an automated real-time detection system for roadway cycling utilizing a multimodal sensing approach. Our approach augments a bicycle with a camera, microphone, and computational capabilities, employs computer vision and audio processing techniques to detect when a motor vehicle is approaching from behind, and alerts the biker prior to each encounter.
- A prototype Cyber-Physical bicycle system enhanced with multimodal sensing (rear-facing camera and microphone) and on-board processing capabilities to perform real-time approaching-vehicle detection.

- The evaluation of our prototype system using real biker traces, which we collected (over 3 hours of roadway cycling video and audio traces, including more than 187 biker-motorist interactions), and real-time performance measurements. The results of this evaluation demonstrate the feasibility of the system, which exhibits a high degree of accuracy while continuously operating within the real-time and energy constraints of the problem scenario.

In the remaining sections of this paper, we review the background of this problem in the context of existing real-world solutions in Section 2, then provide an overview of our solution in Section 3. Following this, we present the design of our system in Section 4, while Section 5 presents the results of our evaluation. We discuss a number of open issues and future work in Section 6. Finally, we review the related work in Section 7 and present our conclusions in Section 8.

2 Background and Motivation

In this section, we review the broad range of solutions that have attempted to improve biker safety. These solutions include legal and infrastructure approaches. In summary, a quick review of biker fatality and injury statistics shows no consistent improvement [4, 5]. Clearly, none of these approaches, either in isolation or taken as a whole, has substantially improved biker safety. Finally, we focus on what we believe to be the core problem that still must be addressed.

Legal Approaches. Laws have been enacted in most states requiring children to wear helmets while riding bikes, but they do not apply to adult bikers. Statistics show that the average age is 40 years for cyclist fatalities and 30 years for cyclist injuries [5].
Although a helmet is likely to be effective in simple falls, it is unclear what protection it provides during an accident involving a motor vehicle. More recently, some states have even enacted laws to impose a three-foot safe passing limit on motorists passing bikers [6]. Unfortunately, laws do little to prevent accidents due to insufficient enforcement, and can only prescribe after-the-fact remedies. According to statistical studies, three out of four at-fault drivers are not even cited for hitting and killing cyclists, and 22% of fatal accidents involved hit-and-run drivers, who were never found or charged. For example, in New York City, of the 92% of drivers who were at fault for killing a cyclist, 74% did not even receive a traffic citation [22]. In short, laws may help penalize offenders, if properly enforced, but they provide little actual preventative protection for the biker.

Infrastructure Approaches. Certain cities, for example, Amsterdam in The Netherlands and Portland, OR, in the U.S., have built extensive networks of bicycle lanes [1] to promote safe cycling. Retrofitting

Figure 1: Cyber-Physical Bicycle. A normal bicycle augmented with sensors (video and audio), CPU, wireless networking, and GPS to create a Cyber-Physical bicycle system to detect rear-approaching motor vehicles. Alerts and related data collected by the system are transmitted to a centralized service where they are logged and stored.

existing roadways for bike lanes is difficult and costly. Therefore, adoption of such infrastructure change is slow, due to public inertia on the issue and funding competition from other, more popular public projects. In less bicycle-friendly cities or suburban areas, infrastructure coverage is much less consistent, providing little safety improvement for bikers in those areas. Oddly, the arguably more bicycle-hostile areas are those that also reject the bicycle lane idea, while the very areas where road cycling is well accepted provide the additional safety of bicycle lanes. In the end, bicycle lanes are only as good as the coverage they provide; they require a strong public commitment to install and maintain, and enforced legislation to ensure they are not improperly utilized (e.g., by illegally parked cars).

The Problem: Cognitive Overload. Since bikers typically ride with the flow of traffic, one of the more dangerous situations for them is when they are being passed from behind by a motor vehicle. To anticipate these situations, a biker must spend a substantial amount of her cognitive and physical ability periodically scanning for rear-approaching vehicles, reducing her capacity to handle the bicycle safely and to maintain continual awareness of both the forward and rearward situations. In fact, when riding in groups along the side of roadways, bikers commonly call forward to alert the members of their group about a rear-approaching vehicle, as a natural way to share the cognitive load. Accompanying the cognitive aspects are the physical requirements of maintaining situational awareness.
To detect the presence of a vehicle, a biker must look behind herself. This simple motion has two profound effects. First, by diverting her attention to the rear, a biker loses the ability to track the roadway in front. This means that any approaching roadway hazard will go unnoticed by the biker during that period. Second, the physical act of looking back naturally causes a biker to drift to the side (i.e., either into the roadway or onto the shoulder) due to the dynamics of the complex forces that act on a moving bicycle. In an attempt to reduce the cognitive and physical effects of looking back, many bikers employ rear-view mirrors. Unlike the mirrors found in a car, bicycle mirrors are either handlebar or helmet mounted. In either form they can be very distracting, and they often do not provide a broad enough range of view behind the cyclist, nor a consistently good view across the frequently changing riding positions a cyclist assumes. Regardless, the effects of periodically scanning a rear-view mirror are little better than those of directly scanning behind. More recently, products such as Cerevellum [9] provide a video-based rear-view mirror

solution. Although it provides a continuous view of the situation behind a biker, it does not actually detect approaching motor vehicles. Finally, various ways to alert motorists to the presence of a biker have been developed. These include bike reflectors, flashing lights, and reflective clothing. The goal of this approach is to make the biker more conspicuous to the motorist. Although these represent preventative measures, they do little to warn an unprepared biker of an approaching vehicle. The recently proposed LightLane [11] projects the image of a bike path around a bicycle on the roadway. The idea is to provide a bicycle lane that adapts to a biker's behavior by following her on the roadway, illustrating for motorists the safe passing distance. Although this concept goes farther than other visible cues for motorists, it still does not provide any notification to a biker regarding the presence of an approaching vehicle.

3 Overview

To illustrate how our Cyber-Physical bicycle can improve the safety of bikers, we provide a high-level overview of the system. The central element is a normal bicycle augmented with a set of sensors (audio and video) providing multiple modalities of sensing, compute resources in the form of a bicycle computer, and advanced wireless capabilities (3G, WiFi, and GPS). Figure 1 illustrates the key parts of the Cyber-Physical bicycle system and its core functionality. As illustrated in the figure, a camera and microphone face backward from the bicycle's direction of forward motion. These sensors collect video and audio data samples and stream them to an embedded bicycle computer. Software executing on the computer continuously processes the data streams, in parallel, utilizing computer vision and audio signal processing techniques to perform rear-approaching motor vehicle detection. Further discussion of the detailed system design is deferred to Section 4.
As a biker, shown in Figure 1, rides along a roadway on her normal daily route, her bicycle maintains situational awareness for her. As a motor vehicle approaches the biker from behind, the occurrence is detected by the bicycle computer and an audio notification is raised to the biker. Even after a notification has been raised, the system continues to track the approaching vehicle to determine the level of threat posed to the biker. To assess this threat, the system encapsulates the biker in a virtual safety zone, a three-foot perimeter around the bicycle. Any motor vehicle that crosses the threshold of this perimeter is considered to have violated the safety zone of the biker, and the system considers this an unsafe interaction. The virtual safety zone is visually depicted in Figure 6. Whenever an unsafe interaction occurs, the Cyber-Physical bicycle system performs three actions. First, it produces an audible warning to notify the biker. This warning is distinct from the early notification produced when the vehicle is first detected. Second, an image of the offending vehicle is collected and transferred to a server at a centralized location. This is stored along with the location coordinates of the encounter, for future reference by the biker. Third, the encounter is logged at the centralized service and aggregated with the unsafe encounters of all other users. The purpose of this third action is to build aggregate safety statistics for roadways frequented by bikers, to be used as a safety metric in safe route planning.

4 Cyber-Physical Bicycle Design

In this section, we describe the design of our Cyber-Physical bicycle system. The goals of our design are threefold. First, the system must detect and track vehicles that approach from behind. Second, the system must be able to alert a biker in real time, when such an alert is still useful. Third, the system should distinguish between vehicles that approach a biker safely and those that approach unsafely.
There are a number of significant challenges that must be overcome when building such a system to automatically detect rear-approaching vehicles. They are:

Limited Resources. Detecting approaching vehicles is a computationally intensive process. This is magnified by the real-time (latency-sensitive) requirements of the system. However, unlike engine-powered vehicles, bicycles have very limited power generation capabilities. This, along with obvious weight restrictions, places a serious limitation on the computational resources that a bicycle can be equipped to carry for this purpose.

Figure 2: Comparative Optical Flows of Approaching and Departing Cars. The four images are: (a) a rear-approaching vehicle, (b) optical flow of the rear-approaching vehicle in (a), (c) a rear-departing vehicle, and (d) optical flow of the rear-departing vehicle in (c).

Platform Instability. As a moving platform, a bicycle is subject to a substantial amount of vibrational motion due to roadway conditions, as well as any rapid changes in direction caused by the biker or the environment around her (wind, roadway surface, interactions with other bikers, etc.). Therefore, even with commonly available image stabilization technologies, the resulting video stream obtained from a bicycle-mounted camera is subject to large amounts of jitter, sudden jarring vibrations, and rapid unpredictable changes in orientation.

Approaching Vehicle Directionality. Vehicles may approach a biker from both the rear and the front. For vehicle detection to be useful, distinguishing between these two cases is critical. Otherwise, an alert will be generated by the system for each vehicle encounter, regardless of the relative directions of motion, greatly reducing the effectiveness of the system.

4.1 Video-Based Detection

Since approaching vehicles travel at relatively high speeds compared to bicycles, it is necessary to detect them as early as possible, with high accuracy. Although we would like to leverage existing automobile driver assistance systems, which can detect the presence of vehicles in a driver's blind spot, we cannot. These systems rely on the fact that nearby vehicles are prominent in the sensed image's field of view (FOV). To provide ample notification time to a biker, the Cyber-Physical bicycle system must be able to detect approaching vehicles while they are still very small (as small as 2% of the FOV).
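To put the 2% figure in perspective, a back-of-the-envelope calculation shows the pixel budget involved. The 640x480 resolution below is purely an illustrative assumption; it is not a parameter stated by the system:

```python
# Pixel budget for a vehicle occupying 2% of the field of view.
# The 640x480 capture resolution is an assumed, illustrative value.
width, height = 640, 480
target_pixels = 0.02 * width * height  # pixels covered by the vehicle
side = target_pixels ** 0.5            # side of an equivalent square region
print(int(target_pixels), int(side))   # prints: 6144 78
```

Even at this modest resolution, the detector must work with a region on the order of only 78x78 pixels, which is why coarse cues such as counter-motion and roadway segmentation matter more than fine appearance detail.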
In this subsection, we describe the design of the video-based sensing and detection subsystem. To inform the components of this design, we make two observations. First, as a bicycle moves down a roadway, all stationary objects behind the bicycle will appear to recede. Therefore, a vehicle approaching from the rear will move counter to the motion of all other objects in the FOV. This visual cue is leveraged in the Optical Flow Analysis component (Section 4.1.1). The second observation is that traffic on roadways follows a predictable pattern. Since bikers ride with the flow of traffic, a rear-approaching vehicle will always appear in the same roadway lane as the biker. Therefore, by identifying the natural segmentation of a roadway, we can reduce the image area (FOV) that must be analyzed, focusing only on the areas of the FOV where a rear-approaching vehicle is likely to appear. This visual cue is leveraged in the Roadway Segmentation Analysis component (Section 4.1.2). Together, these two visual cues provide the necessary information for the video-based detection subsystem to reason about approaching vehicles and decide whether they present a hazardous condition to the

Figure 3: Cyber-Physical Bicycle System Diagram. This block diagram presents the video, audio, and combined detection subsystems for the Cyber-Physical bicycle system.

Figure 4: Optical Flow Trajectory. This figure presents a graphical depiction of the optical flow trajectory for approaching and departing vehicles in u-v space. Red indicates high approaching relative speed, while blue indicates high departing relative speed.

biker. Additionally, the video-based subsystem performs real-time tracking of any detected rear-approaching vehicles and classifies each as either a safe or an unsafe approach. Figure 3 shows a block diagram of the components of the video processing subsystem (in blue), as well as the overall subsystem organization.

4.1.1 Optical Flow Analysis

Optical flow is the pattern of apparent motion of objects in a visual scene caused by the relative motion between an observer and the objects [21]. For any pixel in an image, a two-dimensional motion vector describing that pixel's relative motion can be computed using two or more consecutive frames. This can be computed at each pixel, which results in dense flow, or at certain pixels such as edge pixels, which results in sparse flow. Many techniques have been developed for computing optical flow from images; they are used extensively in motion estimation and video compression, for example in MPEG encoding [16]. Although computing optical flow is a very computationally intensive task, the inherently parallel nature of the computation can be exploited to accelerate it through the use of commodity Graphics Processing Units (GPUs). As discussed earlier, the motion of a rear-approaching vehicle is expected to be opposite to that of

Figure 5: Roadway Segmentation. A graphical representation of roadway segmentation depicting the boundaries that are determined automatically by the video subsystem for each video frame.

everything else in the FOV. This allows an approaching vehicle to be distinguished from all other objects in the camera's FOV, and it also allows approaching vehicles to be discriminated from departing ones. Figure 2 shows an example of optical flow computed from images taken from video captured by a rear-facing camera mounted on the back of a moving bicycle. In the figure, there are four images: (a) a rear-approaching vehicle, (b) optical flow of the rear-approaching vehicle in (a), (c) a rear-departing vehicle, and (d) optical flow of the rear-departing vehicle in (c). Red pixels are moving towards the biker, while blue pixels are moving away. A stronger color (i.e., more red or more blue) indicates a faster relative speed. From the figure, we observe that an approaching car exhibits a very distinct relative motion pattern compared to the rest of the scene: it is a dominant red spot in the middle of a bluish scene. The background variation in color (from blue to purple) is largely due to the side-to-side motion of the bicycle while being actively ridden. Figure 4 shows the typical trajectories for approaching and departing vehicles, as projected in optical flow space. In the figure, the two axes represent the horizontal (u) and vertical (v) motions, and optical flow is color-coded as before [15]. The trajectory data plotted in this figure is from the approaching and departing cars in Figure 2. The optical flow trajectory for the approaching car starts from the origin, where the car is at its farthest visible distance and its speed is virtually indiscernible from the image. For the departing car, the optical flow trajectory converges to the origin, at which point the car's speed becomes indiscernible in the image due to distance.
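The counter-motion cue can be sketched in code. Below is a deliberately crude, self-contained stand-in for a real dense-flow method (such as a GPU-accelerated Farneback or Horn-Schunck solver): block-matching flow plus a check that flags blocks moving against the dominant background motion. All function names, block sizes, and thresholds here are our own illustrative choices, not the system's.

```python
import numpy as np

def block_flow(prev, curr, block=8, search=4):
    """Crude block-matching optical flow over two grayscale frames.

    A stand-in for a real dense-flow method: for each block in `prev`,
    find the displacement (dy, dx) in `curr` with minimum absolute
    difference. Returns an (H/block, W/block, 2) array of motion vectors.
    """
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2))
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            patch = prev[y:y + block, x:x + block]
            best_cost, best = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy and yy + block <= h and 0 <= xx and xx + block <= w:
                        cost = np.abs(patch - curr[yy:yy + block, xx:xx + block]).sum()
                        if cost < best_cost:
                            best_cost, best = cost, (dy, dx)
            flow[by, bx] = best
    return flow

def counter_motion_mask(flow, thresh=1.5):
    """Flag blocks whose motion deviates from the dominant (background)
    motion -- the visual cue used to pick out a rear-approaching vehicle."""
    background = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - background, axis=2)
    return deviation > thresh
```

On real frames, the flagged blocks would then be grouped into a candidate vehicle region, restricted to the segmented roadway area, and handed to the tracker.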
Clearly, optical flow is only useful when a vehicle is close enough to the bicycle that it shows distinct relative motion. As a vehicle moves farther away from the bicycle, the signal-to-noise ratio in the optical flow calculation (for the car's image pixels) decreases, ultimately becoming dominated by the effects of the biker's motions on image quality (e.g., jitter).

4.1.2 Roadway Segmentation Analysis

To reduce the computational requirements of vehicle detection, the video-based detection subsystem segments the image of the roadway based upon the existing visible natural boundaries. The roadway is segmented from the rest of the scene based on its color appearance, using a statistical color model learned for the road color distribution. To facilitate this, a region close to the bicycle is chosen and used as a seed for frame-by-frame model generation. Then, this roadway color model is utilized to perform road segmentation on the rest of the scene. Figure 5 shows an example of this. In the figure, three boundaries are highlighted to illustrate the roadway segmentation concept. The upper horizontal blue line identifies the horizon boundary, where the roadway meets the non-roadway portion of the image above it. The painted center lines on the roadway are also discovered. In the figure, this discovery is highlighted by the blue and red points overlaying the double yellow center line. The lower blue line splits

Figure 6: Safety Zone. The three images are: (a) a safe approach by a vehicle that does not cross into the safety zone, (b) an unsafe approaching vehicle that crosses into the safety zone, and (c) a vehicle passing in the opposite lane.

the image horizontally at the point where the center (double yellow) lines intersect the left border of the image. Finally, we also determine the vanishing point of the roadway, using lines formed by edges present in the image; it is denoted by the red cross with a circle. Together, the three boundaries and the vanishing point outline a bounding box where any valid rear-approaching vehicle detections are expected to appear. Therefore, the more expensive detection computation is reduced by restricting it to this region.

4.1.3 Vehicle Tracking

Once a vehicle is detected, the system must track it to determine when an alert should be raised and to resolve multiple detections. Figure 6 shows examples of vehicle tracking for approaching vehicles (red box on the vehicle) and the correct handling of departing vehicles (no red box). Multiple cars may also pass at the same time, which the tracking framework must handle. Many techniques exist for tracking moving objects based on their appearance, shape, and feature points. We use an appearance-based approach, as it is more robust to noise from camera jitter. We use the method proposed in [32], which uses a Principal Component Analysis (PCA)-based appearance model to track a detected car in an affine subspace. The method automatically learns and updates the appearance model starting from the first detection, and it works well for tracking objects with slight pose, scale, and illumination changes. In our case, however, the appearance, pose, and scale of a car change rapidly as it comes closer, due to the strong perspective effect.
To handle this, we modified the above approach to be adaptive, updating the model parameters using successive online detections. Multiple cars can be tracked simultaneously by maintaining a separate track for each detected car, and detection ambiguities are resolved by performing a linear assignment based on the location and appearance similarity of the detected blobs. Finally, all tracks are managed based on their location and appearance: a track remains valid only if a detection occurs close to its previous location in the next frame, and a track is deleted when its car leaves the scene.

4.1.4 Safety Zone

As each tracked vehicle approaches a biker, the Cyber-Physical bicycle system calculates the distance between the vehicle and the biker. The system maintains a configurable boundary (called the safety zone) around the bicycle, and utilizes this boundary as the threshold between a safe pass and an unsafe pass. In both cases, the biker is alerted to the presence of the approaching vehicle. In the latter case, an additional alert is raised to warn the biker of the more dangerous encounter. This is demonstrated by the three images shown in Figure 6. Figure 6(a) shows the safe pass scenario. In the image, the yellow and white grid delineates the safety zone boundary. Since the vehicle in the image is passing the biker outside of the safety zone, a yellow warning is signaled in the image (yellow box in the upper right corner). This is equivalent to the audible notification that a biker would receive in this scenario to warn of the safely approaching vehicle.

Table 1: Audio Feature Descriptions.

  Spectral Entropy (SPE), frequency domain: the entropy of the spectrum. Falls off sharply when a vehicle is approaching.
  Spectral Centroid (SPC), frequency domain: the weighted center of the spectrum. Increases when a vehicle is approaching.
  Root-Mean-Square Amplitude (RMS), time domain: the per-frame audio energy. Used to drop frames and ignore irrelevant sounds (e.g., a parked bike); vehicle approaches are often louder than other audio frames.
  Zero Crossing Rate (ZCR), time domain: the number of times the audio signal crosses zero within a frame. Typically higher for random noise.
  Spectral Rolloff (SPR), frequency domain: the frequency bin below which 91% of the signal energy is contained. Increases slightly during vehicle approaches.
  Spectral Flux (SPF), frequency domain: the relative change in the spectrum, weighted by magnitude. Exhibits a small increase at the onset of a vehicle approach.

Alternately, in Figure 6(b), we see an example of an unsafe car pass. In this image, the car crosses the safety zone boundary and a red alert is visibly raised by the system. This box represents the audible alert raised to notify the biker of the unsafely approaching vehicle.

4.2 Audio-Based Detection

In addition to video-based approaching-vehicle detection, the Cyber-Physical bicycle also performs audio-based sensing and analysis to detect the presence of any rear-approaching vehicles. To sample sound from behind a biker, the bicycle is equipped with a rear-facing audio sensor (microphone), which continuously collects the ambient sound. To inform this subsystem, we make two observations. First, vehicles are clearly audible to a biker over the background wind noise when she turns her head. Therefore, a wind-shielded audio sensor should be able to detect vehicular sound. Second, since sound from a vehicle is directional, the audio-based system should be able to discriminate between rear- and front-approaching vehicles.
Furthermore, a vehicle that approaches from the rear will produce a longer-lasting sound, since it takes more time to pass a biker than a vehicle approaching from the front in the opposite traffic lane. Figure 3 shows the block diagram for the audio processing subcomponents (in green).

4.2.1 Audio Feature Extraction

Audio is continuously captured from the rear-facing microphone. The resulting stream is broken into fixed-size frames for processing. We apply a Hanning window to each frame and then pass it to the feature extraction processor. A number of features are extracted from each frame to form a feature vector. Some are time-domain features, while others are drawn from the frequency domain. Together, they represent a number of characteristic audio features (commonly utilized audio features are described in [26, 28, 33]). Table 1 provides a description of each audio feature used in the audio-based detection subsystem. To calculate the frequency-domain features, we first transform the signal using a Fourier Transform (we use a standard Fast Fourier Transform). SPE, SPC, and ZCR all show a strong correlation with the sound of a rear-approaching vehicle, while the remaining features reinforce classification accuracy. We also utilize the first-order time differentials of these features in classification, to take advantage of the temporal nature of vehicle approaches. Figure 7 shows an example of audio feature behavior for two scenarios: a rear-approaching vehicle and no vehicle present. From the figure, we can clearly observe a substantial difference in the audio feature response to each scenario. For the rear-approaching vehicle case, there is a rise in the SPC between frames 40 and 120. This directly corresponds to the time frame in which the vehicle approaches and passes the biker. These results are indicative of the other audio features utilized, and we omit the other feature results for brevity.
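To illustrate, two of the frequency-domain features (SPC and SPE) can be computed per frame roughly as follows. This is a sketch: the sample rate default and the small epsilon are our own illustrative choices, not parameters reported for the system.

```python
import numpy as np

def spectral_features(frame, sr=22050):
    """Compute the spectral centroid (SPC) and spectral entropy (SPE)
    of one audio frame -- a sketch of two of the Table 1 features."""
    windowed = frame * np.hanning(len(frame))    # Hanning window, as in the text
    mag = np.abs(np.fft.rfft(windowed))          # magnitude spectrum (FFT)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = mag / (mag.sum() + 1e-12)                # normalize to a distribution
    centroid = (freqs * p).sum()                 # SPC: weighted spectral center
    entropy = -(p * np.log2(p + 1e-12)).sum()    # SPE: entropy of the spectrum
    return centroid, entropy
```

For example, a pure tone yields a low spectral entropy and a centroid near the tone's frequency, while broadband noise yields a high entropy and a centroid near the middle of the band.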

Figure 7: Audio Feature Discrimination. Plot of the Spectral Centroid (SPC) feature over time. The two trends show the audio feature measurement for the vehicle present (behind) and no vehicle present situations.

4.2.2 Detection Model Creation

Once all of the relevant audio features have been extracted, they are utilized to build a classifier to detect approaching motor vehicles. To build the classifier, we randomly select an equal number of audio feature vector instances from a set of annotated vectors. Each vector in the training set is annotated to place it into one of three classes: (i) front-approach, (ii) rear-approach, and (iii) no approach. For our classifier, we use a decision tree, due to its relative simplicity, classification speed, and accuracy. The classifier model is built offline and then used by the real-time detection algorithm to determine the classes of current audio samples. The real-time portion of the system continuously samples frames, extracts features, and classifies them according to this preconstructed classifier model. To reduce the variability (noise) of the classification output due to false positives and negatives on individual frames, the classification results are fed into a higher layer that performs further discrimination.

4.2.3 Higher-Order Discrimination

This layer receives a stream of classifier results from the layer below. To reduce the effects of incorrectly classified individual frames, the results are smoothed using window-based moving averaging. This removes small fluctuations due to noise in either the original data stream or the classifier results, thereby improving the overall accuracy of the detector. Once a detection has been determined at this layer, an alert is generated, similar to the camera-based portion of the system, and ultimately propagated to the user.
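Window-based moving averaging over discrete class labels can be realized as a majority vote over a trailing window. The sketch below (plain Python; the class encoding is an assumption, and a majority vote is one simple realization of the smoothing the text describes) suppresses isolated frame-level misclassifications:

```python
from collections import deque

# Assumed encoding of the three per-frame classes.
NO_APPROACH, FRONT_APPROACH, REAR_APPROACH = 0, 1, 2

def smooth(labels, window=5):
    """Replace each raw frame label with the majority label in the
    trailing window, removing single-frame classifier glitches."""
    recent = deque(maxlen=window)
    out = []
    for label in labels:
        recent.append(label)
        # Majority vote over the labels currently in the window.
        out.append(max(set(recent), key=list(recent).count))
    return out
```

A single spurious rear-approach frame in a run of no-approach frames (or vice versa) is voted away, while a sustained run of detections survives and can trigger the alert.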
4.3 Integrated Multi-Modal Detection

Finally, since both the video- and audio-based systems operate in parallel, the Cyber-Physical bicycle system combines the results of both modalities to further improve the accuracy of detection. This combination allows the system to perform an additional comparison based upon the lower-layer results, further filtering false positives and catching false negatives by leveraging the diverse characteristics of the different modalities. Figure 3 shows the audio and video integration block (in orange), which produces the final situational determination.

We combine the two results in the following way. We construct a binary state vector S_n = [A_n V_n a_n v_n], where:

  a_n = current audio prediction, a_n in {0, 1}
  v_n = current video prediction, v_n in {0, 1}
  A_n = 1 if the sum of a_i for i = n-w to n-1 is < low or > high, and 0 otherwise
  V_n = 1 if the sum of v_i for i = n-w to n-1 is < low or > high, and 0 otherwise
  w   = a constant window length

We choose w = 5, low = 1, and high = 3 in our case. Conditioning on the different cases of S_n, a_n, and v_n, we issue an alert. If the system deems the situation to be a threat to the safety of the biker, the appropriate alert is raised.

4.4 Implementation

The implementation of video detection and tracking consists of two components: optical flow computation and vehicle tracking. To build the optical flow component, we used the well-known, open source optical flow library implementation available from [34]. We implemented this component in C++ with the NVIDIA CUDA library version 2.3, and built it to utilize the GPU to perform image processing. Finally, our tracking code was implemented in MATLAB 7.10 using the base code provided at [32].

The audio pipeline was built as two components as well. The feature extraction module is implemented in Python 2.6. Audio capture is implemented using the PyAudio module, and we use the Python Numeric module to perform the feature and FFT calculations. The feature vector classifier is built using the Weka machine learning toolkit (version 3.6.2) and is implemented in Java 1.6. Weka is used to train the classifier on a subset of the roadway cycling traces. We use a decision tree classifier (J48) model, and built a Java server to handle the classification tasks.

We install and run the Cyber-Physical bicycle system on an HP Mini 311 netbook. We chose this as our development platform since the hardware closely matches embedded hardware utilized in various multimedia applications. The specifications of the netbook are: Intel Atom N GHz CPU, 3 GB RAM, NVIDIA ION GPU, 80 GB SSD hard disk, and an internal 6-cell Li-ion battery. The netbook weighs only 3.26 lbs. The GPU is composed of 16 CUDA cores and 256 MB of dedicated graphics memory.
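A minimal sketch of the state-vector construction follows. Since the text does not spell out the alert rule conditioned on S_n, only the vector itself is computed, and the windowed condition reflects our reading of the definition; treat this as an illustration, not the authors' exact logic.

```python
# Constants from the text: window length w and the low/high thresholds.
W, LOW, HIGH = 5, 1, 3

def window_flag(preds, n):
    """A_n (or V_n): 1 when the count of positive frame predictions in the
    trailing window [n-w, n) falls below `low` or rises above `high`."""
    s = sum(preds[max(0, n - W):n])
    return 1 if (s < LOW or s > HIGH) else 0

def state_vector(audio_preds, video_preds, n):
    """Binary state vector S_n = [A_n, V_n, a_n, v_n] for frame n."""
    return [window_flag(audio_preds, n),
            window_flag(video_preds, n),
            audio_preds[n],
            video_preds[n]]
```

The window flags capture a stable recent trend in each modality, while a_n and v_n carry the instantaneous predictions; the final alert decision can then condition on agreement or disagreement between the two modalities.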
5 Evaluation

Our experimental evaluation of the Cyber-Physical bicycle system prototype addresses three questions: How accurate are the video- and audio-based sensing techniques in detecting rear-approaching vehicles? (Section 5.2) Can detection be performed in real time? (Section 5.3) What are the power requirements of the system? (Section 5.4) All experiments are executed using our prototype implementation on the hardware specified in Section 4.4.

5.1 Roadway Cycling Traces

To support repeatable experimentation, we collected over 3 hours of real roadway cycling traces, amounting to roughly 10 GB of data. To gather these traces, we mounted a rear-facing digital video recorder (Sony Handycam DCR-SX40), which includes a camera and microphone, to an ordinary road bicycle (Trek FX7.5). The camera collects video and audio recordings while a biker rides the bicycle along a set of typical rural bike routes in central New Jersey. Once collected, the traces are viewed and annotated. We manually annotated every interaction between the biker and a vehicle (both approaching and departing) using the timestamps of each individual trace. In the traces, there are 52 incidents of rear-approaching vehicles and 135 incidents of front-approaching vehicles. The average time that audio is heard for a rear-approach is 2.96 seconds. Similarly, cars are clearly visible in the video for 3.81 seconds on average. All traces were collected during normal daylight hours.

Although we commonly refer to audio and video segments as frames, this term can be ambiguous when applied so broadly. For the purposes of our implementation and evaluation, and to disambiguate terms, we define both here. A video frame is defined as a single 3-channel color image with 8-bit depth and a resolution of 80 x 90 pixels. For audio, we define a frame to consist of 200 ms worth of data, with consecutive frames overlapping each other by 50%. Our audio data is sampled as a single channel at Hz with an 8-bit sample size.
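The audio frame definition (200 ms frames with 50% overlap between consecutive frames) can be sketched directly. The sample rate below is an assumed placeholder, since the same arithmetic applies at any rate:

```python
def frames(samples, sample_rate=8000, frame_ms=200, overlap=0.5):
    """Split a mono sample stream into fixed-size, overlapping frames."""
    size = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop = int(size * (1 - overlap))             # step between frame starts
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, hop)]
```

At 8 kHz, each frame holds 1,600 samples and a new frame begins every 800 samples, so the second half of each frame is shared with the start of the next.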

Table 2: Detection Accuracy.

Mode            Positives  Negatives
Video    True   18         N/A
         False  7          1
Audio    True   17         N/A
         False  2          2
Combined True   18         N/A
         False  -          -

5.2 Accuracy

Accuracy is a critical metric for our system. Since the key function of the system is to detect rear-approaching vehicles for a biker, the system must be accurate enough to instill confidence. In this section, we evaluate the accuracy of the video- and audio-based detectors individually, and then evaluate their combination as a single multimodal detector. In all cases, we replay our annotated roadway cycling traces by streaming them through the system as if they were being directly captured from the video and audio sensors.

All results in this section are presented as confusion matrices. True positives (TP) represent the cases where a rear-approaching vehicle is correctly identified. True negatives (TN) represent the cases where the absence of a rear-approaching vehicle is correctly detected. False positives (FP) occur when something is misclassified as a rear-approaching vehicle. A false negative (FN) occurs when a rear-approaching vehicle should have been detected, but was not. Finally, we define accuracy as:

  Accuracy = TP / (TP + FP + FN)

5.2.1 Video Detection Accuracy

Table 2 presents the results for the video-based detector. From this table, we observe that across all tested cases, only 1 rear-approaching vehicle was not detected (a false negative), while 7 false alerts were raised (false positives). Based upon these results, we calculate the overall accuracy of this method to be 69.2%. To better understand the sources of incorrect classification, we reviewed the specific trace sequences that were incorrectly handled by the system. In one of the instances, a biker is being followed by another rider, who occludes a rear-approaching vehicle. This causes a false negative. A second example illustrates a false positive case. In this example, a car approaches a biker and is correctly detected by the system.
Prior to passing the biker, though, the car slows down and takes a right turn off the roadway. This causes a false positive, since the tracker remains in the scene and incorrectly tracks a section of roadway. Finally, a different false positive occurs due to a sudden jerk to the system caused by a section of uneven roadway surface. The fast lateral motion causes the location of the car to shift by hundreds of pixels, again confusing the tracker.

5.2.2 Audio Detection Accuracy

Table 2 also presents the results for the audio-based detector. From the table, we observe that only 2 rear-approaching vehicles were not detected, while 2 false alerts were generated. We calculate the overall accuracy to be 80.9% for the individual audio case. After closer inspection of the results, we found that the first false negative is due to a very slowly approaching car, which brakes to take a turn away from the bike before ever reaching the biker. This is a case where we were conservative in annotating the occurrence in the trace data, but the car never actually gets close enough to be a danger to the biker. In another interesting instance, a car approaches very slowly from behind while a flurry of vehicles passes the bike from the opposite direction. These rapid car passes drown out the slow passing sounds of the car behind, confusing the classifier. This was responsible for both a false positive and a false negative. The final false positive was due to the noise generated as the bicycle rode across a bad patch of road.
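The accuracy definition can be checked directly against the Table 2 counts:

```python
def accuracy(tp, fp, fn):
    """Accuracy = TP / (TP + FP + FN); true negatives are excluded,
    since frames with no vehicle present dominate the traces."""
    return tp / (tp + fp + fn)

video_acc = accuracy(18, 7, 1)   # video detector: 18/26 ≈ 69.2%
audio_acc = accuracy(17, 2, 2)   # audio detector: 17/21 ≈ 80.9%
```

Both values recover the percentages reported in the text, confirming that true negatives are deliberately left out of the denominator.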

Figure 8: Multimodal Accuracy - Multiple Vehicles. (a) Vehicle 1. (b) Vehicle 2. (c) Vehicle 3. (d) Detection Timeline. Three vehicles approach a biker from behind in succession. The Ground Truth graph plots the approaches over time. The second and third graphs plot the individual Audio and Video detection results. The fourth graph plots the detection results for the combined multimodal approach.

5.2.3 Combined Multimodal Detection Accuracy

Finally, Table 2 also presents the results for the combined multimodal detector. For the combined case, we present the alert-level results. From the table, we observe the overall accuracy to be 78.3%. This represents a 9.1% increase and a 2.6% decrease in accuracy relative to video and audio, respectively. Aside from the effects on accuracy, the combination improves the real-time alerting function, as will become evident in Section 5.3.

Three example scenarios from the traces provide a closer look at the results for the multimodal case. The first example is shown in Figure 8. In this scenario, three vehicles approach a biker from the rear, shown in Figures 8(a), 8(b), and 8(c). The figure also presents the detection results for the scenario (Fig. 8(d)). We make a few observations from this example. First, although there are large portions of agreement between the two detectors, there are times when they do not agree. This happens when one detector or the other is more accurate. It is this non-agreement that provides the boost in accuracy to the combined detector. Second, for this first example scenario, audio is unable to distinguish the closely packed individual vehicles, while video is successful. In this example, as well as in others omitted for brevity, the Combined A/V (multimodal) detector performs closer to the Ground Truth than either the Video or Audio detector alone.
5.3 Real-Time Performance

Accurate detection is only useful to a biker if the system can provide alerts in a timely fashion. In this section, we evaluate the timeliness of the individual audio and video detection processing components, and then measure the performance of the combined multimodal detector. We define timeliness in terms of the average number of seconds of warning the system provides to the biker prior to a vehicle encounter. We also present the potential timeliness as the percentage of the total possible time during which a vehicle could be detected by the system, i.e., the difference between the first appearance of the vehicle and the time it passes the biker. All experiments are measured by executing the real-time Cyber-Physical bicycle prototype, using our

roadway cycling traces. Similar experiments were also executed while capturing video and audio data directly from the sensors to validate our trace-based results. We omit these results, and present only the trace-based results, for brevity.

5.3.1 Video Performance

In this section, we investigate the end-to-end performance of the video-based detection processing. For the video case, two components form the critical performance path: the optical flow processing and vehicle tracking subsystems. In the following experiments, we measure the processing costs of both subsystems.

Figure 9: Optical Flow Performance. Plot of the time (in ms) to perform optical flow processing on individual video frames. Includes instantaneous and smoothed results.

The first video performance experiment measures the latency of processing a frame using optical flow techniques. Figure 9 presents the results of our trace-based experimentation. In the figure, the results are presented as the frame processing latency (in ms) for the sequence of frames from a portion of the roadway cycling traces. From the figure, we observe that each frame in the experiment is processed in real time. Although the latency for frame processing fluctuates between 250 and 315 ms, the frame rate never drops below 3 frames per second (FPS). We also observe, by focusing on the smoothed data (dashed line), that optical flow imposes a relatively constant and predictable processing cost.

The second video performance experiment examines the cost of vehicle tracking processing. The results are presented in Figure 10. In the figure, we measure the instantaneous FPS rate as each frame is processed by the Vehicle Tracking component. From the figure, we observe that the frame rate varies between 3.7 and 1.3 FPS. We also observe that performance is quite stable around those two values.
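Both figures of merit in this section reduce to simple ratios: the frame rate implied by a per-frame latency, and timeliness as the warning time over the average visibility window (3.81 s, from Section 5.1). A quick sanity check on the reported numbers:

```python
def fps(latency_ms):
    """Instantaneous frame rate implied by a per-frame processing latency."""
    return 1000.0 / latency_ms

def timeliness(avg_warning_s, avg_window_s):
    """Fraction of the potential detection window an alert provides."""
    return avg_warning_s / avg_window_s

worst_case = fps(315)                    # ≈ 3.17 FPS, above the 3 FPS floor
video_pct = timeliness(3.5, 3.81) * 100  # ≈ 92% of the potential time
```

The worst observed optical-flow latency of 315 ms still clears 3 FPS, and a 3.5 s average warning against a 3.81 s average visibility window recovers the reported 92%.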
The reason for this is that when the component is tracking vehicles, frame processing is more expensive, and we experience a corresponding drop in FPS rate. Two examples of this can be observed in the figure. Under these performance conditions, we measure the real-time video-based alerting to occur an average of 3.5 seconds prior to a vehicle encounter, which is 92% of the potential time.

5.3.2 Audio Performance

In this section, we investigate the end-to-end performance of the audio-based detection processing. For the audio case, two components form the critical performance path: audio feature extraction and feature vector classification. In the following experiment, we measure the processing costs of both components. To measure the isolated performance of the audio-based detection components, we use only the audio streams from the roadway cycling traces. For each audio frame, we measure the latency of feature extraction for each feature in the feature vector and the latency of classification for each vector. The results of this experiment are presented in Figure 11. In the figure, there are eight bars. Each of the first seven bars represents the

rate (in FPS) for one of the audio features calculated from the source frames (see Table 1 for details). The last bar is the rate (in FPS) of feature vector classification.

Figure 10: Vehicle Tracking Performance. Plot of the number of frames processed per second (FPS) through the vehicle tracking component of the video-based detection subsystem. Includes instantaneous and smoothed results.

From the figure, we observe that all components in the audio processing pipeline execute well within real-time limits. In fact, the slowest component, the FFT, processes nearly 100 frames per second. Under these performance conditions, we measure the real-time audio-based alerting to occur an average of 1.8 seconds prior to a vehicle encounter, which is 59% of the potential time.

5.3.3 Combined Multimodal Performance

Finally, in this section we investigate the performance of the multimodal configuration. In this scenario, audio and video processing execute concurrently, processing data from the same audio/video stream in parallel. Similar to the previous experiment, we measure the frame rate for each component in the pipeline and report the results (in FPS). We consider three components: (i) optical flow processing, (ii) video-based vehicle tracking, and (iii) audio processing. Figure 12 presents the results of this experiment. Although all three components exhibit reduced performance, due to resource competition under concurrent processing conditions, the overall notification time is 3.5 seconds on average, which equates to 92% of the potential. In short, the combined system matches the timeliness of the video-based subsystem, yet loses only a small amount of accuracy.

5.4 Power Requirements

Since a bicycle is a mobile platform, we must consider the power requirements of the Cyber-Physical bicycle system. A system that provides accurate real-time detection is only useful while operating.
So, we need to understand the energy burden the system imposes. In this section, we evaluate the power requirements in two ways. First, we measure the absolute power rates compared to a baseline idle system. The second experiment measures the battery discharge rate as the system executes various components. All experiments are measured by executing the real-time Cyber-Physical bicycle prototype. To capture the measurements, we utilize the BatteryBar utility [7].

Table 3 presents the results for the power consumption rate measurements. All measurements are in Watts. From the table, we observe that Audio places a modest increase in power requirements over the idle system, while Video nearly doubles it due to the inclusion of GPU processing. In fact, the Video results are comparable to the requirements of a typical Movie Player (VLC Media Player [14]). Finally, we observe that the Multimodal scenario is also comparable to the base Video scenario. This is due to the fact that both

the CPU and GPU are utilized under Video. At that level of performance, the hardware is already in a high-performance mode, and the additional processing overhead of Audio causes only an incremental increase in power requirements due to higher CPU and memory utilization.

Figure 11: Audio Component Processing Performance. Plot of the number of frames processed per second (FPS) through the audio processing components of the audio-based detection subsystem. The first seven bars represent the audio feature extraction performance (FPS) for each feature in Table 1. The last bar is the performance of feature vector classification.

Table 3: Power Consumption Rates.

Component     Power Consumption Rate (Watts)
Idle          6.4 (0.1)
Audio         8.5 (0.1)
Movie Player  12.4 (0.1)
Video         14.2 (0.3)
Multimodal    14.4 (0.2)

Power usage for each processing component. The baseline power usage is Idle. Audio, Video, and Multimodal represent the individual power consumption rates for the audio-based, video-based, and combined detection modes. Movie Player is a standard MPEG movie player, included for comparison purposes. Results are the mean of five measurements (in Watts), with standard deviations in parentheses.

Figure 13 presents the battery discharge rates for the five different scenarios. In the figure, we plot the percentage of battery depletion over time as each scenario executes. As expected, these results mirror the power consumption rates, with Video and Multimodal exhibiting similar battery discharge rates. Since both are close to the rate of a typical Movie Player, we expect the battery lifetime while executing our system to be comparable to that of a user watching a movie. Based upon these measurements, and assuming a linear battery discharge rate, we estimate the battery lifetime of our system executing in multimodal mode to be approximately 5 hours.
Of course, battery discharge rates are non-linear, but this is not likely to change the fact that the estimated lifetime is on the order of typical roadway bicycle ride durations [8, 12]. Moreover, since watching movies (on battery power) is a common use of modern netbooks, experience suggests that a typical netbook battery will support the Cyber-Physical bicycle system for the required duration.
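The linear-discharge estimate is straightforward arithmetic. The usable pack capacity is not stated in the text, so the 72 Wh figure below is merely the value implied by a 5-hour lifetime at the measured 14.4 W multimodal draw, not a measured specification:

```python
def lifetime_hours(usable_capacity_wh, draw_watts):
    """Battery lifetime under an assumed linear discharge rate."""
    return usable_capacity_wh / draw_watts

hours = lifetime_hours(72.0, 14.4)   # 72 Wh / 14.4 W = 5 hours
```

Any non-linearity in the discharge curve shifts this estimate somewhat, but not enough to move it off the order of a typical ride duration.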

Figure 12: Multimodal Processing Performance. Plot of the number of frames processed per second (FPS) through each processing component of the multimodal detection system.

6 Discussion and Future Work

6.1 User Study

The central goal of this work is to reduce the cognitive overhead of a biker, allowing her to focus attention on bicycle handling and the roadway ahead. Evaluating this is a challenging problem, as it requires thorough coverage of different biker skill levels, riding styles, roadway and route characteristics, environmental conditions, and user interface issues. As such, we acknowledge the importance of and need for a full user study, and plan to conduct one in the future as a separate but related piece of research. We envisage this to include two components. The first is a systematic enumeration and categorization of the relevant contributing factors to be included in the study. The second is a broad study over many hours of system evaluation in the field, with real bikers on typical rides. Although we plan to pursue a user study, this is not meant to diminish the trace data used in this study. In fact, the data used in our experiments is real data, of the same quality that would be collected in real time during a user study.

6.2 System Limitations

Detection Accuracy. Although the results demonstrate the accuracy of our system, they also leave room for improvement. Our design is biased towards eliminating false negatives, since we consider an unannounced rear-approach to be worse than a false positive. We believe that there is a trade-off here that should be explored as we attempt to further optimize the system. Additionally, we are investigating additional computer vision techniques, such as profile analysis, to allow our system to recognize the front profile of a vehicle as a means of further validating the presence of a rear-approaching vehicle.
Real-Time Performance. Although our system meets the requirements for real-time alerting, in multimodal detection it comes close to fully utilizing the resources of our prototype platform. This ultimately places an upper bound on the frame rate (FPS) that the system can process in real time. Since the video capture rate for a typical video camera can be 30 FPS or higher and our real-time prototype processes between 1.5 and 10 FPS, there is potential for improvements in accuracy through optimization. In conjunction with this optimization, we plan to explore the inherent trade-offs between video quality, accuracy, and performance. Another direction that we envisage is to introduce adaptivity to the multimodal system. The idea is to better understand the environmental conditions that affect the accuracy of video- and audio-based detection, and to adaptively adjust the level of processing for each technique. For example, under conditions of low visibility, the system could reduce the frame rate for video and rely more on audio-based detection.

Conversely, under conditions of high ambient noise, the system could reduce the frame rate for audio.

Figure 13: Battery Discharge. Plot of the battery discharge rates for each processing component. The baseline power usage is Idle. Audio and Video represent the individual discharge rates for the audio- and video-based detection components executing in isolation. Multimodal represents concurrent audio and video-based detection execution, and Movie Player is a standard MPEG movie player, included for comparison purposes.

Limited Visibility Situations. Under conditions of good visibility, our system performs very well, as described in Section 5. In the scope of this work, we have not considered scenarios of limited visibility. At this time, our system relies upon its multimodal nature to accurately detect rear-approaching vehicles. Clearly, audio-based detection does not depend on visibility, working even under conditions of complete darkness. Since we rely on multimodal combination to achieve the best possible accuracy, we should also consider variable visibility scenarios. We plan to do so in two ways. First, we will include such tests as part of a larger user study. Second, we will purposely gather additional traces from a wide variety of conditions, to provide a more comprehensive data set for repeatable experimental trials.

Sensor Calibration. In this work, we focus on one specific implementation of the Cyber-Physical bicycle system. One practical issue that must be addressed to enable a broad deployment of this system on generic hardware is sensor calibration. We have largely ignored this issue in this paper, since we rely on specific, consistent hardware, which eliminates it.
If we include an additional goal of broadly deploying the system software on diverse generic hardware, including a variety of different types of cameras and microphones, then we must address the system-sensor calibration issue. For the camera, the focal length must be discovered to allow accurate distances between bikers and vehicles to be determined. For the microphone, the system must regenerate the classifier models based upon sound samples captured from the target sensor. Although we have purposefully left this outside the scope of this paper, we plan to address it in the future.

6.3 Power Generation

Although we have investigated system power utilization as part of our evaluation, we have assumed that all power is supplied by the platform's integrated battery. This completely ignores the availability of green power. There are at least three forms of power present that we may tap to supplement the system. First, as a biker pedals, she generates power, which can be captured by specialized bicycle wheel hubs [13]. Such hubs can generate up to 6 Watts of continuous power, and are commonly used by bikers to power headlamps. Second, as a bicycle moves between 13 and 20 mph, an opposing wind force is generated, which can be harnessed by small, attachable wind turbines [10]. Finally, since bicycling is a fair-weather, outdoor activity, solar energy is typically available and can also be harnessed to capture ambient power for the system. Together, these three sources of power might be utilized as a source for battery

charging.

6.4 Roadway Hazard Detection

There are numerous highly visible obstacles that may appear along a biker's path, such as tree branches or parked cars. Such obstacles are typically observable by a forward-facing biker, and do not require any Cyber-Physical aid. However, there is a class of obstacles that are less observable, yet pose a more serious hazard to biker safety. For example, under normal conditions, a biker being passed by a slow-moving motor vehicle may not be placed in a particularly unsafe situation. On the other hand, if that same biker happens to be riding over a cluster of potholes while being passed by the motor vehicle, there is a greater risk of collision due to the increased chance that the biker may lose control of the bicycle. Neither the roadway condition nor the slow-moving vehicle poses an individual threat; it is the combination that places undue risk on the biker. As future work, we intend to investigate the possibility of performing roadway sensing to continuously measure roadway surface conditions. We plan to use a combination of accelerometer, video, and audio-based sensing to capture the motion of a bicycle as it travels along the roadway, and to identify visible roadway anomalies. Accurate roadway sensing from a moving motor vehicle has proven difficult, yet tractable [22]. Compared to motor vehicles, bicycles pose additional difficulties. They are substantially lighter, and therefore more susceptible to slight perturbations in motion.

6.5 CyberPeloton (Platoons of Bicycles)

Typically, bikers ride in groups called pelotons. As bikers collect into groups, there is an opportunity for the Cyber-Physical bicycle system to take advantage of this close proximity to improve safety, share processing load, and provide additional social functionality. For example, as a vehicle approaches from behind, the last bicycle in the group can pass detection alerts forward to the other bikers in the peloton.
Similarly, a bicycle in front may perform roadway hazard detection and pass alerts back to those behind. Two levels of support are required to achieve this. The first is functionality to automate the formation of CyberPelotons whenever bikers are in close proximity to each other. Beyond formation, the system must also support intra-group signaling for both high-priority alerts and lower-priority signals. CyberPeloton formation involves a number of challenges that must be overcome. First, Cyber-Physical bicycles must perform real-time proximity detection to determine when there are opportunities to form groups. Second, once a group has been formed, fine-grained relative positioning must be determined (1) and group consensus achieved, to ensure that each member of a CyberPeloton agrees that it is a member and has an accurate view of the group positioning topology. Third, support is required to handle load-sharing functionality within a CyberPeloton. For example, in order to elect a member to perform roadway hazard detection and alerting, the group must agree on the member that is at the front-most position in the group. The same applies when electing a member of the group to handle automated motor vehicle detection. Moreover, since bicycles in a group may change relative positions frequently, this is a continuous task that must be performed efficiently.

6.6 Automated Incident Detection

This paper has dealt with biker safety from an accident avoidance perspective. It is also important to consider biker safety from an accident response perspective. Except for the most fortunate situations where no injury occurs, an accident that involves a bicycle and a motor vehicle will likely require immediate medical attention for the biker. Considering that a substantial number of at-fault motorists are never even identified [4, 5], it is unlikely that a biker can rely on the motorist to react in the biker's interests.
A future goal of ours is to automate the detection of such situations and react accordingly. Additionally, since prosecuting hit-and-run motorists is difficult due to the inherent lack of actionable evidence (e.g., license plate number, make and model of car, etc.), we have the complementary goal of gathering such evidence. Once an incident has been correctly detected, it must be reliably reported to the appropriate authorities. Since the Cyber-Physical bicycle system is equipped with wireless communication technology, notifying authorities is straightforward in most situations. Considering the potential severity of the incident, though,

(1) Bikers frequently ride within 1 foot of each other, well within the error range of common GPS.

we must also consider the exceptional case in which traditional wireless communication fails (e.g., due to poor cellular signal strength). In this case, the system may need alternate signaling methods, and investigating such methods is a necessary part of this topic. Finally, once the proper authorities have been contacted, the correct information must be communicated in a data-sensitive manner. Since we are attempting to preserve evidence that would otherwise be unavailable to authorities, the authenticity and integrity of the data are paramount.

6.7 Safe Route Planning

A more proactive approach to accident avoidance is to incorporate roadway safety metrics directly into bicycle route planning. The goal would be to allow users to map out potential bike routes and then have the system quantify the safety of each route and suggest alternate paths based upon safety criteria. This would allow a user to make a direct quantitative comparison of the relative safety of different bike routes. Today, this can only be determined through biker experience, documented opinion, and slim anecdotal evidence. From a web-services perspective, there currently exist a number of cycling-oriented services that allow bikers to form social groups [8, 12]. None of these services includes safety as a first-order property in route planning. Although bikers may directly share qualitative experiences with each other regarding roadway safety and route planning, none of these services attempts to include quantitative analysis as a weighting factor.

7 Related Work

The closest work to ours in both spirit and domain is the BikeNet [17] project. In that work, the authors utilize a suite of sensors to collect various types of environmental data, from air quality to coarse-grained motor vehicle traffic flow (measured with an embedded magnetometer), as an indicator of environmental conditions.
Although this project was the first to suggest applying sensors to bicycles, its application domain was biker fitness; it did not target the equally challenging problem of biker safety. In the remainder of this section we review work related to ours in the areas of video- and audio-based detection techniques.

7.1 Video Detection Techniques

The current state of the art with respect to bicycle accident detection is the Cerevellum [9] digital rear-view mirror product. Aside from providing a continuous video-based view of the situation behind a biker, it also detects when a bike is struck (using simple accelerometer measurements) and stores the last 30 seconds of video on local flash storage. It does not, however, contact authorities on behalf of the biker, nor does it provide any digital signature of the evidence. It simply stores a small 30-second video, without any further processing.

Computer vision technologies have been used successfully to detect moving objects, such as people and vehicles, in many application domains. For example, in the surveillance domain, many algorithms have been developed for moving-object detection and tracking [18]. In a closely related domain, many computer vision systems have been developed to assist automobile drivers: systems have been developed and deployed to detect approaching cars in blind spots using cameras and radar sensors [31], to detect and recognize traffic signs [20], and to detect crossing pedestrians [19]. Several systems have also been developed to detect lanes and the vehicle ahead in the same lane in order to maintain a safe distance, e.g., [25, 35, 29, 24]. Integrated systems have also been developed for autonomous driving, e.g., [23]; in the 2005 DARPA Grand Challenge, several teams competed to develop autonomous vehicles capable of driving 211 km of desert roads.
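To make the common thread among these moving-object detectors concrete, the following is a minimal sketch of frame differencing, the simplest form of the background-subtraction idea used in many surveillance and driver-assistance systems. The frame format and thresholds here are illustrative assumptions, not taken from any of the cited systems or from our prototype:

```python
# Minimal frame-differencing motion detector (illustrative sketch only).
# Frames are tiny grayscale images represented as lists of pixel rows.

def motion_detected(prev_frame, curr_frame, pixel_thresh=25, area_thresh=0.05):
    """Flag motion when enough pixels change between consecutive frames.

    pixel_thresh: minimum per-pixel intensity change to count as "moving".
    area_thresh:  fraction of changed pixels needed to declare motion.
    """
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(c - p) > pixel_thresh:
                changed += 1
    return changed / total >= area_thresh

# A static background frame vs. one containing a bright "vehicle" region.
background = [[10] * 8 for _ in range(8)]
with_vehicle = [row[:] for row in background]
for r in range(3, 6):
    for c in range(2, 7):
        with_vehicle[r][c] = 200
```

Real systems replace the per-pixel threshold with an adaptive background model and track the detected regions over time, but the core decision, comparing each frame against a reference and thresholding the changed area, is the same.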
Despite the considerable effort spent integrating vision systems into automobiles, almost nothing has been done to develop vision systems for bicycles, where they are greatly needed for biker safety. This is largely because it is quite challenging to develop a low-cost, power-efficient vision system for biker assistance.
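One way to address the evidence-integrity requirement raised in Section 6.6, which products like the Cerevellum do not provide, is to attach a message authentication code to each captured record before it is transmitted to authorities. The sketch below is hypothetical: the record fields and the pre-shared key are illustrative assumptions, not part of our prototype.

```python
import hashlib
import hmac
import json

def seal_evidence(record, key):
    """Serialize an incident record and attach an HMAC-SHA256 tag so a
    recipient holding the key can verify authenticity and detect tampering."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode("utf-8"), "tag": tag}

def verify_evidence(sealed, key):
    """Recompute the tag over the payload and compare in constant time."""
    expected = hmac.new(key, sealed["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])

# Example: a hypothetical incident record captured after a detected impact.
key = b"pre-shared-device-key"   # assumption: provisioned when the device is set up
record = {"time": "2010-08-12T14:03:05Z",
          "gps": [40.5215, -74.4615],
          "video_clip": "clip_000142.h264"}
sealed = seal_evidence(record, key)
```

A symmetric MAC assumes the receiving authority shares the key; a deployed system would more likely use a public-key digital signature so that any third party can verify the evidence without holding a secret.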
