Simultaneous Mosaicing and Tracking with an Event Camera

Hanme Kim 1, hanme.kim@imperial.ac.uk
Ankur Handa 2, ah781@cam.ac.uk
Ryad Benosman 3, ryad.benosman@upmc.fr
Sio-Hoi Ieng 3, sio-hoi.ieng@upmc.fr
Andrew J. Davison 1, a.davison@imperial.ac.uk

1 Department of Computing, Imperial College London, London, UK
2 Department of Engineering, University of Cambridge, Cambridge, UK
3 Sorbonne Universités, UPMC Univ Paris 06, UMR_S 968, Institut de la Vision, Paris, F-75012, CNRS, France

Abstract

An event camera is a silicon retina which outputs not a sequence of video frames like a standard camera, but a stream of asynchronous spikes, each with pixel location, sign and precise timing, indicating when individual pixels record a threshold log intensity change. By encoding only image change, it offers the potential to transmit the information in a standard video but at vastly reduced bitrate, and with huge added advantages of very high dynamic range and temporal resolution. However, event data calls for new algorithms, and in particular we believe that algorithms which incrementally estimate global scene models are best placed to take full advantage of its properties. Here, we show for the first time that an event stream, with no additional sensing, can be used to track accurate camera rotation while building a persistent and high quality mosaic of a scene which is super-resolution accurate and has high dynamic range. Our method involves parallel camera rotation tracking and template reconstruction from estimated gradients, both operating on an event-by-event basis and based on probabilistic filtering.

1 Introduction

Real-time, real-world vision applications such as in robotics and wearable computing require rapid reaction to dynamic motion, and the ability to operate in scenes which contain large intensity differences. Standard video cameras run into problems when trying to supply this, either of huge bandwidth requirements at high frame-rates or diminishing image quality with blur, noise or saturation [11].
To overcome these limitations, researchers in neuromorphics have built new visual sensors aiming to emulate some of the properties of the human retina [5]. An event camera has no shutter and does not capture images in the traditional sense. Instead, each pixel responds independently to discrete changes in log intensity by generating asynchronous events, each with microsecond-precise timing. The bandwidth of an event stream is much lower than for standard video, removing the redundancy in continually repeated image values; but an event stream should in principle contain all of the information of standard video, and without the usual bounds on frame-rate and dynamic range.

However, the demonstrated uses of event cameras have been limited. We are interested in scene understanding and SLAM applications where the camera itself moves, and there has been little work on building coherent scene models from event data. The clear difficulty is that most methods and abstractions normally used in reconstruction and tracking, such as feature detection and matching or iterative image alignment, are not available. In this paper we show that the pure event stream from a hand-held event camera undergoing 3D rotations, with no additional sensing, can be used to generate high quality scene mosaics. We use a SLAM-like method of parallel filters to jointly estimate the camera's rotational motion and a gradient map of a scene. This gradient map is then upgraded to a full image-like mosaic with super-resolution and high dynamic range properties.

When an event camera moves, events are triggered at pixel locations where intensity edges cross its field of view. However, an event camera does not directly measure image gradients but only the locations, signs and times of brightness changes. The presence, orientations and strengths of edges must be estimated together with the camera's rotation trajectory. As each new event is received, current tracking estimates of the camera's position and velocity relative to the mosaic mean that the event serves as a measurement of the component of gradient parallel to the motion direction of that pixel. We refine estimates of gradients, as well as the motion of the camera, using Bayesian filtering on an event-by-event basis.

© 2014. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
Event cameras can be seen as the logical conclusion of devices such as rolling shutter cameras which have some degree of non-global capture; but they are a much more powerful proposition since their output is purely data-driven, and they do in hardware, at no computational cost, a lot of the hard things we are used to doing in computer vision to determine which pixels are useful for tracking or reconstruction. We hope that our work, as well as having a strong engineering interest, also shines some more light on why a biological retina works as it does, and suggests that it still has an awful lot to teach us in computer vision.

2 Event cameras

We use the first commercial event camera, the Dynamic Vision Sensor (DVS) [13] shown in Figure 1(a). It has 128×128 resolution, 120 dB dynamic range and 15 microsecond latency, and communicates with a host computer using USB 2.0. It outputs a stream of events, each consisting of a pixel location, a polarity bit for positive or negative change in log intensity, and a timestamp in microseconds, as depicted in Figure 1(b). We can visualise its output as shown in Figure 1(c) by accumulating events within a time interval; in this figure, white and black pixels represent positive and negative events respectively. We should note that there is a newer version, the Asynchronous Time-based Image Sensor (ATIS) [17], which has higher resolution (304×240), higher dynamic range (143 dB) and lower latency; and the ATIS provides an absolute intensity for each event. We can expect much more innovation in neuromorphic cameras in the near future.

Figure 1: The first commercial event camera: (a) DVS128; (b) a stream of events (upward and downward spikes: positive and negative events); (c) image-like visualisation of accumulated events within a time interval (white and black: positive and negative events).

2.1 Related Work

Since the emergence of event cameras, most vision work using them has focused on tracking moving targets from a fixed point of view, where almost all events are generated by the dynamic object motion. For instance, a robot goalkeeper application blocks balls detected by a DVS camera [8], and a pencil balancing application maintains a pencil balanced on its tip by controlling an actuated table underneath using two DVS cameras [7]. This application requires very fast feedback control, demonstrating the remarkably high measurement rate and low latency capabilities of the event camera. More recently, Benosman et al. [4] proposed an optical flow estimation algorithm using an event camera which precisely estimates visual flow orientation and amplitude, based on a local differential approach on the surface defined by events, with microsecond accuracy and at very low computational cost.

Work on reconstructing, understanding and tracking of more general, previously unknown scenes where the event camera itself is moving is at an early stage. The first attempt to use this type of camera for a SLAM problem was made by Weikersdorfer et al. [19]. They used a wheeled robot equipped with an upward looking DVS camera to estimate 2D motion and construct a planar ceiling map. Most recently, Mueggler et al. [15] presented an onboard 6 DoF localisation quadrotor system using a DVS camera which is able to track high-speed manoeuvres, such as flips. Their system starts by integrating events until a known pattern is detected, and then it tracks the borders of the pattern, updating both the line segments and the pose of the flying robot on an event-by-event basis.

As a DVS camera does not provide absolute brightness values, a few attempts to combine an event camera with an extra full frame camera have been made. Weikersdorfer et al. [20] developed an event-based 3D sensor combining a DVS with an RGB-D camera which generates a sparse stream of depth-augmented 3D points. In a similar way, Censi and Scaramuzza [6] presented a low-latency event-based visual odometry algorithm combining a DVS with a normal CMOS camera, which uses events from the DVS to estimate the relative displacement since the previous frame from the conventional camera. Although these are both certainly possible practical ways to use an event camera for localisation and mapping, in our view this type of approach is sub-optimal; returning to the need for a frame-based camera alongside the event camera removes many of the advantages of working purely with events.

After the submission of our work, we came across a similar idea to our scene reconstruction from an event stream [3], but in a much more constrained and hardware-dependent setup. They developed a special 360° high dynamic range camera consisting of a pair of dynamic vision line sensors, a high-speed rotating mechanical device with encoders and a processing unit, and created a panorama with greyscale values from event data.

Figure 2: Event time intervals τ and τ_c: (a) A simplified 2×2 event-based camera moving over a scene, with colours to identify the pixels. (b) A stream of events generated by the camera. Upward and downward spikes represent positive and negative events, and their colours indicate the pixel each event came from. τ is the time elapsed since the previous event at any pixel, and τ_c is the time since the previous event at the same pixel.

3 Method

Our approach relies on two parallel probabilistic filters to jointly track the global rotational motion of a camera and estimate the gradients of the scene around it; the gradient map is then upgraded to a full image-like mosaic with super-resolution and high dynamic range properties. Each of these components essentially believes that the current estimate from the other is correct, following the approach of most recent successful data-rich SLAM systems such as PTAM [12] and DTAM [16], or in pure rotation mosaicing [14]. Note that we do not currently explicitly address bootstrapping in our method. We have found that simple alternation of the tracking and mapping components, starting from a blank template, will very often lead to rapid convergence of joint tracking and mapping, though there are sometimes currently gross failures and this is an important issue for future research.

We use the notation e(u,v) = (u, v, p, t) to denote an event with pixel location u and v, polarity p and timestamp t. The rotational mosaic or template we aim to reconstruct is denoted M(p_m) and has its own fixed 2D coordinate frame with pixel position vector p_m. We define two important time intervals τ and τ_c which are used in our algorithm. For clarity, let us consider a simplified 2×2 event camera moving over a simple scene as shown in Figure 2(a). We receive a stream of positive and negative spikes from the camera; those spike events are depicted along the time axis in Figure 2(b) and can be associated with a specific pixel by their colour.

When a new event arrives from a certain pixel, we define τ as the time elapsed since the most recent previous event from any pixel, and τ_c as the time since the most recent previous event at the same pixel. Here, τ is significant as the blind time since any previous visual information was received and is used in the motion prediction component of our tracker; while τ_c is important since its inverse serves as a local measurement of the rate of events at a particular location in image space.

3.1 Event-based camera tracking

We have chosen a particle filter as a straightforward sequential Bayesian way to estimate the rotational motion of our camera over time, with the multi-hypothesis capability to cope with the sometimes noisy event stream. In our event-based particle filter, the posterior density function at time t is represented by N particles {p_1^(t), p_2^(t), ..., p_N^(t)}. Each particle p_i^(t) is a set consisting of a hypothesis of the current state R_i^(t) ∈ SO(3) and a normalised weight w_i^(t).
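The two intervals τ and τ_c can be maintained with simple per-pixel bookkeeping. Below is a minimal Python sketch of that idea; the class and method names (`EventClock`, `update`) are ours, not from the paper:

```python
class EventClock:
    """Tracks tau (time since the last event at ANY pixel) and
    tau_c (time since the last event at the SAME pixel)."""

    def __init__(self):
        self.last_any = None   # timestamp of the most recent event anywhere
        self.last_at = {}      # (u, v) -> timestamp of last event at that pixel

    def update(self, u, v, t):
        """Process an event at pixel (u, v) at time t; return (tau, tau_c).
        Either interval is None when no previous event exists."""
        tau = None if self.last_any is None else t - self.last_any
        prev = self.last_at.get((u, v))
        tau_c = None if prev is None else t - prev
        self.last_any = t
        self.last_at[(u, v)] = t
        return tau, tau_c
```

With microsecond timestamps as the DVS provides, t would simply be an integer microsecond count.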

Initially, all particles are set to the origin with the same weight.

3.1.1 Motion prediction

We first explain the tracking component of our algorithm, whose job is to provide an event-by-event updated estimate of the rotational location of the camera with respect to the scene mosaic. We use a constant position (random walk) motion model for our particle filter, where the predicted mean rotation of a particle at any given time remains constant while the variance of the prediction expands linearly with time. We perturb the current so(3) vector on the tangent plane with Gaussian noise independently in all three axes and reproject it onto the SO(3) manifold to obtain the corresponding predicted mean rotation. The noise is the predicted change the current rotation might have undergone since the previous event was generated. This is further simplified by the composition property of rotation matrices, and yields the final update at the current time t as:

R_i^(t) = R_i^(t−τ) exp( Σ_{k=1}^{3} n_k G_k ),   (1)

where R_i is the rotation matrix for the i-th particle and the G_k are the Lie group generators for SO(3). The noise vector n = (n_1, n_2, n_3) is obtained by generating random numbers sampled from Gaussian distributions independently in all three directions, i.e. n_k ~ N(0, σ²τ). Note that the high average frequency of events (at least 10 kHz typically) relative to the dynamics of a hand-held camera strongly motivates the use of a stronger motion model (e.g. constant velocity or acceleration) [10], and we aim to test such a model soon.

3.1.2 Measurement update

The weights of these perturbed particles are now updated through the measurement update step, which applies Bayes' rule to each particle (the weights are subsequently normalised):

w_i^(t) = P(z | R_i^(t)) w_i^(t−τ).   (2)

We calculate the value of a measurement z given an event e(u,v), the current state R_i^(t) and the previous state R^(t−τ_c) by taking a log intensity difference between the corresponding intensity map positions:

z = log(M(p_m^(t))) − log(M(p_m^(t−τ_c))),   (3)

where p_m^(t) = π( R_i^(t) K^(−1) ṗ_c ).   (4)

Here ṗ_c = (u, v, 1)^T is a camera pixel position in homogeneous coordinates, K is the camera intrinsics matrix, and π(p) = (1/p_2)(p_0, p_1)^T is the homogeneous projection function. The measurement z is now used to calculate the likelihood P(z | R_i^(t)) for each particle, essentially asking "how likely was this event relative to our mosaic, given a particular hypothesis of camera pose?". We first compare the sign of the log intensity difference with the polarity of the event, and we give a particle a fixed low likelihood if the signs do not agree. Otherwise, we look up a likelihood of this absolute log intensity difference (contrast) in the Mexican hat shaped curve shown in Figure 3(a), with mean aligned to a known contrast which is highly likely to generate an event [13].

Figure 3: (a) Event likelihood function. (b) Camera and map reference frames.

For the next measurement update step and the reconstruction block, a particle mean pose is saved for each pixel whenever an event occurs at a specific pixel location. To calculate the mean of the particles, we apply the matrix logarithm to all particles' SO(3) components to map them to the tangent space, calculate the arithmetic mean, and re-map to the SO(3) manifold by applying the matrix exponential. Because of the random walk nature of our motion model, which generates noisy motion estimates, a new mean pose is saved in the form of a weighted average with the previous mean pose.

Finally, after giving each particle a likelihood and normalising the distribution, we resample the distribution in the standard way, making a new particle set which copies old particles with probability according to their weights. However, due to the very high frequency of events, we do not resample on every step, to avoid unnecessarily deleting good particles in cases where all weights are similar. We follow [9] to determine whether resampling should be carried out, depending on the so-called effective number of particles N_eff:

N_eff = 1 / Σ_{i=1}^{N} (w_i^(t))².   (5)

We resample the set of particles whenever N_eff is less than N/2.

3.2 Mosaic reconstruction

We now turn to the other main part of our algorithm which, having received an updated camera pose estimate from tracking, must incrementally improve our estimate of the intensity mosaic. This takes two steps: pixel-wise incremental Extended Kalman Filter (EKF) estimation of the log gradient at each template pixel, and interleaved Poisson reconstruction to recover absolute log intensity.
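The prediction step of Equation (1) and the resampling criterion of Equation (5) can be sketched in a few lines of Python. This is an illustrative sketch under the paper's model, not the authors' implementation; `so3_exp` is Rodrigues' formula for the exponential map, and the function names are ours:

```python
import math
import random

def matmul(A, B):
    """3x3 matrix product, with matrices as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_exp(n):
    """Rodrigues' formula: map an so(3) vector n = (n1, n2, n3) to the
    rotation matrix exp(sum_k n_k G_k), as used in Equation (1)."""
    t = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if t < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Skew-symmetric matrix [n]_x built from the SO(3) generators.
    N = [[0.0, -n[2], n[1]],
         [n[2], 0.0, -n[0]],
         [-n[1], n[0], 0.0]]
    a, b = math.sin(t) / t, (1.0 - math.cos(t)) / t ** 2
    NN = matmul(N, N)
    return [[(1.0 if i == j else 0.0) + a * N[i][j] + b * NN[i][j]
             for j in range(3)] for i in range(3)]

def predict(R, sigma2, tau):
    """Random-walk prediction: right-multiply a particle's rotation by the
    exponential of Gaussian noise with variance sigma2 * tau per axis."""
    n = [random.gauss(0.0, math.sqrt(sigma2 * tau)) for _ in range(3)]
    return matmul(R, so3_exp(n))

def n_eff(weights):
    """Effective number of particles, Equation (5)."""
    return 1.0 / sum(w * w for w in weights)
```

In use, each particle would be advanced with `predict`, weighted by the event likelihood, and the set resampled whenever `n_eff(weights) < N / 2`.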
3.2.1 Pixel-wise EKF based gradient estimation

We receive an event at a pixel location p_c = (u,v) in the camera reference frame and, using our tracking algorithm as described in Section 3.1, we find the corresponding location p_m = (x,y) in the map reference frame, as shown in Figure 3(b). Each pixel of the gradient map has an independent gradient estimate g^(t) = (g_x, g_y)^T and a 2×2 covariance matrix P_g^(t). At initialisation, all estimated gradients are set to zero vectors with large covariances.

Now, we want to improve an estimate g^(t) based on a new incoming event and a tracking result, using the pixel-wise EKF. We know τ_c at p_c, and the velocity of the camera at a pixel p_m is calculated as:

v^(t) = (v_x, v_y)^T = (p_m^(t) − p_m^(t−τ_c)) / τ_c.   (6)

Assuming, based on the rapidity of events, that the gradient g in the template and the camera velocity v can be considered locally constant, we now say that (g · v)τ_c is the amount of log grey level change that has happened since the last event. Therefore, if we have an event camera where a log intensity change C should trigger an event, brightness constancy tells us that:

(g^(t) · v^(t)) τ_c = ±C,   (7)

where the sign of C depends on the polarity of the event. We now define z, a measurement of the instantaneous event rate at this pixel, and its measurement model h, as:

z^(t) = 1 / τ_c,   (8)

h^(t) = (g^(t) · v^(t)) / C.   (9)

In the EKF framework, the gradient estimate and the uncertainty covariance matrix are updated using the standard equations at every event:

g^(t) = g^(t−τ_c) + W ν,   (10)

P_g^(t) = P_g^(t−τ_c) − W S W^T,   (11)

where the Kalman gain W is:

W = P_g^(t−τ_c) (∂h/∂g)^T S^(−1),   (12)

the innovation ν is:

ν = z^(t) − h^(t),   (13)

and the innovation covariance S is:

S = (∂h/∂g) P_g^(t−τ_c) (∂h/∂g)^T + R,   (14)

where R is the measurement noise, in our case the scalar σ_m². Finally, the Jacobian ∂h/∂g is derived as:

∂h/∂g = ∂/∂g ( (g · v)/C ) = ( ∂((g_x v_x + g_y v_y)/C)/∂g_x ,  ∂((g_x v_x + g_y v_y)/C)/∂g_y ) = ( v_x/C ,  v_y/C ).   (15)
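Since the measurement z is scalar, S in Equation (14) is a scalar and the whole update can be written without matrix libraries. A minimal Python sketch of Equations (6)–(15) follows (the function name and argument conventions are ours; the numerical values in the example are illustrative, not from the paper):

```python
def ekf_gradient_update(g, P, v, tau_c, C, R_noise):
    """One pixel-wise EKF update of a gradient estimate (Equations 6-15).
    g: gradient estimate [gx, gy]; P: 2x2 covariance as nested lists;
    v: pixel velocity [vx, vy]; tau_c: time since last event at this pixel;
    C: log intensity change that triggers an event; R_noise: sigma_m^2."""
    # Measurement and model: z = 1/tau_c, h = (g . v)/C        (Eqs. 8-9)
    z = 1.0 / tau_c
    h = (g[0] * v[0] + g[1] * v[1]) / C
    # Jacobian dh/dg = (vx/C, vy/C)                            (Eq. 15)
    H = [v[0] / C, v[1] / C]
    # Innovation covariance S = H P H^T + R (a scalar)         (Eq. 14)
    S = (H[0] * (P[0][0] * H[0] + P[0][1] * H[1])
         + H[1] * (P[1][0] * H[0] + P[1][1] * H[1]) + R_noise)
    # Kalman gain W = P H^T / S (a 2-vector)                   (Eq. 12)
    W = [(P[0][0] * H[0] + P[0][1] * H[1]) / S,
         (P[1][0] * H[0] + P[1][1] * H[1]) / S]
    nu = z - h                                               # (Eq. 13)
    g_new = [g[0] + W[0] * nu, g[1] + W[1] * nu]             # (Eq. 10)
    # Covariance update P' = P - W S W^T                       (Eq. 11)
    P_new = [[P[0][0] - W[0] * S * W[0], P[0][1] - W[0] * S * W[1]],
             [P[1][0] - W[1] * S * W[0], P[1][1] - W[1] * S * W[1]]]
    return g_new, P_new
```

Note how the structure reflects the text: with pure horizontal motion v = (v_x, 0), the Jacobian's second component is zero, so the update leaves g_y and its uncertainty untouched; nothing is learned perpendicular to the motion.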

Figure 4: Proposed algorithm: (a) scene and DVS camera; (b) event stream; (c) estimated gradient map (the colours and intensities of this figure represent the orientations and strengths of the gradients of the scene respectively); (d) reconstructed intensity map.

Essentially, each new event which lines up with a particular template pixel improves our estimate of its gradient in the direction parallel to the motion of the camera over the scene at that pixel, while we learn nothing about the gradient in the direction perpendicular to the camera motion. We visualise an estimated gradient map in Figure 4(c); the colours and intensities of the figure represent the orientations and strengths of the gradients of the scene respectively.

3.2.2 Reconstruction from gradients

Inspired by [18], we reconstruct the log intensity image M whose gradients M_x, M_y across the whole image domain are close to the estimated gradients g_x, g_y in a least squares sense:

J(M) = ∫∫ (M_x − g_x)² + (M_y − g_y)² dx dy.   (16)

The Euler-Lagrange equation to minimise J(M) is:

∂J/∂M − d/dx (∂J/∂M_x) − d/dy (∂J/∂M_y) = 0,   (17)

which leads to the well known Poisson equation:

∇²M = ∂g_x/∂x + ∂g_y/∂y.   (18)

Here ∇²M = ∂²M/∂x² + ∂²M/∂y² is the Laplacian. To solve Equation (18), we use a sine transform based method [1, 2]. We show a reconstructed intensity map in Figure 4(d).

4 Experiments

We recommend readers to view our submitted video¹, which illustrates all of the key results below in a form better than still pictures. We have conducted spherical mosaicing in both indoor and outdoor scenes. We also show the potential for reconstructing high resolution and high dynamic range scenes from very small camera motion. Our algorithm runs in real-time on a standard PC for low numbers of particles and low template resolutions; the results we show here were generated at higher resolution and are currently off-line, but we believe it is a simple matter of engineering to run at these settings in real-time in the near future.

¹ https://www.youtube.com/watch?v=l6qxem1dbxu
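Before turning to the results, the Poisson step of Section 3.2.2 can be made concrete with a small sketch. The paper solves Equation (18) with a sine transform [1, 2]; the sketch below substitutes a plain Jacobi iteration with Dirichlet boundary values, which is far slower but shows the same equation being solved, and the function name and grid sizes are ours:

```python
def poisson_jacobi(rho, boundary, iters=500):
    """Solve the discrete Poisson equation  lap(M) = rho  on a small grid by
    Jacobi iteration. An illustrative stand-in for the sine transform solver;
    rho and boundary are equal-sized nested lists, boundary edge values are
    held fixed, and the interior is iterated. Returns the solved grid."""
    H, W = len(rho), len(rho[0])
    M = [row[:] for row in boundary]
    for _ in range(iters):
        M_new = [row[:] for row in M]
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                # Jacobi update for the 5-point Laplacian stencil (h = 1).
                M_new[i][j] = 0.25 * (M[i + 1][j] + M[i - 1][j]
                                      + M[i][j + 1] + M[i][j - 1] - rho[i][j])
        M = M_new
    return M
```

In the mosaicing context, rho would be the discrete divergence ∂g_x/∂x + ∂g_y/∂y of the estimated gradient map, and M the recovered log intensity (up to the boundary/constant ambiguity discussed in [2]).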

Figure 5: Spherical mosaicing for indoor and outdoor scenes. The overlaid boxes represent the field of view of the event camera.

4.1 Spherical mosaicing

As shown in Figure 5, our algorithm is able to reconstruct indoor and outdoor scenes. In these mosaics, the overlaid box represents the tracked field of view of the event camera.

4.2 High resolution reconstruction

Even though current event-based cameras have very low resolution (the DVS has a 128×128 pixel array), as they provide very fast visual measurements, we can reconstruct high resolution scenes since our algorithm tracks rotation at sub-pixel accuracy. In Figure 6 we compare (a) an image from a standard camera down-sampled to 128×128 resolution with (b) our DVS reconstruction, showing sharper details.

4.3 High dynamic range reconstruction

Another key characteristic of the event camera is its sensitivity over a very high dynamic range (e.g. 120 dB for the DVS). Our algorithm can build mosaics which make use of this range, to deal with scenes where there are large intensity differences between the brightest and darkest parts. We created a scene with a very high range of light intensity by placing a row of bright LED lights on top of a poorly lit sketch pad. A standard global shutter camera generates an image which is partly saturated, partly very dark and also has smearing effects (Figure 7(a)). However, the event camera and our algorithm are able to reconstruct the high dynamic range log intensity image in Figure 7(c), where all elements are clear.

Figure 6: High resolution reconstruction: (a) a down-sampled normal camera image for comparison; (b) a reconstructed high resolution scene.

Figure 7: High dynamic range reconstruction: (a) a saturated normal CCD camera image with the smear effect for comparison; (b) a visualisation of a stream of events from the DVS camera; (c) a reconstructed high dynamic range scene.

5 Conclusion

We believe these are breakthrough results, showing how joint sequential and global estimation permits the great benefits of an event camera to be applied to a real problem of mosaicing, and hopefully opening the door to similar approaches in dense 3D reconstruction in the style of [16] and many other vision problems. It is worth restating the comparison between the data rate of an event camera, typically on the order of 40–180 kB/s in our experiments, and for instance a standard monochrome VGA video feed at 30 Hz: 10 MB/s. The only information that is important for tracking and reconstruction is how edges move, and the event camera gives us directly that information and nothing else, while removing the problems of blur, low dynamic range and limited resolution which standard cameras have.

Acknowledgments

Hanme Kim was supported by an EPSRC DTA studentship. We thank Jacek Zienkiewicz and other colleagues at Imperial College London for many useful discussions.
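The VGA figure in the comparison above is one line of arithmetic: a monochrome VGA frame is 640 × 480 pixels at one byte per pixel, streamed at 30 frames per second:

```python
# Monochrome VGA at 30 Hz, one byte per pixel.
vga_bytes_per_s = 640 * 480 * 1 * 30
# 9,216,000 bytes/s, i.e. roughly the 10 MB/s quoted, two orders of
# magnitude above the 40-180 kB/s event rates reported above.
```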

References

[1] A. Agrawal, R. Chellappa, and R. Raskar. An Algebraic Approach to Surface Reconstruction from Gradient Fields. In Proceedings of the International Conference on Computer Vision (ICCV), 2005.

[2] A. Agrawal, R. Raskar, and R. Chellappa. What is the Range of Surface Reconstructions from a Gradient Field? In Proceedings of the European Conference on Computer Vision (ECCV), 2006.

[3] A. N. Belbachir, S. Schraml, M. Mayerhofer, and M. Hofstätter. A Novel HDR Depth Camera for Real-time 3D 360° Panoramic Vision. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014.

[4] R. Benosman, C. Clercq, X. Lagorce, S. Ieng, and C. Bartolozzi. Event-Based Visual Flow. IEEE Transactions on Neural Networks and Learning Systems, 25:407–417, 2014.

[5] K. Boahen. Neuromorphic Chips. Scientific American, 2005.

[6] A. Censi and D. Scaramuzza. Low-Latency Event-Based Visual Odometry. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2014.

[7] J. Conradt, M. Cook, R. Berner, P. Lichtsteiner, R. J. Douglas, and T. Delbruck. A pencil balancing robot using a pair of AER dynamic vision sensors. In IEEE International Symposium on Circuits and Systems (ISCAS), 2009.

[8] T. Delbruck and P. Lichtsteiner. Fast sensory motor control based on event-based hybrid neuromorphic-procedural system. In IEEE International Symposium on Circuits and Systems (ISCAS), 2007.

[9] A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.

[10] P. Gemeiner, A. J. Davison, and M. Vincze. Improving Localization Robustness in Monocular SLAM Using a High-Speed Camera. In Proceedings of Robotics: Science and Systems (RSS), 2008.

[11] A. Handa, R. A. Newcombe, A. Angeli, and A. J. Davison. Real-Time Camera Tracking: When is High Frame-Rate Best? In Proceedings of the European Conference on Computer Vision (ECCV), 2012.

[12] G. Klein and D. W. Murray. Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), 2007.

[13] P. Lichtsteiner, C. Posch, and T. Delbruck. A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor. IEEE Journal of Solid-State Circuits (JSSC), 43(2):566–576, 2008.

[14] S. J. Lovegrove and A. J. Davison. Real-Time Spherical Mosaicing using Whole Image Alignment. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.

[15] E. Mueggler, B. Huber, and D. Scaramuzza. Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2014.

[16] R. A. Newcombe, S. Lovegrove, and A. J. Davison. DTAM: Dense Tracking and Mapping in Real-Time. In Proceedings of the International Conference on Computer Vision (ICCV), 2011.

[17] C. Posch, D. Matolin, and R. Wohlgenannt. A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS. IEEE Journal of Solid-State Circuits (JSSC), 2011.

[18] J. Tumblin, A. Agrawal, and R. Raskar. Why I want a Gradient Camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

[19] D. Weikersdorfer, R. Hoffmann, and J. Conradt. Simultaneous Localization and Mapping for event-based Vision Systems. In International Conference on Computer Vision Systems (ICVS), 2013.

[20] D. Weikersdorfer, D. B. Adrian, D. Cremers, and J. Conradt. Event-based 3D SLAM with a depth-augmented dynamic vision sensor. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2014.