Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

Similar documents
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

3D Vision An enabling Technology for Advanced Driver Assistance and Autonomous Offroad Driving

3D Vision An enabling Technology for Advanced Driver Assistance and Autonomous Offroad Driving

Pedestrian Detection with RCNN

Chapter 10: Third Working Phase

Vision meets Robotics: The KITTI Dataset

Traffic Monitoring Systems. Technology and sensors

3D Model based Object Class Detection in An Arbitrary View

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

The KITTI-ROAD Evaluation Benchmark. for Road Detection Algorithms

Object Recognition. Selim Aksoy. Bilkent University

Point Cloud Simulation & Applications Maurice Fallon

Tracking and integrated navigation Konrad Schindler

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

Automatic Labeling of Lane Markings for Autonomous Vehicles

The Visual Internet of Things System Based on Depth Camera

Robot Perception Continued

Real-Time Tracking of Pedestrians and Vehicles

Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation

Tracking in flussi video 3D. Ing. Samuele Salti

High speed 3D capture for Configuration Management DOE SBIR Phase II Paul Banks

Colorado School of Mines Computer Vision Professor William Hoff

automatic road sign detection from survey video

T-REDSPEED White paper

The goal is multiply object tracking by detection with application on pedestrians.

Wii Remote Calibration Using the Sensor Bar

Part-Based Pedestrian Detection and Tracking for Driver Assistance using two stage Classifier

Pallas Ludens. We inject human intelligence precisely where automation fails. Jonas Andrulis, Daniel Kondermann

Optical Flow. Shenlong Wang CSC2541 Course Presentation Feb 2, 2016

Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

Announcements. Active stereo with structured light. Project structured light patterns onto the object

3D Scanner using Line Laser. 1. Introduction. 2. Theory

Real-time Visual Tracker by Stream Processing

Organizers: Cristóbal Curio (Reutlingen University, Germany) and Christoph Stiller (KIT, Germany)

3D/4D acquisition. 3D acquisition taxonomy Computer Vision. Computer Vision. 3D acquisition methods. passive. active.

Tracking performance evaluation on PETS 2015 Challenge datasets

Semantic Recognition: Object Detection and Scene Segmentation

Multi-view Intelligent Vehicle Surveillance System

Situated Visualization with Augmented Reality. Augmented Reality

National Performance Evaluation Facility for LADARs

ACCURACY ASSESSMENT OF BUILDING POINT CLOUDS AUTOMATICALLY GENERATED FROM IPHONE IMAGES

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Crater detection with segmentation-based image processing algorithm

LIBSVX and Video Segmentation Evaluation

Two-Frame Motion Estimation Based on Polynomial Expansion

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA

Removing Moving Objects from Point Cloud Scenes

Cloud Based Localization for Mobile Robot in Outdoors

How does the Kinect work? John MacCormick

ARC 3D Webservice How to transform your images into 3D models. Maarten Vergauwen

Position Estimation Using Principal Components of Range Data

Flow Separation for Fast and Robust Stereo Odometry

Vehicle Tracking System Robust to Changes in Environmental Conditions

Tracking People and Their Objects

Robust Pedestrian Detection and Tracking From A Moving Vehicle

Automatic parameter regulation for a tracking system with an auto-critical function

Towards License Plate Recognition: Comparying Moving Objects Segmentation Approaches

Multi-View Object Class Detection with a 3D Geometric Model

Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor

Real-time 3D Scanning System for Pavement Distortion Inspection

Mobile Robot FastSLAM with Xbox Kinect

3D Vehicle Extraction and Tracking from Multiple Viewpoints for Traffic Monitoring by using Probability Fusion Map

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

Convolutional Feature Maps

LabelMe: Online Image Annotation and Applications

Machine vision systems - 2

Online Learning for Offroad Robots: Using Spatial Label Propagation to Learn Long Range Traversability

A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof

Interactive person re-identification in TV series

Blender 3D Animation

Synthetic Sensing: Proximity / Distance Sensors

Can we calibrate a camera using an image of a flat, textureless Lambertian surface?

Real-Time Stereo Reconstruction in Robotically Assisted Minimally Invasive Surgery

VECTORAL IMAGING THE NEW DIRECTION IN AUTOMATED OPTICAL INSPECTION

The Advantages of Using a Fixed Stereo Vision sensor

Feature Tracking and Optical Flow

Sensor Modeling for a Walking Robot Simulation. 1 Introduction

Robotics. Lecture 3: Sensors. See course website for up to date information.

Discovering objects and their location in images

Aguantebocawertyuiopasdfghvdslpmj klzxcvbquieromuchoanachaguilleanic oyafernmqwertyuiopasdfghjklzxcvbn mqwertyuiopasdfghjklzxcvbnmqwerty

Open-Set Face Recognition-based Visitor Interface System

SIMULTANEOUS LOCALIZATION AND MAPPING FOR EVENT-BASED VISION SYSTEMS

MetropoGIS: A City Modeling System DI Dr. Konrad KARNER, DI Andreas KLAUS, DI Joachim BAUER, DI Christopher ZACH

ROBUST VEHICLE TRACKING IN VIDEO IMAGES BEING TAKEN FROM A HELICOPTER

What more can we do with videos?

Snapshot of the online application for image annotation.

A Methodology for Obtaining Super-Resolution Images and Depth Maps from RGB-D Data

INTRODUCTION TO RENDERING TECHNIQUES

Urban Vehicle Tracking using a Combined 3D Model Detector and Classifier

Local features and matching. Image classification & object localization

THE CONTROL OF A ROBOT END-EFFECTOR USING PHOTOGRAMMETRY

Finding people in repeated shots of the same scene

Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

Digital Image Increase

PHOTOGRAMMETRIC TECHNIQUES FOR MEASUREMENTS IN WOODWORKING INDUSTRY

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Aligning 3D Models to RGB-D Images of Cluttered Scenes

Transcription:

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite Philip Lenz 1 Andreas Geiger 2 Christoph Stiller 1 Raquel Urtasun 3 1 KARLSRUHE INSTITUTE OF TECHNOLOGY 2 MAX-PLANCK-INSTITUTE IS 3 TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO Philip Lenz (KIT) Andreas Geiger (MPI-IS) Christoph Stiller (KIT) Raquel Urtasun (TTI-C)

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars 3D Laserscanner State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars Depth/reflectance map for 3D localization Detailed road map for path planning State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The Challenge of Autonomous Cars State of the art Localization, path planning, obstacle avoidance Heavy use of 3D laser scanner and detailed maps Problems for computer vision Stereo, optical flow, localization Object detection, recognition and tracking Semantic segmentation, 3D scene understanding P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 2

The KITTI Vision Benchmark Suite Two stereo rigs (1392 512 px, 54 cm base, 90 opening) Velodyne laser scanner, GPS+IMU localization 6 hours of recordings, 10 frames per second P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 3

The KITTI Vision Benchmark Suite Two stereo rigs (1392 512 px, 54 cm base, 90 opening) Velodyne laser scanner, GPS+IMU localization 6 hours of recordings, 10 frames per second P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 3

The KITTI Vision Benchmark Suite Two stereo rigs (1392 512 px, 54 cm base, 90 opening) Velodyne laser scanner, GPS+IMU localization 6 hours of recordings, 10 frames per second P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 3

Sensor Calibration Challenges Camera camera calibration Velodyne camera registration GPS Velodyne registration } Geiger et al., ICRA 2012 } ICP + Hand-eye calibration P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 4

Sensor Calibration Challenges Camera camera calibration Velodyne camera registration GPS Velodyne registration } Geiger et al., ICRA 2012 } ICP + Hand-eye calibration P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 4

Sensor Calibration Challenges T GPS T C T C T Velodyne Camera camera calibration Velodyne camera registration GPS Velodyne registration } Geiger et al., ICRA 2012 } ICP + Hand-eye calibration P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 4

Data Annotation Challenges 3D object labels: 22 Annotators Occlusion labels: Mechanical Turk P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 5

Data Statistics 150k Number of Labels 100k 50k 0 Car Van Truck Pedestrian Person (sitting) Cyclist Tram Misc P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 6

Data Statistics 150k Number of Labels 100k 50k 0 Car Van Truck Pedestrian Person (sitting) Cyclist Tram Misc P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 6

Data Statistics P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 6

Novel Challenges Middlebury Stereo Evaluation Version 2 Average errors: 2 3% (non-occluded regions) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 7

Novel Challenges Middlebury Stereo Evaluation Version 2 Average errors: 2 3% (non-occluded regions) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 7

Novel Challenges Fast guided cost-volume filtering (Rhemann et al., CVPR 2011) Middlebury, Errors: 2.7% Error threshold: 1 px (Middlebury) / 3 px (KITTI) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 8

Novel Challenges Fast guided cost-volume filtering (Rhemann et al., CVPR 2011) Middlebury, Errors: 2.7% KITTI, Errors: 46.3% Error threshold: 1 px (Middlebury) / 3 px (KITTI) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 8

Novel Challenges So what is the difference? Middlebury KITTI Laboratory Lambertian Rich in texture Medium-size label set Largely fronto-parallel Moving vehicle Specularities Sensor saturation Large label set Strong slants P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 9

Novel Challenges So what is the difference? Middlebury KITTI Laboratory Lambertian Rich in texture Medium-size label set Largely fronto-parallel Moving vehicle Specularities Sensor saturation Large label set Strong slants P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 9

Novel Challenges So what is the difference? Middlebury KITTI Laboratory Lambertian Rich in texture Medium-size label set Largely fronto-parallel Moving vehicle Specularities Sensor saturation Large label set Strong slants P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 9

Novel Challenges So what is the difference? Middlebury d = 16 px KITTI d = 0 px d = 50 px d = 150 px Laboratory Lambertian Rich in texture Medium-size label set Largely fronto-parallel Moving vehicle Specularities Sensor saturation Large label set Strong slants P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 9

Novel Challenges So what is the difference? Middlebury KITTI Laboratory Lambertian Rich in texture Medium-size label set Largely fronto-parallel Moving vehicle Specularities Sensor saturation Large label set Strong slants P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 9

Stereo Evaluation 200 training images / 200 test images P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 10

Stereo Evaluation 200 training images / 200 test images P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 10

Stereo Evaluation Particle Convex Belief Propagation (PCBP): Best Results Errors: 0.5% Errors: 0.5% Natural scenes, lots of texture, no objects A couple of wrong pixels at poles P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 10

Stereo Evaluation Particle Convex Belief Propagation (PCBP): Worst Results Errors: 19.5% Errors: 21.1% Inner city scenes, lots of objects Textureless surfaces, sensor saturation, reflections P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 10

Optical Flow Evaluation Second Order Total Generalized Variation: Best Results Errors: 0.5% Errors: 0.5% City scenes with slow motion (intersections) Small flow vectors (< 30 px) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 11

Optical Flow Evaluation Second Order Total Generalized Variation: Worst Results Errors: 56.5% Errors: 58.8% Difficult lighting conditions, highway driving Large flow vectors (> 150 px) P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 11

SLAM Evaluation 22 sequences 40 kilometers P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 12

SLAM Evaluation 11 training sequences / 11 test sequences P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 12

Challenges: Visual Odometry Ground truth from GPS+IMU Metric: For all frame combinations (i, j): P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 13

Challenges: Visual Odometry Ground truth from GPS+IMU Metric: For all frame combinations (i, j): P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 13

Challenges: Visual Odometry Ground truth from GPS+IMU Metric: For all frame combinations (i, j): Translation Error [%] 30 25 20 15 10 VISO2-S VO3ptLBA VO3pt VOFSLBA VOFS VISO2-M GT V O3pt Rotation Error [deg/m] 0.3 0.25 0.2 0.15 0.1 VISO2-S VO3ptLBA VO3pt VOFSLBA VOFS VISO2-M 5 0.05 0 0 10 20 30 40 50 60 70 80 90 Speed [km/h] 0 0 10 20 30 40 50 60 70 80 90 Speed [km/h] P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 13

Object Detection and Orientation Image Selection for Training and Test Sequences are added exclusively to training or testset Maximize no. of non-occluded objects Images with many objects Diversity in object orientation Entropy Maximization [ ] X X argmax α noc(x) + 1 C H c (X x) x C X : current set x: image from the whole dataset c=1 noc(x): no. of non-occluded objects in an image C: no. of object classes H c: entropy of class c with respect to the orientation P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 14

Object Detection and Orientation Image Selection for Training and Test Sequences are added exclusively to training or testset Maximize no. of non-occluded objects Images with many objects Diversity in object orientation Entropy Maximization [ ] X X argmax α noc(x) + 1 C H c (X x) x C X : current set x: image from the whole dataset c=1 noc(x): no. of non-occluded objects in an image C: no. of object classes H c: entropy of class c with respect to the orientation P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 14

Object Detection and Orientation Image Selection for Training and Test Sequences are added exclusively to training or testset Maximize no. of non-occluded objects Images with many objects Diversity in object orientation Entropy Maximization [ ] X X argmax α noc(x) + 1 C H c (X x) x C X : current set x: image from the whole dataset c=1 noc(x): no. of non-occluded objects in an image C: no. of object classes H c: entropy of class c with respect to the orientation P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 14

Object Detection Evaluation Evaluation Criteria difficulty height occlusion truncation easy > 40px fully visible < 0.15 moderate > 25px partly occluded < 0.30 hard > 25px mostly occluded < 0.50 Overlap o Criterion per Class class overlap car > 0.7 pedestrian > 0.5 cyclist > 0.5 o = GT DET GT DET Object Detector used as baseline for the benchmark Discriminatively Trained Deformable Part Models v4 [1] [1] P. Felzenszwalb et. al. Cascade Object Detection with Deformable Part Models. CVPR 2010. P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 15

Object Detection Evaluation Evaluation Criteria difficulty height occlusion truncation easy > 40px fully visible < 0.15 moderate > 25px partly occluded < 0.30 hard > 25px mostly occluded < 0.50 Overlap o Criterion per Class groundtruth class overlap car > 0.7 pedestrian > 0.5 cyclist > 0.5 o = GT DET GT DET detection Object Detector used as baseline for the benchmark Discriminatively Trained Deformable Part Models v4 [1] [1] P. Felzenszwalb et. al. Cascade Object Detection with Deformable Part Models. CVPR 2010. P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 15

Object Detection Evaluation Evaluation Criteria difficulty height occlusion truncation easy > 40px fully visible < 0.15 moderate > 25px partly occluded < 0.30 hard > 25px mostly occluded < 0.50 Overlap o Criterion per Class groundtruth class overlap car > 0.7 pedestrian > 0.5 cyclist > 0.5 o = GT DET GT DET detection Object Detector used as baseline for the benchmark Discriminatively Trained Deformable Part Models v4 [1] [1] P. Felzenszwalb et. al. Cascade Object Detection with Deformable Part Models. CVPR 2010. P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 15

Object Orientation Evaluation Evaluation Metric: Orientation Average Orientation Similarity (AOS) AOS = 1 11 max s( r) r: r r r {0,0.1,..,1} Normalized Cosine Similarity s(r) groundtruth detection s(r) = 1 D(r) i D(r) 1 + cos (i) θ δ i 2 D(r): set of all object detections at recall rate r (i) θ : difference in angle between estimated and ground truth orientation of detection i δ i = 1 if detection i has been assigned to a ground truth bounding box δ i = 0 if no ground truth has been assigned P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 16

Object Orientation Evaluation Evaluation Metric: Orientation Average Orientation Similarity (AOS) AOS = 1 11 max s( r) r: r r r {0,0.1,..,1} Normalized Cosine Similarity s(r) groundtruth detection s(r) = 1 D(r) i D(r) 1 + cos (i) θ δ i 2 D(r): set of all object detections at recall rate r (i) θ : difference in angle between estimated and ground truth orientation of detection i δ i = 1 if detection i has been assigned to a ground truth bounding box δ i = 0 if no ground truth has been assigned P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 16

Object Detection Evaluation 1 0.8 Car Easy Moderate Hard Precision 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Recall P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 17

Object Detection Evaluation 1 0.8 Car Easy Moderate Hard Orientation Similarity 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Recall P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 17

Object Detection Evaluation 1 0.8 Pedestrian Easy Moderate Hard Precision 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Recall P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 18

Object Detection Evaluation 1 0.8 Pedestrian Easy Moderate Hard Orientation Similarity 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Recall P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 18

Which cars are hard to see? Bad illumination conditions P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 19

Which cars are hard to see? Bad illumination conditions P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 19

Which cars are hard to see? Occlusions P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 19

Which cars are hard to see? Occlusions P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 19

Which pedestrians are hard to see? Children P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 20

Which pedestrians are hard to see? Carried Items P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 20

Conclusion Where are we now? Realistic dataset with 3D ground truth Stereo Optical flow SLAM Object detection / orientation estimation Object Tracking Recognition meets Reconstruction Challenge Complement existing benchmarks / reduce overfitting Submit your results: www.cvlibs.net/datasets/kitti P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 21

Conclusion Where are we now? Realistic dataset with 3D ground truth Stereo Optical flow SLAM Object detection / orientation estimation Object Tracking Recognition meets Reconstruction Challenge Complement existing benchmarks / reduce overfitting Submit your results: www.cvlibs.net/datasets/kitti P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 21

Conclusion Where are we now? Realistic dataset with 3D ground truth Stereo Optical flow SLAM Object detection / orientation estimation Object Tracking Recognition meets Reconstruction Challenge Complement existing benchmarks / reduce overfitting Submit your results: www.cvlibs.net/datasets/kitti P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 21

Conclusion Where are we now? Realistic dataset with 3D ground truth Stereo Optical flow SLAM Object detection / orientation estimation Object Tracking Recognition meets Reconstruction Challenge Complement existing benchmarks / reduce overfitting Submit your results: www.cvlibs.net/datasets/kitti P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 21

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps Lenz et al., IV 2011 P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps Williams et al., RAS 2009 P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps Agarwal et al., ICCV 2009 P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps Wojek et al., ECCV 2008 P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps Geiger et al., CVPR 2011 and NIPS 2011 P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Conclusion But KITTI is much more... 3D object tracking Loop closure (SLAM) Structure-from-Motion Semantic segmentation (class labels) 3D scene understanding (layout and objects) Use of maps OpenStreetMap The free Wiki World Map P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 22

Related Datasets and Benchmarks Stereo / Optical Flow setting #images ground truth EISATS synthetic 500 dense Middlebury laboratory 40 dense Make3D Stereo real 260 0.5 % Ladicky real 70 manual KITTI real 400 50 % SLAM setting length metric TUM RGB-D indoor 0.4 km New College outdoor 2.2 km Malaga 2009 outdoor 6.4 km Ford Campus outdoor 5.1 km KITTI outdoor 39.2 km P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 23

Related Datasets and Benchmarks Object Detection #cat. #labels/cat. occlusion 3D orientation Caltech 101 101 40-800 MIT StreetScenes 9 3k LabelMe 4000 60 ETHZ Pedestrian 1 12k PASCAL 2011 20 1k Daimler 1 56k Caltech Pedestrian 1 350k COIL-100 100 72 discrete EPFL Multi-View 20 90 discrete Caltech 3D 100 144 discrete KITTI 3 1k - 40k continuous P. Lenz: The KITTI Vision Benchmark Suite www.mrt.kit.edu 23