Terrain Traversability Analysis using Organized Point Cloud, Superpixel Surface Normals-based Segmentation and PCA-based Classification

Aras Dargazany and Karsten Berns

Abstract - In this paper, a stereo-based terrain traversability analysis and estimation approach for all terrains in off-road mobile robotics (Unmanned Ground Vehicles, or UGVs) is proposed and presented. The proposed approach reformulates the problem of terrain traversability analysis and divides it into two main problems: surface detection and surface analysis. The approach uses an organized dense point cloud (image-like structure), mainly produced by stereo cameras. In order to detect all existing surfaces as superpixel segments, an image segmentation technique is applied to the generated point cloud using geometry-based features (pixel-based normals). Having detected all the surfaces in the generated point cloud of the terrain, the SSA (Superpixel Surface Analysis) approach is applied to all detected surfaces (superpixel segments) in order to classify them based on their traversability index. The proposed SSA approach is based on: (1) superpixel surface normal and plane estimation, and (2) traversability analysis using superpixel surface planes. Having analyzed all the superpixel surfaces based on their traversability, these surfaces are finally classified into five main categories: traversable, semi-traversable, non-traversable, unknown and undecided.

I. INTRODUCTION

The first step for autonomous navigation of UGVs is environment perception (as shown in figure 1) and finding out where it is possible to traverse, i.e. which part of the terrain is traversable. This analysis of the acquired sensor data in order to distinguish between the traversable and non-traversable areas in the environment is known as terrain traversability analysis.

A. Motivation

In the ICARUS project (Integrated Components for Assisted Search and Unmanned Rescue), two UGVs (Unmanned Ground Vehicles) are employed for coping with disastrous terrains such as earthquakes and collapsed buildings. One of these two UGVs is large and the other one is small; both will be employed in the above scenarios. Autonomous navigation is essential due to the likelihood of losing the signal for UGV control in such scenarios.

B. Contribution

The proposed approach reformulates the problem of terrain traversability analysis and divides it into four main problems:
- terrain and environment perception: 3D reconstruction of the surrounding environment;
- terrain feature extraction: using pixel-based normals (geometry-based feature) and pixel-based textures (appearance-based feature);
- terrain surface detection: detecting all (important) surfaces as a collection of surfaces via hybrid-feature-based segmentation;
- surface traversability analysis: classification of all surfaces based on traversability via SSA (Superpixel Surface Analysis).

C. Outline

The remainder of the paper is structured as follows: Section II briefly reviews state-of-the-art approaches. Section III explains the proposed approach in detail. Section IV shows the results of our experiments. Section V concludes our work and explains our ideas for future work.

II. RELATED WORK

Fig. 1. Navigation of a small UGV (Unmanned Ground Vehicle) in Disaster City in 2010.

A. Dargazany and K. Berns are with the Robotics Research Lab, Department of Computer Science, University of Kaiserslautern, Germany.

Our recent work [1] in terrain traversability estimation is mainly based on one of these geometry-based features, called superpixel surface normals. In that work, we used a static stereo camera system as the point-cloud-generating sensor. It was also explained how superpixel normals can help with point cloud segmentation and subsequent terrain classification based on traversability criteria.
That work was composed of three main steps in traversability analysis:
1) Superpixel surface normal estimation using the integral image method
2) Connected-component segmentation
3) Classification of segments using PCA (Principal Component Analysis) and traversability criteria such as the maximum traversable step and slope

There is similar work in [5], which inspired our recent approach in [1]. That work uses RGB-D cameras such as the Kinect and Xtion for object detection and recognition on a table. The main difference of that work compared to our recent work is that they use:
- geometry-based features such as normals and Euclidean distance;
- appearance-based features such as RGB features, for segmentation and for refinement of the segments and noisy parts.

In [6], a segmentation technique is proposed for detecting the traversable and drivable road surface using surface normals. This approach is applied to a very planar surface such as a road; as is, it does not work for very rough terrain, but it can be adapted to different terrains.

In [2], an approach similar to our recent work [1] has been proposed for traversability analysis using a Kinect on mobile robots. The proposed traversability estimation approach in that work is split into two main steps:
1) Preprocessing: the system estimates the local traversability map online, based on a single depth frame and on the navigation capability of the vehicle.
2) Integration of the single frame into the traversability map.

In [3] and [4], a new normal-based feature called UPD (Unevenness Point Descriptor) is introduced for roughness estimation. This feature basically describes the unevenness and roughness at one point (superpixel roughness and unevenness); it describes the superpixel surface slope and step, a combination that is usually called roughness or unevenness.

III. PROPOSED APPROACH

The proposed algorithm consists of two main steps: point cloud generation and traversability analysis using point cloud processing (figure 2).

A. Point cloud generation

Terrain and environment perception can be accomplished by accurate 3D reconstruction of the surrounding environment, which is performed by generating the point cloud. The organized dense point cloud can be generated using stereo cameras, RGB-D cameras (Xtion, Kinect) or ToF cameras. This point cloud is called an organized point cloud due to its image-like structure, since it is basically produced from images. In this work, the organized point cloud is mainly generated using stereo cameras.

1) Stereo image acquisition: Since stereo image acquisition is not the focus of this paper, it is assumed that well-exposed images are captured in outdoor settings (CMOS sensors, HDR image acquisition).

Stereo calibration and rectification

Fig. 2. Work flow of the proposed approach in stereo-based terrain traversability estimation: (top) point cloud generation, (bottom) traversability analysis.

Fig. 3. Stereo images before and after rectification: (top) before rectification, (bottom) after rectification and cropping.

An offline calibration is performed for stereo image rectification using the OpenCV stereo calibration tool [9]. The raw images before rectification and the images after rectification are shown in figure 3 and figure 4. Once the stereo images are rectified, as shown in figure 4, the minimum rectified common area in both the left and right images is measured and used for cropping the images.

Stereo matching

For point cloud generation, stereo matching is applied to the stereo images to detect the corresponding pixels in the left and right images, so that triangulation can be used to measure the depth value from the distance between (or displacement of) the corresponding left and right pixels. Stereo matching techniques are mainly divided into two main categories: local and global approaches [7].
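To make the local category concrete, the following is a minimal sketch of block-based matching using the sum of absolute differences (SAD) over a square window; the function name and parameters are illustrative only and this is not the BB implementation [8] used in this work:

```python
import numpy as np

def block_match_row(left, right, row, block, max_disp):
    """Minimal local (block-based) stereo matching sketch: for each pixel
    in one row of the left image, pick the disparity that minimizes the
    sum of absolute differences (SAD) over a square block."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros(w, dtype=int)
    for u in range(half + max_disp, w - half):
        patch = left[row - half:row + half + 1, u - half:u + half + 1].astype(float)
        costs = [np.abs(patch - right[row - half:row + half + 1,
                                      u - d - half:u - d + half + 1]).sum()
                 for d in range(max_disp)]
        disp[u] = int(np.argmin(costs))  # winner-take-all disparity
    return disp
```

Real implementations add cost aggregation, sub-pixel refinement and left-right consistency checks; this sketch only shows the core winner-take-all search.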
For better evaluation of the results, two stereo matching techniques have been used for disparity map generation of the terrain surface: Block-Based stereo matching (BB) [8] and Adaptive Cost 2-pass Scanline Optimization (ACSO) [11].

Fig. 4. Stereo rectification lines (epipolar lines) visualized on the already rectified and cropped left and right images for visual verification.

Fig. 5. Disparity images generated by: (left) ACSO (Adaptive Cost 2-pass Scanline Optimization), (right) BB (Block Based).

The yellow area in figure 5 shows where no disparity values are available; the remaining gray-scale values show the disparity values, from white (close distance, value 255) to black (far distance, value 0). These two different state-of-the-art approaches are used for generating dense organized point clouds in order to have a better comparison of the traversability results. The disparity map generated using one of these approaches, ACSO [7], is illustrated in figure 6.

3D reconstruction

Having calculated the disparity map, it is possible to generate the point cloud associated with the left image. The left image is chosen to be reconstructed in 3D using the equations below, where focal is the focal length in pixels, baseline is the stereo baseline, and (U_p, V_p) is the principal point:

Z = (focal / disparity) * baseline
X = ((U - U_p) / focal) * Z
Y = ((V - V_p) / focal) * Z

In figure 7, the 3D-reconstructed disparity image shows (X, Y, Z) along with the corresponding disparity value generated by both stereo matching techniques, BB and ACSO.

Fig. 6. The disparity map visualized in 2D, generated using the ACSO (Adaptive Cost 2-pass Scanline Optimization) stereo matching technique [7].

Fig. 7. The 3D reconstruction of the left image visualized in 3D, generated using the stereo camera as an input for traversability analysis.

B. Feature extraction

Terrain feature extraction is performed by extracting pixel-based normals (geometry-based feature) and pixel-based textures (appearance-based feature) from the generated point cloud of the surrounding environment. 3D feature estimation from point cloud data is a very important initial step in point cloud processing.
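As a reference for how the organized cloud processed below is formed, the triangulation equations from the 3D reconstruction step above can be sketched for a full disparity map; this is an illustrative numpy sketch (function and variable names are assumptions), not the implementation used in this work:

```python
import numpy as np

def disparity_to_cloud(disp, focal, baseline, u_p, v_p):
    """Back-project a disparity map into an organized point cloud
    (H x W x 3) using Z = (focal/disparity)*baseline,
    X = ((U - U_p)/focal)*Z, Y = ((V - V_p)/focal)*Z.
    Pixels with no disparity (disp <= 0) become NaN, matching the
    'unknown' regions of the organized cloud."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(disp > 0, (focal / disp) * baseline, np.nan)
    x = ((u - u_p) / focal) * z
    y = ((v - v_p) / focal) * z
    return np.stack([x, y, z], axis=-1)
```

The result keeps the image-like (row, column) layout, which is exactly what makes the cloud "organized" and lets the later stages reuse image segmentation techniques.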
Among all the geometry-based features, the two most widely used geometric point features at a query point p on the surface, based on its neighboring points, are curvatures and normals. Both of these features are considered local, since they describe a point through its neighborhood (i.e. they are local descriptors). According to [10], there are three ways of computing normals, which differ in how they trade off computing the most accurate normal at every point of the cloud against computing the normals quickly:

- COVARIANCE MATRIX: creates 9 integral images to compute the normal for a specific point from the covariance matrix of its local neighborhood.
- AVERAGE 3D GRADIENT: creates 6 integral images to compute smoothed versions of the horizontal and vertical 3D gradients, and computes the normals using the cross-product between these two gradients.
- AVERAGE DEPTH CHANGE: creates only a single integral image and computes the normals from the average depth changes.

Pixel-based (point-based) normal features are first extracted from the point cloud using integral images (the COVARIANCE MATRIX method explained above) [10]. Using these pixel-based normals on the surface, it is possible
to define the roughness, as shown in figure 8.

Fig. 8. The resulting normals extracted from the point cloud, visualized in 3D.

Fig. 9. The resulting segmented point cloud visualized in 2D: (left) the original left image, (right) the segmented point cloud of the left image.

Fig. 10. Terrain classification using superpixel surface traversability analysis, visualized as a modular architecture.

C. Image segmentation

Terrain surface detection is performed by collecting and detecting all (important) surfaces using the extracted features. This is accomplished by a segmentation technique which converts the generated point cloud into a collection of superpixel surfaces (segments). These superpixel surfaces basically represent the important surfaces in the 3D-reconstructed surrounding environment. This segmentation technique is similar to image segmentation approaches, since the organized point cloud used here has an image-like structure. The segmentation results are shown in figure 9. The angular difference between a point normal n_p1 and a neighboring normal n_p2 should be smaller than the roughness angle α_r:

n_p1 · n_p2 ≥ cos(α_r)   (1)

D. Terrain classification

Surface traversability analysis is performed by classifying all detected surfaces based on their traversability. In this section, the resulting segments (all detected superpixel surfaces) are analyzed based on their point distribution using PCA, each segment is approximated with a plane, and the traversability parameters and criteria needed for classification and traversability index generation are defined, as shown in the modular architecture in figure 10. The required traversability parameters in general include:
- max surface roughness - required for segmentation and superpixel surface detection;
- gravity normal in the camera coordinate system g_c = (0, 0, 1), or the expected ground plane normal;
- max slope α_max - the maximum slope angle, defined by the vehicle's mechanical and kinematic capabilities;
- dominant ground plane;
- max step h_max - the maximum possible step (i.e. max height) the robot can climb, which is also related to the vehicle's mechanical and kinematic capabilities.

These traversability parameters are mainly based on the vehicle type and size, the kinematic capabilities of the vehicle, robot or UGV, and the application environment or scenario.

1) Superpixel surface normal and plane estimation: In this subsection, the point distribution of each superpixel surface (segment) is analyzed using PCA. Given this distribution, the surface plane parameters are estimated for each segment.

Superpixel surface normal estimation using PCA

In order to classify the superpixel surfaces, the segments are analyzed based on their point distribution as follows:
1) A minimum number of inlying points inliers_min per segment is used as a threshold to avoid noisy segments. This threshold is often empirical; in our case it is set to a ratio of the image height and width: inliers_min = img_w * img_h * 0.02.
2) The mean and covariance are computed for each segment.
3) PCA is applied to the calculated covariance of each segment; each segment is analyzed in eigenspace and the eigenvalues and eigenvectors are extracted.
4) Using the eigenvector with the smallest eigenvalue, the main normal (A, B, C) of each segment is calculated, located at the centroid (center of gravity) of the segment.
5) Using each segment's mean and normal, the segment is turned into one surface: the centroid (center of gravity) and the normal define the coefficients of the approximated segment plane Ax + By + Cz + D = 0.

Superpixel surface plane estimation

At the end of this stage, we are able to approximate every segment with one plane (planar surface) given by its plane parameters:
- hessian normal (A, B, C);
- offset from the origin (D);
- centroid of the plane (center of gravity), which is the computed mean (X_c, Y_c, Z_c).

2) Traversability analysis using superpixel surface planes: In order to classify the terrain based on its traversability, all estimated superpixel surface planes are analyzed based on their slope and step. The detailed description of this analysis is provided in this subsection.
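The per-segment plane estimation steps above can be sketched as follows; this is an illustrative numpy reimplementation (function name and argument layout are assumptions), not the code used in our experiments:

```python
import numpy as np

def fit_segment_plane(points, img_w, img_h):
    """Approximate one segment (N x 3 point array) with a plane
    Ax + By + Cz + D = 0 via PCA, following steps 1-5 above.
    Returns ((A, B, C, D), centroid) or None for noisy segments."""
    inliers_min = img_w * img_h * 0.02
    if len(points) < inliers_min:           # step 1: reject noisy segments
        return None
    centroid = points.mean(axis=0)          # step 2: mean
    cov = np.cov(points.T)                  # step 2: covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # step 3: PCA (ascending order)
    normal = eigvecs[:, 0]                  # step 4: smallest-eigenvalue vector
    a, b, c = normal
    d = -normal @ centroid                  # step 5: offset so that the
                                            # centroid lies on the plane
    return (a, b, c, d), centroid
```

The smallest-eigenvalue eigenvector is the direction of least point spread, which for a roughly planar segment is exactly the surface normal.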
Gravity normal estimation

In order to analyze the slope of the segmented planes in the point cloud, the gravity normal in the camera coordinate system is required. This gravity normal is estimated as in the equation below:

g_c = g * cos(a_w + a_r)   (2)

Slope analysis using gravity normal

The expected gravity normal (or ground plane normal) is n = (0, 0, 1) if the camera is not tilted. In mobile robots, an IMU sensor can usually be used for a more accurate estimation of the gravity normal. In this work, it is also proposed to estimate it by applying the stereo camera tilting angle (Theta) to the expected hessian ground plane normal (0, 0, 1), as explained above. The max slope is the steepest ramp or sloped surface the vehicle or UGV can drive on; as stated before, it is based on the vehicle information and type. This parameter is used for slope thresholding of the segments against the nominal ground plane normal. For traversability estimation of the segments (which are now approximated by planes), a comparison function is used: it compares the normals of the segments with the nominal ground plane normal, using the max slope (in degrees) as a threshold. The plane normal of each segment n_p and the gravity normal in the camera coordinate system g_c are compared as in the equation below:

n_p · g_c ≥ cos(α_max)   (3)

Using this slope analysis, a traversability index can be assigned to each segment: traversable, semi-traversable or non-traversable.

Dominant ground plane estimation

Having classified the segments, we can first detect the largest traversable terrain segment (the segment with the maximum number of inlying points). This dominant traversable segment is required for:
1) step detection against the plane centroids of the other segments, to verify whether they are traversable or not;
2) checking the quality of the generated dense point cloud, based on the number of inlying points in the dominant traversable plane.
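A minimal sketch of the slope comparison described above, assuming unit normals; the separate semi-traversable threshold angle alpha_semi is an assumption made here for illustration, since the paper assigns that index without stating its exact threshold:

```python
import numpy as np

def slope_class(n_p, g_c, alpha_max_deg, alpha_semi_deg):
    """Classify one segment plane by slope: compare the plane normal n_p
    with the gravity normal g_c against cos(alpha_max), as in Eq. (3).
    Segments between alpha_max and the assumed alpha_semi band are
    labeled semi-traversable."""
    c = abs(float(np.dot(n_p, g_c)))    # |cos| of the angle between normals
    if c >= np.cos(np.radians(alpha_max_deg)):
        return "traversable"
    if c >= np.cos(np.radians(alpha_semi_deg)):
        return "semi-traversable"
    return "non-traversable"
```

The absolute value makes the comparison independent of the normal's sign, which PCA leaves ambiguous.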
If the dominant traversable segment is too small (smaller than a specific threshold on the number of inlying points), then the point cloud quality is not good enough (i.e. the cloud is noisy); in this case we can discard the point cloud and proceed to the point cloud generated from the next frame. Among all the traversable planes, the dominant ground plane is detected by counting the number of inlying points against inliers_min. This threshold is set using the image height and width as follows:

inliers_min = img_w * img_h * 0.02   (4)

Using the dominant ground plane, it is also possible to check the quality of the generated point cloud using the actual number of inlying points.

Point cloud quality check

Having detected the dominant ground plane, it is possible to analyze the quality of the generated point cloud based on the quality of the dominant ground plane, since this plane is the most confident traversable superpixel surface in the generated point cloud; it most likely corresponds to the main ground plane on which the vehicle is moving and traversing. The quality of the dominant ground plane is measured based on the size of the plane (the number of inlying points) and its superpixel surface normal. If the number of inliers in the plane inliers_Ph is more than a specific threshold and the surface normal is positive, the frame passes the quality check; otherwise the frame is discarded and not used for further analysis.

Step analysis of superpixel planes

The max step is the largest step, gap, height or elevation (e.g. between two consecutive stairs) that a robot can traverse. It is usually a well-known parameter which is also based on the vehicle type, and it is used for detecting the steps between two segments. We first detect the dominant traversable segment, and then measure the distance from the centroid of each other segment to the dominant plane using the point-to-plane distance, and compare it with the max step.
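The point-to-plane step check described above can be sketched as follows; this is illustrative only, assuming the dominant plane is given by its (A, B, C, D) coefficients from the PCA plane estimation:

```python
import numpy as np

def step_traversable(plane, centroid_other, h_max):
    """Check the step between the dominant ground plane and another
    traversable segment: the perpendicular (point-to-plane) distance of
    the other segment's centroid must not exceed h_max."""
    a, b, c, d = plane                   # dominant plane Ax + By + Cz + D = 0
    n = np.array([a, b, c])
    dist = abs(n @ centroid_other + d) / np.linalg.norm(n)
    return bool(dist <= h_max)
```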
The distance between the dominant ground plane P_d and the gravity center (centroid) p_c of each other traversable plane is analyzed as follows:

dist(p_c, P_d) ≤ h_max   (5)

This point-to-plane distance is the perpendicular distance from the point to the plane.

IV. RESULTS

To evaluate the performance of the proposed terrain traversability estimation method, the whole terrain cloud (or image) is classified into five classes:
1) traversable regions are colorized in green;
2) semi-traversable regions are colorized in blue;
3) non-traversable obstacle regions are colorized in red;
4) unknown regions are colorized in black (no depth or disparity values);
5) undecided regions get no color.

A series of gray-scale stereo images was generated by the stereo system shown in figure 11. The terrain classification results are shown in figures 12, 13 and 14.

V. CONCLUSION AND FUTURE WORK

The proposed approach uses superpixel surface traversability analysis for a better description of all-terrain geometry and more accurate terrain classification based on the vehicle's kinematic capabilities. This approach produces reasonable results in detecting all important surfaces in all terrains using only geometry-based features such as normals, but relying only on geometry-based features may result in
some inaccurate surfaces due to a noisy point cloud. Using appearance-based features along with geometry-based ones may help the segmentation module detect more stable and robust superpixel surfaces. That is why pixel-based texture segmentation might increase the accuracy of the surface detection.

Fig. 11. The stereo camera mounted on the platform RAVON, used to generate the gray-scale stereo images in our experiments.

Fig. 12. The terrain classification results using the proposed traversability analysis approach vs. the raw generated input point cloud, visualized in 3D (point cloud) and 2D (image): (top row) terrain classification results in 3D (left) and 2D (right), (bottom row) input raw generated organized point cloud in 3D (left) and 2D (right).

Fig. 13. Disparity image and dominant ground plane shown in 3D (point cloud) and 2D (image): (top row) the disparity image in 3D (left) and 2D (right); the depth values are shown in gray-scale colors. (bottom row) the dominant ground plane is shown in white, and red indicates the dominant obstacle plane. The arrows on the 3D point cloud (left) indicate the surface normals.

Fig. 14. Disparity image and dominant ground plane shown in 3D (point cloud) and 2D (image): (top row) the disparity image in 3D (left) and 2D (right); the depth values are shown in gray-scale colors. (bottom row) the dominant ground plane is shown in white, and red indicates the dominant obstacle plane. The arrows on the 3D point cloud (left) indicate the surface normals. A bad frame (red cross arrows) indicates that the frame quality is low and the frame is discarded.

VI. ACKNOWLEDGEMENTS

This work is part of the ICARUS project, which is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 285417.

REFERENCES

[1] A. Dargazany and K. Berns, "Stereo-based Terrain Traversability Estimation using Surface Normals," ISR/Robotik, 2014.
[2] I.
Bogoslavsky et al., "Efficient traversability analysis for mobile robots using Kinect sensors," 2013.
[3] M. Bellone et al., "Unevenness point descriptor for terrain analysis in mobile robot applications," International Journal of Advanced Robotic Systems, 2013.
[4] M. Bellone et al., "3D traversability awareness for rough terrain mobile robots," Sensor Review (Emerald), 2014.
[5] A. Trevor et al., "Efficient organized point cloud segmentation using connected components," 2013.
[6] A. Trevor, F. Tombari, and R. B. Rusu, "Honda Research Code Sprint: Road Segmentation," Point Cloud Library, 2012.
[7] F. Tombari, S. Mattoccia, and L. Di Stefano, "Stereo for robots: quantitative evaluation of efficient and low-memory dense stereo algorithms," in Proc. Int. Conf. on Control, Automation, Robotics and Vision (ICARCV), 2010.
[8] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, 2002.
[9] Open Computer Vision Library (OpenCV): stereo calibration, www.opencv.org
[10] Point Cloud Library: www.pointclouds.org
[11] L. Wang, M. Liao, M. Gong, R. Yang, and D. Nister, "High-quality real-time stereo using adaptive cost aggregation and dynamic programming," in Proc. 3rd Int. Symposium on 3D Data Processing, Visualization and Transmission (3DPVT'06), 2006.