SGN-1650/SGN-1656 Signal Processing Laboratory
Constructing a 3D model from a stereo image pair

1 General information

In this exercise, a 3D model is constructed from a stereo image pair. The exercise work includes planning the imaging arrangement, taking the images with a digital camera, constructing the model, and writing a report. During the work, the students learn how to construct a 3D model. The errors caused by inaccuracies in measurement or imaging are also considered. A group size of two students is strongly recommended, and the group is assumed to know the basics of image processing with MATLAB. Guidance for this exercise work is provided via e-mail by Pekka Ruusuvuori (pekka.ruusuvuori@tut.fi).

To summarize, this exercise consists of the following phases:

1. Taking the images (together with the assistant) in the image processing laboratory (TE407). Contact the assistant to arrange a time for the laboratory.
2. Constructing the 3D stereo model based on these instructions. The work is done with MATLAB. Guidance is given by e-mail.
3. Writing a short report covering all tasks (1-6) and including answers to all questions (1-6). The report (preferably in pdf format) and the MATLAB codes are returned by e-mail. Include the names, student numbers and e-mail addresses of all group members in the report.

2 Imaging geometry

Here, the 3D model is reconstructed from two images taken with parallel camera axes; in other words, we use the normal stereo model. The images are taken with identical camera parameters by moving the camera horizontally between the shots. The situation is illustrated in Figure 1. Thus, the first task is to take two images of an object such that the object is visible in both images.

Task 1. Take the images in the image processing laboratory (TE407). Book a time for the laboratory by e-mail (pekka.ruusuvuori@tut.fi). The imaging is done together with the assistant.

Question 1. What are the values of the camera and other imaging parameters needed in forming the model?
Here the imaging is performed with one camera. However, to avoid movement of the object between the shots, simultaneous imaging with two cameras would be preferable.
Question 2. Why is imaging with one camera potentially problematic when constructing a 3D model?

The movement of the camera causes a parallax p between the two images. The parallax (the change in the apparent position of an object caused by a change in the position of the observer) is defined by

    p = x_r - x_l,    (1)

where x_r is the coordinate of a point in the right-side image and x_l is the coordinate of the same point in the left-side image. Here it is assumed that the movement occurs only along the x-axis. An illustrative example of parallax is given in the lower part of Figure 1.

A 3D model basically means that depth information is included in the image. Thus, we need to use the stereo image pair for calculating the depth information. Following the notation used in Figure 1, the depth can be obtained from

    H - h = fB / p,    (2)

where f is the focal length of the camera, B is the distance between the images, and p is the parallax.

Question 3. Equation 2 defines how the depth information is obtained from stereo imaging. Equation 2 can be derived from the schematics given in Figure 1. How? Explain briefly in the report (hint: similar triangles). If you wish, you may include illustrative figures in the report.

3 Correspondence between images

Finding the correspondence between the images is the most challenging step in 3D stereo modeling. Any incorrectly matched point will lead to an error in the depth map. Basically, determining the corresponding points between the images means calculating the parallax for each point. This information is needed in the 3D model, and as Equation 2 suggests, it is the only parameter that varies between different points of the object. As the number of required points is high, an automated method for locating the most similar points in the two images is preferred over manual matching. Cross correlation is the basic method for searching for corresponding pixels between images. The assumption is that for each point in one image, there exists a match in the other image.
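As a quick sanity check of Equations 1 and 2 above, the depth of a single point can be computed numerically. The values below are made up purely for illustration; your own f, B and parallax come from the lab measurements.

```matlab
% Hypothetical example values -- not measured in the lab.
f  = 50;             % focal length, mm
B  = 100;            % distance between the two camera positions, mm
xl = 12.0;           % x-coordinate of a point in the left image, mm
xr = 14.5;           % x-coordinate of the same point in the right image, mm

p     = xr - xl;     % parallax, Equation 1
depth = f * B / p;   % H - h, Equation 2 -> 2000 mm with these values
```

Note that the parallax must be expressed in the same units as f and B (here millimeters) before applying Equation 2; the pixel-to-millimeter conversion is discussed in Section 4.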
The corresponding point is expected to give the highest correlation. The cross correlation is calculated over an X x Y area in the image with a window of size X_win x Y_win. The highest correlation found in the X x Y area gives the match for the corresponding point. Note that X x Y defines the area in which the correlation is calculated. Thus, the corresponding point should also be located within this area; otherwise the correct point will not be found.

The illustration in Figure 1 presents the principle of stereo imaging using a single point as an example. Similarly, solving Equation 2 gives only one distance for the model. The same procedure is repeated for all the points in a point grid. Notice that the grid does not have to cover all the pixels in the image. A denser grid will lead to a more accurate model, whereas a sparser grid will reduce the computational cost. Note also that the larger the search area X x Y and the window area X_win x Y_win, the more time it takes to calculate the cross correlation.

Figure 1: Geometric representation of the normal stereo model. Images l and r stand for the left- and right-side images, respectively. The distance between the images is denoted by B, H is the distance to a common reference, and H - h is the distance to the object, in other words the desired depth information. Below, the parallax is visualized on the basis of the geometric model by superposing images l and r.

An example of an image pair with approximately the same area covered is given in Figure 2. The grid of points for which the parallax will be defined is superposed on the left-side image. Notice that the grid in Figure 2 is very sparse for visualization purposes; for the actual model it needs to be much denser.

Task 2. Create a grid for the area of interest (the face) in one image, say, the left-side image (help meshgrid). First, crop the images such that only the region of interest is included. Make sure the area is approximately the same in both images. However, you will need to know the position of the cropped images with respect to the original images, or at least the difference between the cropped areas in pixels, in order to calculate the parallax correctly.

Task 3. After cropping the essential part of both images, use the provided function (help locate_crosscorr) for finding the corresponding points in the other (right-side) image. Restrict the X x Y search area in the right-side image to approximately match the grid by examining how much, for example, the distance between the eyes or the noses varies between the two images in both the x and y directions. Then subtract the found corresponding points, as well as the difference caused by cropping the images, from the grid point values; the resulting values are the parallax p for each point in the grid.
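In the exercise itself, the provided locate_crosscorr function performs the matching. The sketch below only illustrates the principle behind Tasks 2 and 3 using MATLAB's own normxcorr2 (Image Processing Toolbox), assuming grayscale images named left and right that have already been cropped to approximately the same area; the variable names and parameter values are illustrative.

```matlab
% Sketch of the grid-and-matching idea; in the exercise, use the
% provided locate_crosscorr function instead of this loop.
step = 10;                                   % grid spacing in pixels
[gx, gy] = meshgrid(20:step:size(left,2)-20, ...
                    20:step:size(left,1)-20);

win = 15;                                    % half-width of the correlation window
xr  = zeros(size(gx));                       % matched x-coordinates in the right image
for k = 1:numel(gx)
    % Correlation window around the grid point in the left image
    tmpl = left(gy(k)-win:gy(k)+win, gx(k)-win:gx(k)+win);
    % Correlate against the right image. In practice, restrict this to
    % the X x Y search area to keep the computation fast.
    c = normxcorr2(tmpl, right);
    [~, imax]  = max(c(:));
    [~, xpeak] = ind2sub(size(c), imax);
    xr(k) = xpeak - win;                     % peak position -> window center
end

p = xr - gx;                                 % parallax for every grid point, Equation 1
```

If the two images were cropped at different positions, the crop offset (in pixels) must still be added to p, as described in Task 3.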
Here, only the parallax of the x-coordinates is needed; thus, you may discard the y-dimension. After calculating the parallax, you will need to reshape the resulting point vector (help reshape). You may plot the points you found in order to see the locations of the corresponding points. Compare the result with the grid.

Figure 2: Example of a cropped stereo image pair with approximately the same area covered. The point grid for which the parallax is defined is superposed on the left-side image.

4 Constructing the 3D model using the stereo image pair

The last step is to form the 3D model based on the calculated parallax, utilizing the information about the imaging geometry and the camera parameters. The necessary parameters are the focal length f and the distance between the image axes B, as can be seen from Equation 2. In order to get the distances in the model in millimeters, the p values have to be multiplied by the mm/pixel rate. The rate is obtained from the camera detector size in millimeters and in pixels.

Question 4. What is the size of one pixel in millimeters for the images taken in the lab?

Task 4. Construct the 3D model (in other words, calculate the values of H - h) by using the points calculated in Task 2 and applying Equation 2. Try plotting the model as a 3D plot. Try also visualization using a surface (help surf).

Task 5. Sometimes some of the correspondence points are falsely detected, and the corresponding parallax distances are incorrect. This may cause visually very disturbing high or low peaks in the model; see the examples in Figure 3. If the surface contains such noise (i.e. high peaks), try whether median filtering helps. Sometimes the peaks are due to a too small search area X x Y used in finding the corresponding points, so you may also try modifying the search area. Repeat Task 3 with varying sizes for both X_win x Y_win and X x Y, and search for the values that yield the most accurate model.

Question 5. List some error sources that limit the model accuracy. Did your
model include error peaks? If the model had error peaks, did median filtering improve the quality of the model? Did median filtering introduce any disturbing features? Discuss the errors present in your model. What, in your opinion, are the most crucial sources of error? How can they be eliminated or minimized?

Figure 3: 3D models generated using stereo imaging. On the left, the model is visualized as a surface (surf) with the HSV colormap. On the right, the original image is used as a texture on the model. The model has been filtered in order to remove noise (that is, parallax values miscalculated due to wrongly detected points).

Question 6. What are your suggestions for improving this exercise work? What was the most difficult part? Feel free to give feedback and make suggestions.

Task 6. Write a short report of the laboratory work in which you answer the questions addressed in this document. Include plots of the model as well as figures from the previous tasks and sub-phases. Also write short descriptions of what you did and learned in each task, and include comments. Return the report (in pdf format) along with the MATLAB codes by e-mail to the assistant.
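The model construction and cleanup of Tasks 4 and 5 can be sketched as follows, assuming the parallax values p from Task 3, the grid dimensions rows and cols, the focal length f, the baseline B, and the detector's mm/pixel rate mmpp have been determined; all variable names here are illustrative.

```matlab
% Assumed inputs (illustrative names): p (parallax vector, pixels),
% rows, cols (grid size), f (mm), B (mm), mmpp (mm per pixel).
P     = reshape(p, rows, cols) * mmpp;  % parallax in millimeters
depth = f * B ./ P;                     % H - h for every grid point, Equation 2

figure, surf(depth);                    % 3D visualization of the model
shading interp; colormap hsv;

% Isolated high/low peaks caused by mismatched points can often be
% removed with a small median filter (Image Processing Toolbox):
depth_clean = medfilt2(depth, [3 3]);
figure, surf(depth_clean);
```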