Integer Computation of Image Orthorectification for High Speed Throughput

Paul Sundlie, Joseph French, Eric Balster

Abstract: This paper presents an integer-based approach to the orthorectification of aerial imagery. The orthorectification process is a back-projection algorithm that uses a CAHV camera model, a Digital Terrain Map (DTM), and the collinearity equations. In many airborne imaging systems, the orthorectification process is a computational bottleneck which hinders processing throughput. The proposed integer-based approach reduces the computation time of the projection process by an average of 27.6%. In addition, the proposed solution lends itself to further processing improvement through embedded solutions such as FPGAs.

I. INTRODUCTION

The use of digital aerial imagery for surveying large areas of land is common across a variety of disciplines. A shared trait among these applications is the benefit of rectified image data. Image rectification refers to the processing of a raw image in order to produce a result that can be superimposed on a map. Depending on the accuracy of the rectification method chosen, the result may or may not be an orthoimage; the term orthoimage is reserved for resulting imagery of the greatest achievable accuracy [3]. With the right components, the creation of a single orthoimage is not troublesome. However, when producing aerially acquired digital orthoimages in real time, the computational burden becomes an obstacle. To produce an orthoimage, the user must have knowledge of two sets of data: the parameters of the camera system used to acquire the data, and elevation information for the observed area [3]. The characteristics of the camera system consist of the external and internal parameters.
The internal parameters provide information on the image orientation, with variables such as the focal length and the location of the principal point, while the exterior parameters provide the position and orientation of the camera system in world coordinates [2]. The second set of data required is the elevation information of the area in view. This information comes in the form of a Digital Elevation Map (DEM). The term DEM is a general specification; specific terms include the DTM, which provides ground elevation information, and the Digital Surface Map (DSM), which includes the highest elevation at each point (such as the top of a building) [3]. With the required data sets present, the next step is to specify the algorithm for rectification. Various methods can be found in the literature, such as polynomial rectification or projective transformation [6]. One method, known as back-projection, is given in [5, 6]. An optimization of back-projection is the focus of this paper.

Today's camera systems produce images consisting of millions of pixels. When taken from the air, these images are often acquired with an array of individual cameras. The images are stitched together to produce a mosaic which is then processed. It becomes computationally time consuming to process large ortho-mosaics in real time. In order to accelerate the process, a fixed-point integer implementation is used. This approach is presented here because of its ability to accelerate a software solution as well as to show feasibility for an embedded solution using an FPGA. FPGAs have been shown to be superior, in terms of speed, to both CPUs and GPUs for image processing applications [8].

The following sections provide a solution for implementing the back-projection method using a fixed-point integer approach, which results in improved throughput. Measurements are compared with the back-projection method of [4]. Section II discusses the main components of the implementation.
These include the CAHV camera model [1], the back-projection algorithm [5, 6], and a review of fixed-point mathematics. Section III covers the subject of retaining precision, followed by the results in Section IV. Finally, Section V offers some concluding comments and a discussion of future possibilities.

II. BACKGROUND

The following subsections outline the main components of the implementation. The CAHV camera model is discussed first, followed by the back-projection algorithm and then an overview of fixed-point integer mathematics.

A. CAHV Camera Model

The classic photogrammetric camera model provides the internal and external parameters of the viewing camera [2]. First documented by Yakimovsky and Cunningham [1], the CAHV camera model provides the same information through the use of four three-dimensional vectors (C, A, H, and V). The four vectors given in the CAHV model provide the transformation from world coordinates to image coordinates [2]. Each individual vector provides its own set of significant
information.

Fig. 1. CAHV camera model (source: [2]). In the figure, (Xc, Yc, Zc) is the perspective center in the object coordinate system, f is the focal length, and mij is the (i, j)th component of M, a 3x3 rotation matrix defining the transformation between the image and object space.

The C vector specifies the location of the perspective center of the sensor; the A vector is a unit vector oriented in the direction the camera is pointing [2]; and the H and V vectors are termed the horizontal and vertical vectors of the camera. From H and V, orthogonal unit vectors H' and V' are defined such that A, H', and V' are mutually orthogonal [1]; these three vectors then define the image plane oriented to the real-world coordinates. For a detailed explanation of all the information contained within the CAHV camera model, as well as its conversion to the photogrammetric model, see reference [2].

B. Back-Projection

When creating an orthoimage, two of the common algorithms are forward projection and back-projection [6]. Forward projection creates a digitally orthorectified image by projecting the original image onto the DEM, calculating the object-space coordinates, and then projecting those into the new orthoimage. In contrast, back-projection projects points from a resampled orthoimage onto the DEM and then into the image space of the original image in order to acquire the pixel value for the corresponding orthoimage coordinate. Back-projection provides the advantage of resampling the image as it is created, while forward projection requires resampling after the projection process is complete [5]. Within the back-projection algorithm, the relationship between the image and the object space of the DEM is defined by the collinearity principle. This principle states that the perspective center, the image-plane coordinate, and the object point on the DEM all lie upon a straight line [5].
The collinearity equations are given by

x = f * [m11(X - Xc) + m12(Y - Yc) + m13(Z - Zc)] / [m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc)]

y = f * [m21(X - Xc) + m22(Y - Yc) + m23(Z - Zc)] / [m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc)]    (1)

where (x, y) are the image coordinates and (X, Y) are the object coordinates.

The rotation matrix, M, is often produced by using three sequential rotations (omega, phi, and kappa) [5]. However, when using the CAHV camera model this rotation matrix is calculated using the method given in [2], with its rows formed from the H, V, and A vectors:

M = [ H ]
    [ V ]
    [ A ]    (2)

The steps of the back-projection algorithm are:
1) Resample the DEM to create an empty orthoimage grid space.
2) Interpolate the elevations across the new grid.
3) Use the collinearity relationship to project the object-space coordinates into the source image.
4) Interpolate to obtain the intensity value for the current pixel in the orthoimage.

As suggested in [6], computation time can be reduced by optimizing the back-projection to use as few multiplications as possible. Instead, the numerator and denominator of the collinearity equations, given in Equation 1, are pre-calculated for use by additive increments instead of multiplications. This allows the projection algorithm to step through each pixel using only additions and the single division that is part of the collinearity relationship.

C. Fixed-Point Arithmetic

One of the goals of the proposed implementation is to show the feasibility of an embedded back-projection algorithm on an FPGA or similar device. When targeting an FPGA, it is well known that integer-based computations provide superior throughput performance over floating point [7]. Fixed-point arithmetic allows for a software solution that both provides a speed increase and models the future hardware solution. Fixed-point solutions allow the use of integers in place of floating-point variables while retaining much of the precision of the original values.
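The additive-increment evaluation of the collinearity fractions described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, argument names, and the choice of stepping along the X axis at fixed Y and Z spacing are assumptions.

```python
def project_row(m, f, Xc, Yc, Zc, X0, Y0, Z0, dX, n):
    """Project n grid points spaced dX apart in X, using additive
    increments so each point costs only additions and one division."""
    # Numerators and denominator of Equation 1 at the row start.
    x_num = f * (m[0][0]*(X0-Xc) + m[0][1]*(Y0-Yc) + m[0][2]*(Z0-Zc))
    y_num = f * (m[1][0]*(X0-Xc) + m[1][1]*(Y0-Yc) + m[1][2]*(Z0-Zc))
    den   =      m[2][0]*(X0-Xc) + m[2][1]*(Y0-Yc) + m[2][2]*(Z0-Zc)
    # Pre-computed per-step increments: stepping by dX changes each
    # linear term by a constant amount.
    d_x, d_y, d_den = f*m[0][0]*dX, f*m[1][0]*dX, m[2][0]*dX
    coords = []
    for _ in range(n):
        coords.append((x_num/den, y_num/den))  # one division per pixel
        x_num += d_x
        y_num += d_y
        den += d_den
    return coords
```

With an identity rotation matrix and the perspective center at the origin, each projected coordinate reduces to (X/Z, Y/Z), which makes the incremental results easy to check against direct evaluation.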
A numerical example most clearly illustrates the fixed-point method used in this paper. A simple example using an addition between two floating-point values is shown below. Here λ is the scale exponent used to adjust each floating-point value (by a factor of 2^λ) before storing it as an integer:

f1 = 40.457878
f2 = 42.654565
λ = 6
x1 = round(f1 * 2^λ) = 2589
x2 = round(f2 * 2^λ) = 2730    (3)
Fig. 2. From left to right: projection of [4], proposed integer-based projection, and contrast-adjusted difference image.

At this point, the addition is undertaken using the variables x1 and x2; however, the true value is defined by the addition of f1 and f2. The scale factor is chosen as a power of two because such a value allows the multiplication to be done using a bitwise shift; a bitwise shift is both computationally efficient in software and easily implemented in hardware. The result of x1 + x2 is binary shifted down before it is stored as the final value:

floating point: f1 + f2 = 83.11244
fixed point: (x1 + x2) * 2^-λ = 83.10937

As can be seen, the precision is retained to the tenths place of the true value. A larger scale factor can improve the precision; for example, precision to the thousandths place is achievable when using a scale factor of λ = 10, where the result is 83.11230.

The division within the collinearity equations can be replaced with a multiplication by pre-calculating a scaled multiplier. This multiplier, α, changes slightly with each pixel increment and must be included as an additional variable that is incremented as each pixel is analysed:

α = 2^λ / [m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc)]    (4)

x̂ = [m11(X - Xc) + m12(Y - Yc) + m13(Z - Zc)] * α    (5)

ŷ = [m21(X - Xc) + m22(Y - Yc) + m23(Z - Zc)] * α    (6)

x = x̂ >> λ    (7)

y = ŷ >> λ    (8)

The image position, x, is then scaled down with a bit shift to the true image coordinate (Equation 7); the same is done for the column coordinate, y (Equation 8). An algorithmic comparison of the original method and the proposed integer method is given in the Appendix.

III. RETAINING PRECISION

Typically, back-projection algorithms use double-precision variables for calculating the projected image coordinates.
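The numerical example above, and the replacement of the collinearity division by a pre-scaled reciprocal, can be sketched as follows. This is a minimal illustration of the scheme, not the authors' code; the sample values for num and D in the second part are arbitrary assumptions.

```python
LAM = 6                        # scale exponent: values scaled by 2**LAM
f1, f2 = 40.457878, 42.654565

# Store each float as a scaled integer (Equation 3).
x1 = round(f1 * (1 << LAM))    # 2589
x2 = round(f2 * (1 << LAM))    # 2730

# Integer addition, then a shift back down to recover the sum.
fixed_sum = (x1 + x2) / (1 << LAM)   # 83.109375 vs. the true 83.112443

# Replacing a division num/D with a multiplication by a pre-scaled
# reciprocal, as in Equations 4-8. A larger scale exponent keeps the
# quotient accurate; illustrative operands below.
LAM2 = 36
num, D = 5000, 7
alpha = (1 << LAM2) // D       # alpha = 2**lam / D, computed once
q = (alpha * num) >> LAM2      # integer approximation of num / D (714)
```

Note that with the small exponent λ = 6 the reciprocal trick would lose several bits per quotient; the larger exponent used for the projection (see the scale-factor analysis in Section III) keeps the result at full pixel accuracy.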
One of the chief constraints in designing the integer-based implementation is ensuring that the resulting imagery is effectively identical to the original, the original being a floating-point back-projection algorithm based upon the theory in [4]. In order to measure this, the peak signal-to-noise ratio (PSNR) is utilized. Optimizing the scale factor, λ, is the main design consideration for retaining the precision of the original. Figure 3 shows the test procedure used to ensure the accuracy of the proposed method.

Fig. 3. Test Set-up

Figure 4 shows the results of the PSNR analysis on a set of 10 unique images. The scale factor, λ, is varied, the average MSE between the resulting set of imagery and the originals is recorded, and the PSNR is calculated. Viewing this graph, one can see that the PSNR is close to ideal when λ = 36; therefore, a scale factor of 36 is chosen, as no improvement is seen past this point. Specifically, for λ = 36, a PSNR of 59.4 dB and an MSE of 0.075 are recorded. Figure 2 shows a zoomed 470 x 470 pixel section of a single processed image using each method. The resulting product is sufficiently similar to the original, with only a few pixel intensity variations across the entire image. Tests with larger data sets show consistent results; for example, on a set of 150 unique images the average MSE is found to be 0.08 with λ = 36.
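The PSNR figure quoted above follows the standard definition for 8-bit imagery; as a quick check, the reported MSE of 0.075 does correspond to roughly 59.4 dB. A one-line sketch (the helper name is ours, not from the paper):

```python
import math

def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB, for 8-bit imagery by default."""
    return 10.0 * math.log10(peak * peak / mse)

# psnr_db(0.075) is approximately 59.38 dB.
```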
Fig. 4. Optimum Scale Factor Analysis; PSNR

IV. RESULTS

This section highlights the timing results comparing the floating-point and integer-based back-projection. The projection is computed on a single core of a Xeon X5460 processor running at 3.16 GHz with a 6 MB cache and 32 GB of 667 MHz RAM.

Figures 5 and 6 show initial results from profiling the projection of a single image. In this case, the system is given an image of 9k x 9k pixels and projects it using a given CAHV camera model. As can be seen, the switch to an integer-based implementation of the back-projection results in a noticeable decrease in processing time.

Fig. 5. Algorithm Profiling of [4]

Fig. 6. Algorithm Profiling of Proposed Integer-Based Projection

The system is then tested while simulating real-world operation. The test data for this consists of 660 individual images, separate from those used to test precision, and the time taken to project each one is measured. The results, shown in Figure 7, reveal an average throughput improvement of 27.6% and an average projection time of 3.67 s per frame when utilizing the integer-based method; the floating-point method projected each frame in an average of 5.07 s.

Fig. 7. Back Projection Computation Results

V. CONCLUSION

The fixed-point integer-based solution for the back-projection of aerial imagery provides a 27.6% improvement in throughput, while retaining image quality through the use of an optimized scale factor. This implementation shows the feasibility of an embedded solution, such as an FPGA, which would likely provide additional improvements in throughput for back-projecting imagery.

REFERENCES

[1] Yakimovsky, Y., and R. Cunningham (1978), A system for extracting three-dimensional measurements from a stereo pair of TV cameras, Computer Graphics and Image Processing, 7, 195-210.
[2] Di, K., and R. Li (2004), CAHVOR camera model and its photogrammetric conversion for planetary applications, J. Geophys. Res., 109, E04004, doi:10.1029/2003JE002199.
[3] Kasser, M., and Y. Egels (2002), Digital Photogrammetry, Taylor & Francis, New York.
[4] MISR Science Team, Algorithm Theoretical Basis Documents. [Online] Available: http://eospso.gsfc.nasa.gov/eos homepage/for scientists/atbd/viewinstrument.php?instrument=19.
[5] Mikhail, E. M., J. S. Bethel, and J. D. McGlone (2001), Introduction to Modern Photogrammetry, John Wiley, New York.
[6] Novak, K. (1992), Rectification of digital imagery, Photogrammetric Engineering & Remote Sensing, 58(3):339-344.
[7] Meyer-Baese, U. (2007), Digital Signal Processing with Field Programmable Gate Arrays, Springer, New York.
[8] Asano, S., T. Maruyama, and Y. Yamaguchi (2009), Performance comparison of FPGA, GPU and CPU in image processing, International Conference on Field Programmable Logic and Applications, 2009, pp. 126-131.
APPENDIX

The notable aspects of the original and integer-based algorithms are outlined below, and the main differences are discussed. For reference, Δx, Δy, and Δz are the changes in each respective dimension across the DEM cells; the variables with num subscripts are the respective numerators within the collinearity equations for each dimension, and D is the denominator.

Algorithm 1: Floating-Point Procedure
  Loop through DEM region
    xp_num = [m11(X - Xc) + m12(Y - Yc) + m13(Z - Zc)]
    yp_num = [m21(X - Xc) + m22(Y - Yc) + m23(Z - Zc)]
    D = [m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc)]
    Δxp_num = Δx(m11) + Δz(m13)
    Δyp_num = Δy(m21) + Δz(m23)
    ΔD = Δx(m31) + Δz(m33)
    Loop through subsampled DEM cell
      xp = xp_num / D
      yp = yp_num / D
      xp_num = xp_num + Δxp_num
      yp_num = yp_num + Δyp_num
      D = D + ΔD
      Interpolate grey level of the current pixel
    end
  end

Algorithm 2: Integer-Based Procedure
  Loop through DEM region
    xp_num = [m11(X - Xc) + m12(Y - Yc) + m13(Z - Zc)]
    yp_num = [m21(X - Xc) + m22(Y - Yc) + m23(Z - Zc)]
    D = [m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc)]
    Δxp_num = Δx(m11) + Δz(m13)
    Δyp_num = Δy(m21) + Δz(m23)
    ΔD = Δx(m31) + Δz(m33)
    Λ = 2^λ / D
    ΔΛ = -2^λ ΔD / (D(D + ΔD))
    Loop through subsampled DEM cell
      xp = (Λ * xp_num) >> λ
      yp = (Λ * yp_num) >> λ
      xp_num = xp_num + Δxp_num
      yp_num = yp_num + Δyp_num
      Λ = Λ + ΔΛ
      Interpolate grey level of the current pixel
    end
  end

The floating-point algorithm functions by processing each individual cell within the DEM. Each cell is iterated over in accordance with the subsampling rate, and the grey values for the pixels are interpolated from the source image. The integer-based algorithm adds a scaled multiplier, Λ, which scales the x and y coordinates and allows for the avoidance of the collinearity division. The scaling is done by a power of two in order to create a situation in which the true coordinate can be retrieved by a simple bitwise shift.
The initial xp and yp values within the inner loop are integer multiplies; both xp_num and yp_num are cast to integers before entering the loop. Note that the inner loop of the modified algorithm contains no divisions; instead, it is composed only of additions, shifts, and multiplies.
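As a concrete rendering of Algorithm 2's inner loop, the sketch below replaces the per-pixel division with a multiplication by the scaled reciprocal Λ. This is a hypothetical illustration with names of our choosing; for clarity it recomputes the reciprocal exactly each step, whereas the paper increments Λ by the pre-computed ΔΛ.

```python
LAM = 36  # scale exponent chosen in Section III

def project_cell_int(xp_num, yp_num, D, d_xp, d_yp, d_D, n):
    """Project n pixels of one DEM cell using only integer operations:
    multiplies, shifts, and additive numerator/denominator updates."""
    coords = []
    for _ in range(n):
        Lam = (1 << LAM) // D              # scaled reciprocal, 2**lam / D
        coords.append(((Lam * xp_num) >> LAM,   # xp = Lam*xp_num >> lam
                       (Lam * yp_num) >> LAM))  # yp = Lam*yp_num >> lam
        xp_num += d_xp                     # additive increments replace
        yp_num += d_yp                     # per-pixel multiplications
        D += d_D
    return coords
```

With a denominator that divides the scale evenly (e.g. D = 4), the shifted products recover the exact quotients, which makes the loop easy to verify against plain division.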