Coding for the Storage and Communication of Visualisations of 3D Medical Data
D. Tzovaras, N. Grammalidis, M. G. Strintzis, Aristotle University of Thessaloniki
Coding for the Storage and Communication of Visualisations of 3D Medical Data

D. Tzovaras, N. Grammalidis, M. G. Strintzis, and S. Malassiotis
Information Processing Laboratory, Aristotle University of Thessaloniki, Thessaloniki 54006, Greece
phone: (+30-31), fax: (+30-31)

Abstract

The transmission of the large store of information contained in 3D medical data sets through limited-capacity channels is a critical procedure in many telemedicine applications. In this paper, techniques are presented for the compression of visualisations of 3D image data for efficient storage and transmission. Methods are first presented for the transmission of the 3D surface of the objects using contour following methods. Alternately, the visualisation at the receiver may be based on a series of depth maps corresponding to some motion of the object, specified by the medical observer. Depth maps may be transmitted using depth map motion compensated prediction. Alternately, a wire-mesh model of the depth map may be formed and transmitted by encoding the motion of its nodes. All these methods are used for the transmission of the 3D image with visualisation carried out at the receiver. Methods are also developed for efficient transmission of images visualised at the encoder site. These methods allow remote interactive manipulation (rotation, translation, zoom) of the 3D objects, and may be implemented even if the receiver is a relatively simple and inexpensive workstation or a simple monitor. In all the above cases, the coding of binocular views of the 3D scene is examined and recommendations are made for the implementation of coders of stereo views of 3D medical data.

Subject Terms: 3D data compression, medical data visualisation, depth map transmission, telemedicine applications.
1 INTRODUCTION

Data compression and coding is an essential processing task in medical image transmission and storage [1, 2, 3]. The transmission of the large store of information contained in medical data through limited-capacity channels is still an open problem [4, 5]. This information must also be coded for storage in Picture Archiving and Communication Systems in a way that is efficient and permits fast retrieval, especially for such operations as browsing or previewing the data. Coding and compression methods are necessary for the implementation of several medical data storage and transfer needs. These include methods for:

- Coding of the whole 3D (volume) data set.
- Coding of the 3D object surface for visualisation at the receiver.
- Visualisation at the transmitter and coding of the visualised images.

The entire 3D volume of the medical data is needed in applications such as stereotactic surgery planning and radiotherapy simulation. It is also needed for routine archival purposes. Very often, at least a portion of this data set must be coded in a lossless (reversible) way to fully retain the diagnostic information [1, 6, 7]. Methods for medical 3D volume compression and transmission have been well investigated and their merits adequately discussed in the literature [1, 6, 7]. The present paper will concentrate on the two remaining general methodologies above, which have not been adequately analysed in the literature, even though they are very useful for the transmission not only of medical 3D data but also of other types of three-dimensional information, such as needed in quality control, virtual museum and teleshopping applications. Specifically, the present paper will investigate details of the implementation of these methods using alternative coding procedures and will experimentally assess the efficiency of these implementations. First, methods for surface transmission will be examined, based on the lossless contour-following technique.
Alternately, the receiver may use a depth map supplied by the encoder to visualise a still medical object, or, if a depiction of a moving medical object is required, the receiver may rely on a series of depth maps corresponding to the required object motion. Such motion may, for example, be specified by the medical observer. Depth maps may be transmitted by using depth map motion compensated prediction.
In the above operations it is necessary to perform the final visualisation of the data at the receiver site. This assumes the existence of a powerful workstation at the receiver site with the capacity for fast data visualisation. In practice it is very often preferable to have a powerful central workstation perform the visualisation and transmit the results to the peripheral stations in the form of images. This minimises the transmission delay and allows the viewing peripheral stations to be relatively simple and inexpensive "digital lightboxes". It also allows interactive manipulation of the data with minimum delay, with user-initiated operations possible such as translation and rotation of the medical objects. To achieve this end, the present paper introduces a special-purpose 3D object-based codec. This codec takes advantage of the fact that the motion of the object is user-initiated and hence is completely known to the receiver. It also takes advantage of the knowledge at the transmitter site of the 3D shape of the object. Hence the only parameters that need to be transmitted for each group of pictures are: a) the initial image, and b) a succession of depth maps. Depth compensation is used for the compression of the latter. Alternatively, the depths of the nodes of a wire mesh model of the object may be transmitted.

The paper is organised as follows. In Section 2, the problem of coding of visualised data is presented, and solutions are proposed for the case where visualisation is performed at the receiver. Section 2.2 outlines a lossless technique for 3D surface compression, while in Section 2.3 novel efficient techniques for low bit rate coding of a sequence of depth maps, corresponding to multiple views of the 3D scene, are described. An alternate depth map coding technique based on wire mesh modeling is then examined in Section 2.4.
In Section 3, a solution is proposed for the case where visualisation is performed at the transmitter, using a novel object-based technique for motion compensation of the sequences of visualised images. Finally, Section 4 evaluates the use of the examined coding techniques for stereoscopic medical image viewing. Experimental results described in Section 5 demonstrate the performance of the proposed methods.

2 VISUALISATION AT THE DECODER SITE

If a full-capacity workstation is available at the decoder site, it will be sufficient to transmit the 3D information:

A. Using a 3D volume data form.
B. In 3D surface form. An alternative to the transmission of the whole 3D data set is the transmission of the object contour, using methods which will be described later in this section.

C. In the form of 2D snapshots of the depth maps.

The decoder will then visualise the data. In the sequel, the rendering process used for the visualisation of the medical data is presented as an introduction to the description of the proposed coding methods.

2.1 Rendering Process

The term visualisation usually refers to any process that allows visual understanding and interpretation of 2D or 3D data. In medical imaging, visualisation is performed primarily using volume or surface rendering techniques. Volume rendering [4] is used to show the interior of a solid 3D object on a 2D image. This is very useful in many medical imaging applications where the boundaries between different tissues have to be indicated. In surface rendering [8, 9] the goal is the creation of realistic pictures of 3D objects. This is done by simulating the lighting of the scene, the surface properties (e.g. surface color), the camera-eye system, the position of the viewer and the object motion. In our experiments, visualisation of the Magnetic Resonance Imaging (MRI) data was performed with the 3DVIEWNIX software package [10]. This package provides a variety of manipulation and analysis tools for the visualisation and display of multidimensional image data.

2.2 Lossless 3D Surface Compression Using Contour Following

The original 3D data usually consist of several slices of 2D data. If a single closed outer contour per slice is assumed, the 3D surface may be equivalently represented by stacking these 2D contours. In the contour following approach [13], an initial point on the contour is chosen and the contour is then traced in a clockwise manner, labeling the direction as we shift from one contour element to the next (see Fig. 1).
The resulting data stream is then entropy coded using arithmetic coding techniques [14]. This is a lossless approach that also has the advantage of simplicity, leading to very fast implementation.
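As a rough illustration of this scheme, the sketch below chain-codes a toy contour and estimates the entropy of the resulting symbol stream. The contour, the helper names, and the use of a zeroth-order entropy estimate in place of a true arithmetic coder are all illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

# The eight compass directions of Fig. 1, as (d_row, d_col) steps.
DIRS = {"E": (0, 1), "SE": (1, 1), "S": (1, 0), "SW": (1, -1),
        "W": (0, -1), "NW": (-1, -1), "N": (-1, 0), "NE": (-1, 1)}
LABEL = {step: name for name, step in DIRS.items()}

def encode_chain(points):
    """Label each step between consecutive contour points with its
    compass direction, giving the chain code of the slice contour."""
    return [LABEL[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(points, points[1:])]

def entropy_bits(symbols):
    """Total zeroth-order entropy of the chain code, in bits: a rough
    lower bound on what an arithmetic coder could achieve."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

# A small clockwise contour (a hypothetical 3x3 slice boundary).
contour = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
           (2, 1), (2, 0), (1, 0), (0, 0)]
code = encode_chain(contour)   # ['E', 'E', 'S', 'S', 'W', 'W', 'N', 'N']
```

In the paper's method, the start point plus such a direction stream, taken over all slices, represents the full 3D surface losslessly.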
Figure 1: Contour following. The eight direction labels are NW, N, NE, W, E, SW, S, SE; a traced contour yields a chain such as ...NE, E, SE, SE, E, NE, ...

2.3 Depth Map Coding

In several important medical applications, the method of 3D surface description using contour following is not easily applicable (for instance, see Fig. 7). For example, while the test sequence "Head" (Fig. 6a) is described well by the methods discussed above, the sequence "Brain" is not easily amenable to such description, since it is extremely difficult to define an enveloping contour for so complicated an object. In such cases a preferable description of the depth maps may be afforded directly by the ray tracing used (Fig. 8). Specifically, a number of snapshots of the depth maps, corresponding to certain angles of viewing the object, may be used for the depiction of the medical information that must be stored or transmitted. This series of snapshots may then be coded as though they were consecutive in time, and their aggregate may be treated as a "moving" sequence. In fact, since motion parallax is a powerful aid to image understanding, their actual positioning in a moving sequence may be precisely what the medical observer requires [4]. The rotation back and forth, for example, of a 3D MRI image gives a very good understanding of medical detail. Efficient techniques must then be found for the coding and compression of the depth maps corresponding to multiple views of a particular 3D data set.

If $\vec{V}_t = (X_t, Y_t, Z_t)$ is the 3D vector denoting the position of a point in 3D space at time instant $t$, the position of the same point at time $t+1$ will be:

$$\vec{V}_{t+1} = R \vec{V}_t + \vec{T} \qquad (1)$$

where $R$ is the rotation matrix and $\vec{T}$ is the translation vector. The general form of
equation (1) is:

$$\begin{pmatrix} X_{t+1} \\ Y_{t+1} \\ Z_{t+1} \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} \begin{pmatrix} X_t \\ Y_t \\ Z_t \end{pmatrix} + \begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix} \qquad (2)$$

The image formation process of the visualised images is a perspective projection of a 3D object onto the image plane. The center of projection is located at the origin of the 3D world coordinate system, at a distance $f$ from the image plane. If $(X_t, Y_t, Z_t)^T$ is the position vector of a point in 3D space and $(x_t, y_t, f)^T$ is the position vector of its projection on the image plane, then the following relations exist between the image and the world coordinates at two consecutive time instants:

$$x_t = f \frac{X_t}{Z_t}, \quad y_t = f \frac{Y_t}{Z_t}, \quad x_{t+1} = f \frac{X_{t+1}}{Z_{t+1}}, \quad y_{t+1} = f \frac{Y_{t+1}}{Z_{t+1}}. \qquad (3)$$

Substituting in (2), we obtain the following system of equations relating the image plane pixel coordinates $(x_t, y_t)$ and the corresponding depth $Z_t$ with their evolution in time, where $\{R_{ij}\}$, $\{T_x, T_y, T_z\}$ are the object motion parameters:

$$x_{t+1} Z_{t+1} = R_{11} x_t Z_t + R_{12} y_t Z_t + R_{13} f Z_t + f T_x \qquad (4a)$$
$$y_{t+1} Z_{t+1} = R_{21} x_t Z_t + R_{22} y_t Z_t + R_{23} f Z_t + f T_y \qquad (4b)$$
$$f Z_{t+1} = R_{31} x_t Z_t + R_{32} y_t Z_t + R_{33} f Z_t + f T_z \qquad (4c)$$

The third of the above equations provides the new depth $Z_{t+1}$ at time instant $t+1$ corresponding to pixel $(x_t, y_t)$, while the first two provide its new position $(x_{t+1}, y_{t+1})$. Some problems arise in the motion compensation procedure, due to the fact that $x_{t+1}$ and $y_{t+1}$ are floating point numbers. Thus the new position $(x_{t+1}, y_{t+1})$ of the point at time instant $t+1$ generally lies outside the sampling grid of the subsequent frame. Therefore an interpolation procedure has to be adopted, in order to assign values at integer pixel locations. One such procedure, based on upsampling of the depth map $Z_t$ at time instant $t$, was implemented. This technique led to satisfactory results, which improve as the upsampling rate increases. With this scheme, only the object boundary information needs to be transmitted for motion compensation.
The object boundary information is coded losslessly using arithmetic coding techniques [14]. Further, to offset errors in the motion compensated depth map prediction, particularly those occurring in newly occluded or newly appearing areas, the error depth map is transmitted, coded using the DCT and Huffman encoding. A block diagram of the proposed depth map coder is shown in Figure 9.
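A minimal sketch of this depth map motion compensation, under simplifying assumptions: the image coordinate origin is placed at the image centre, a hypothetical focal length f is assumed, and nearest-neighbour splatting stands in for the upsampling-based interpolation described above.

```python
import numpy as np

def depth_motion_compensate(Z, R, T, f=64.0):
    """Forward motion compensation of a depth map via eqs. (4a)-(4c).
    Z holds the depth Z_t per pixel (0 marks background), R is the 3x3
    rotation matrix and T the translation vector."""
    h, w = Z.shape
    Zp = np.zeros_like(Z)
    for yi in range(h):
        for xi in range(w):
            z = Z[yi, xi]
            if z <= 0:                       # background: nothing to project
                continue
            x, y = xi - w / 2, yi - h / 2    # centred image-plane coordinates
            # eq. (4c): new depth Z_{t+1}
            z1 = (R[2, 0] * x * z + R[2, 1] * y * z) / f + R[2, 2] * z + T[2]
            # eqs. (4a), (4b): new image-plane position (x_{t+1}, y_{t+1})
            x1 = (R[0, 0] * x * z + R[0, 1] * y * z + R[0, 2] * f * z + f * T[0]) / z1
            y1 = (R[1, 0] * x * z + R[1, 1] * y * z + R[1, 2] * f * z + f * T[1]) / z1
            # (x1, y1) are floating point: round onto the sampling grid
            xn, yn = int(round(x1 + w / 2)), int(round(y1 + h / 2))
            if 0 <= xn < w and 0 <= yn < h:
                Zp[yn, xn] = z1
    return Zp
```

For a pure translation along the optical axis (R = I, T = (0, 0, 10)), every visible pixel's depth simply increases by 10; the holes left by the forward mapping correspond to the error-map regions that the coder transmits separately.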
2.4 Depth Map Transmission with Wire Mesh Modeling

Alternately, depth information may be modeled using a wire mesh. We shall assume a surface model of the form

$$z = S(x, y, \hat{P}) \qquad (5)$$

where $\hat{P}$ is a set of 3D "control" points, $\hat{P} = \{(x_i, y_i, z_i),\ i = 1, \ldots, N\}$, that determine the shape of the surface, and $(x, y, z)$ are the coordinates of a 3D point. Consider a set of image points $\hat{A} = \{(x_i, y_i),\ i = 1, \ldots, N\}$ and a triangulation $\hat{T}$ of $\hat{A}$ such that the triangles of $\hat{T}$ cover the image plane. An initial choice for $\hat{A}$ is the regular tessellation shown in Figure 2.

Figure 2: Initial triangulation of the image plane.

We can then model $S$ by

$$z = \sum_{i=1}^{N} z_i N_i(x, y) \qquad (6)$$
where $N_i(x, y)$ are the basis functions shown in Figure 3, centered at $(x_i, y_i)$.

Figure 3: Basis functions.

It may easily be seen that (6) is a piecewise linear surface, consisting of adjoint triangular patches, which may be written in the form

$$z = z_1 g_1(x, y) + z_2 g_2(x, y) + z_3 g_3(x, y) \qquad (7)$$

if $(x, y, z)$ is a point on the triangular patch $\triangle P_1 P_2 P_3 = \{(x_1, y_1, z_1), (x_2, y_2, z_2), (x_3, y_3, z_3)\}$. The functions $g_i(x, y)$ are the barycentric coordinates of $(x, y, z)$ relative to the triangle, and they are given by $g_i(x, y) = \mathrm{Area}(A_i) / \mathrm{Area}(\triangle P_1 P_2 P_3)$ (Figure 4).

Figure 4: Barycentric coordinates.

This may also be written in the better known form

$$z = a x + \beta y + \gamma \qquad (8)$$

where $(a, \beta)$ is the slope and $\gamma$ the elevation of the corresponding triangular patch. The reconstruction of a surface from dense depth measurements may be effected by minimising a functional of the form

$$E_0(\hat{P}) = \sum_{i=1}^{M} \left( S(\tilde{x}_i, \tilde{y}_i, \hat{P}) - \tilde{z}_i \right)^2 + E_s \qquad (9)$$
where the first term of (9) expresses confidence in the data $(\tilde{x}_i, \tilde{y}_i, \tilde{z}_i)$, $i = 1, \ldots, M$, and $E_s$ is the function measuring the departure of $S$ from smoothness, given by

$$E_s = \iint \omega(x, y) \left( S_{xx}^2 + 2 S_{xy}^2 + S_{yy}^2 \right) dx\, dy \qquad (10)$$

where $\omega(x, y)$ is a binary function that equals zero if $(x, y)$ lies on a discontinuity. Taking derivatives of the surface function $S(x, y, \hat{P})$ written in the form (8) and replacing in (10), we obtain after simple manipulations

$$E_s = \sum_{e \in E} \omega_e l_e \left\{ (a_e^1 - a_e^2)^2 + (\beta_e^1 - \beta_e^2)^2 \right\} \qquad (11)$$

where $E$ is the set of all non-boundary edges of the wireframe, $l_e$ is the length of edge $e$, and $(a_e^1, \beta_e^1)$, $(a_e^2, \beta_e^2)$ are the slopes of the triangular patches that share the common edge $e$. The binary variable $\omega_e$ assumes the value zero if $e$ crosses a discontinuity. The existence of the term $E_s$ guarantees that we obtain a smooth surface. In the case of the regular tessellation, (11) becomes

$$E_s = \frac{l^2}{A^2} \sum_{e \in E} (z_i - z_a + z_j - z_b)^2 \qquad (12)$$

where $l$ is the edge length, $A$ is the area of each triangle, and $P_i, P_j, P_a, P_b$ are shown in Figure 5.

Figure 5: Edge e between two neighboring triangles.

Replacing (12) and (7) in (9), we obtain

$$E_0(\hat{P}) = \| A \hat{P} - B \|^2 + \| C \hat{P} \|^2 \qquad (13)$$

where $A_{ij} = g_j(\tilde{x}_i, \tilde{y}_i)$ if $(\tilde{x}_i, \tilde{y}_i)$ lies inside a triangle $\{(x_j, y_j), (x_k, y_k), (x_l, y_l)\}$ and $A_{ij} = 0$ otherwise, for $i = 1, \ldots, M$, $j = 1, \ldots, N$,
$B_i = \tilde{z}_i$, $i = 1, \ldots, M$, and

$$C_{ki} = 1, \quad C_{ka} = -1, \quad C_{kj} = 1, \quad C_{kb} = -1, \quad C_{kl} = 0 \text{ if } l \notin \{i, j, a, b\}, \quad k = 1, \ldots, L$$

where $L$ is the number of non-boundary edges and $i, j, a, b$ are the indices of the nodes belonging to the triangles incident to edge $k$, as shown in Figure 5. Minimisation of (13) occurs when:

$$\hat{P} = (A^T A + C^T C)^{-1} A^T B \qquad (14)$$

Using (8), the depth $z$ of any point on a patch can be expressed in terms of the depth information of the nodes of the wireframe and the $x$ and $y$ coordinates of that point. Hence, full depth information will be available if only the depths of the nodes of the wireframe are transmitted. After wire mesh model initialisation, motion compensated estimates of the depth maps corresponding to subsequent time instants may be found by updating the positions of the nodes of the wireframe only. Occlusions may then be detected and compensated by transmitting the depth map error information, as explained in the preceding section. Using this occlusion information, the wire mesh model is updated by adding new nodes in the newly appearing areas, which occur at the boundary of the object. The wire mesh update information must also be transmitted at each time instant.

3 VISUALISATION AT THE ENCODER SITE

In many practical applications, where the receiver must be a relatively simple and inexpensive workstation, visualisation at the receiver will require an inordinate amount of time. Then, if a powerful workstation is available at the database site or the diagnostic image source, the visualisation may be performed on this workstation and the resulting 2D images transmitted to the receiver. This allows the receivers to be inexpensive personal computers, or even simple high-resolution monitors, which may be dispersed throughout a hospital as "digital lightboxes". The need for visualisation at the transmitter or archive site also arises when the archive contains images of diagnostic value that the physician has decided to save in visualised format.
The original 3D data may also be on file, but
the physician will normally want to avoid repeating the time-consuming segmentation and rendering procedure. The coding scheme used must allow remote interactive handling and manipulation (rotation, translation, zoom) of the 3D data by the user at the receiver site, using a simple signaling device such as a mouse or a joystick.

3.1 Special-Purpose 3D Object-Based Coding

A coding scheme suitable for the coding of 3D medical images visualised at the encoder may be based on knowledge of the 3D motion field. In fact, since the motion of the object is user-initiated, both encoder and decoder know the 3D motion of the object with precision. Thus, if a depth map of the object is also provided, the 2D motion field may be obtained from the projection of the 3D motion and used for interframe motion compensation. The production of the series of images (visualisations) corresponding to the prescribed 3D object motion may be accelerated using incremental visualisation techniques such as those reported in [25]. If the series of images to be transmitted is divided into MPEG-like groups of images, the initial image ("I-frame" in MPEG terminology) in each group can be transmitted using lossless coding, simple DPCM, DCT or wavelet-based coding [16, 17]. Interframe coding may be achieved using the known 3D motion parameters, on the basis of the analysis presented in Section 2.3. Specifically, equations (4a), (4b), (4c) may be used for motion compensation, provided that a relation is found between the luminances of a pixel at its successive positions $(x_t, y_t)$ and $(x_{t+1}, y_{t+1})$. In theory this is possible. However, this relationship is an extremely complex function of the mechanics of the rendering procedure and depends on visualisation parameters such as the assumed lighting conditions and the chosen surface reflectivity, as well as the gradient of the surface at each point.
A simple alternative to this very complicated construction is the working assumption of equal intensity at two successive pixel positions:

$$I(x_{t+1}, y_{t+1}) = I(x_t, y_t). \qquad (15)$$

The resulting object-based 3D motion compensation scheme is defined by (4) and (15). Clearly, the scheme may be decoupled into two separate procedures:
depth map evaluation using

$$f Z_{t+1} = R_{31} x_t Z_t + R_{32} y_t Z_t + R_{33} f Z_t + f T_z \qquad (16)$$

followed by motion compensated image intensity prediction using (15) and

$$x_{t+1} Z_{t+1} = R_{11} x_t Z_t + R_{12} y_t Z_t + R_{13} f Z_t + f T_x \qquad (17)$$
$$y_{t+1} Z_{t+1} = R_{21} x_t Z_t + R_{22} y_t Z_t + R_{23} f Z_t + f T_y. \qquad (18)$$

Alternately, the depth map $Z_{t+1}$ may be evaluated by means other than (16), for example using the contour information discussed in Section 2.2. In the experimental results presented in Section 5, it is assumed that (16) is used for depth map evaluation. The information contained in the depth map supplied to the receiver may be used to enable stereo viewing of the data (Section 4). A block diagram of the proposed visualised image coding scheme is shown in Figure 13.

3.2 Transmission Using a General-Purpose Object-Based Coder Based on 3D Motion Compensation

A general purpose object-based coder for the coding of general scenes, such as those described in [20, 21, 22, 23, 24], may also be used for the coding of medical visualisations. This is of interest for the purpose of comparison with the special 3D medical image coder described above. The coder in [24] was implemented for this purpose. This coder supplies full depth information to the receiver. When such information is needed (e.g. for stereoscopic viewing applications), this coder is therefore advantageous compared to MPEG-2. The depth map is estimated by the coder, using for each time instant two visualisations of the object separated by a rotation of 2 degrees about the vertical axis. This approximates a stereo view of the medical object and may be used to determine the approximate depth map. The motion field is estimated using the sequence of visualisations. Specifically, the general-purpose coder works as follows [24]:

a) Uniformly displaced regions are identified: Block-based estimation is used to obtain an initial estimate of the 2D motion field.
Then a split-and-merge segmentation algorithm is used to identify the objects indicated by the uniformly displaced regions.
b) The 3D motion of each object is found: The 3D motion parameters (translation and rotation) are related to the 2D motion field vectors $d = (d_x, d_y)$ by the equations:

$$d_x = \frac{f t_x - X t_z}{z} - \frac{XY}{f} w_x + \left( f + \frac{X^2}{f} \right) w_y - Y w_z \qquad (19)$$
$$d_y = \frac{f t_y - Y t_z}{z} - \left( f + \frac{Y^2}{f} \right) w_x + \frac{XY}{f} w_y + X w_z$$

where $(X, Y)$ is a point in the current image and $z$ is its corresponding depth. Also, $f$ is the focal length and $(t_x, t_y, t_z)$, $(w_x, w_y, w_z)$ are the 3D object translation and rotation parameters, respectively. Equations (19) are used to estimate the 3D motion of each object in two stages. In the first stage, equations (19) are written for $N$ of the initially estimated 2D vectors, forming a system of $2N$ equations in 6 unknowns. With $N \geq 3$ this system is overdetermined and can be solved using least-squares methods. In the second stage, improved estimation of the model parameters is achieved by minimising the displaced frame difference (DFD) between the object as seen in the current image and the motion compensated reconstruction produced from the previous image using the estimated 3D parameters. A steepest-descent technique is used to achieve this minimisation.

c) The interframe coding scheme operates in two basic modes: object-based mode and block-based mode. In each image region the 3D motion parameters are first estimated. If the resulting reconstruction error for a region is smaller than that of the 2D motion vectors, the coder goes into object-based mode and the six 3D motion parameters are quantised and coded using PCM. We used a uniform scalar quantiser between the minimum and maximum values of the parameters. Otherwise the coder goes into block-based mode, the region is divided into 8x8 blocks, and 2D motion compensation is used. A bitmap is also transmitted indicating the coding mode for each region.

4 CODING FOR STEREOSCOPIC VIEWING OF 3D MEDICAL DATA

Depth perception, hence stereo viewing, is useful in many medical applications.
If visualisation is performed at the decoder site, generation of a second (stereo) view will necessitate the completion of two separate rendering procedures. If visualisation is done at the encoder site, a second (right) channel image may be coded precisely as the first (left) sequence. Alternately, the right channel image may be formed from the left image using disparity compensation. This is possible with all coding methods discussed in Sections
2 and 3 of this paper, because in all these methods the depth map is also transmitted. Therefore, at the decoder side the disparity field is found from the depth map using the relation

$$d = \frac{bf}{z} \qquad (20)$$

where $b$ is the baseline, $f$ is the focal length, $d$ is the disparity and $z$ is the depth. The right channel image is then formed from the left channel image using disparity compensation:

$$I_r(x, y) = I_l(x + d, y). \qquad (21)$$

5 EXPERIMENTAL RESULTS

The methods of Sections 2-4 were tested using MRI data of the head (represented with 8 bits/pixel), forming 256 slices with a 1 mm distance between slices. The resolution of each slice is pixels. A mouse-driven segmentation procedure based on thresholding was used to extract the "Brain" data set from the original 3D data set "Head". The sequences of depth maps corresponding to 2D views of the 3D data sets were generated from the original MRI slices. The depth maps corresponding to these data sets and the corresponding rendered views are illustrated in Figs. 6-8.

5.1 Transmission of Contour and Depth Map Information

To enable visualisation at the receiver, the lossless contour-following method was used to transmit the 3D surface of the "Head" image. This required the transmission of bytes and resulted in the reconstruction shown in Figure 6b. Note that the bytes necessary to transmit the entire 3D surface correspond to an average of 0.03807 bits/pixel when 180 depth maps (one every 2 degrees) have to be transmitted. Also, to enable the receiver to visualise a rotation of the medical object, a sequence of depth maps corresponding to 2D views of the 3D data set was generated from the original MRI slices. Each frame of the depth map sequence corresponds to an object rotation by 2 degrees around the Y axis. An object-based 3D motion compensated prediction scheme for the interframe compression of the resulting sequence of depth maps is defined by equation (4c).
The following information must be made available to the decoder:

- The previous depth map.
- Definition of the objects in the scene (segmentation information).
- The error map corresponding to areas which were occluded in the previous depth map.

The 3D object-based motion compensation method is integrated in an MPEG-like coding scheme for the transmission of a sequence of depth maps. In this scheme it is assumed that the first depth map is intra coded using I-frame coding techniques, while the subsequent frames are coded using motion compensation (P-frames): IPP...PPIPP... Each P-frame is predicted from the previous frame (P or I) in the sequence. The decoder, which already has available a reconstruction of the previous frame, uses 3D object-based motion compensation of this reconstruction to obtain a motion-compensated approximation of the current frame. The error map corresponding to areas which were occluded in the previous depth map is then coded using DCT and Huffman coding. The block diagram of the resulting codec is shown in Fig. 9. In the decoder, the rendering procedure described in Sec. 2.1 is used to visualise the reconstructed depth map information. Experimental results evaluating the performance of the proposed method before and after the coding of the prediction error images are given in Tables 1 and 2 respectively, relating the bits/pixel needed for the transmission of the depth maps to the quality of the images visualised on the basis of these depth maps. Furthermore, in Figure 10 the original and reconstructed depth maps of frame 2 of the sequence "Head" are shown, along with the rendered images corresponding to these depth maps.

Alternately, if the wire mesh modeling technique is used for depth map coding, the initial wireframe is computed using the techniques described in Section 2.4. The initial wireframe adapted to the first frame of "Head" is shown in Figure 11.
The wire mesh model parameters (the coordinates of the 1112 nodes) were then quantised and coded using DPCM techniques, followed by arithmetic coding. We used uniform scalar quantisers between the minimum and maximum values of each model parameter. Depth information was then interpolated from the depths of the nodes of the wireframe. The depth map corresponding to the modeled first frame of "Head" is shown in Figure 12a and the corresponding visualised image is illustrated in Figure 12b. Motion compensation of the
nodes of the wireframe is then performed, followed by depth error transmission. Table 3 shows the bits/pixel needed for the transmission of the wire mesh versus the quality of the images visualised on the basis of the interpolated depth maps.

5.2 Transmission of Visualised Images

The methods of Section 3 were also tested, to simulate the transmission of images visualised at the encoder. In the proposed special purpose object-based scheme, 3D motion compensation is possible for every point of an object in the scene if the following information is made available to the decoder:

- A reconstruction of the previous frame in the visualised image sequence.
- The previous depth map.
- The error image, which includes the newly apparent, previously occluded regions and the errors due to the inaccuracy of assumption (15).

An MPEG-like motion compensating scheme was implemented in this case as well. The 3D object-based motion-compensation method was integrated in an MPEG-like coding scheme for the transmission of a sequence of images. In this scheme, the first visualised image (I-frame) is coded using DCT and Huffman techniques, while the subsequent frames are coded using 3D motion compensation (P-frames). Each P-frame is predicted from the previous frame (P or I) in the sequence. The decoder, which already has available a reconstruction of the previous frame, uses 3D object-based motion compensation to obtain a motion-compensated approximation of the current frame. The number of P-frames between successive I-frames was chosen to be 6. The sequence of corresponding depth maps, which is necessary for motion compensation, is also coded using the adopted MPEG-like scheme. The prediction error images may also be coded and transmitted. Note that in this scheme the error images involve not only intensity error images from equations (17), (18) but also depth-map errors from equation (16). These are coded using DCT and Huffman coding and transmitted.
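The group-of-pictures structure just described (one I-frame followed by six P-frames, each P-frame predicted from the previous reconstruction) can be sketched as follows; `intra_code` and `predict` are hypothetical stand-ins for the DCT/Huffman intra coder and the 3D motion compensated predictor.

```python
def gop_schedule(num_frames, p_per_group=6):
    """Frame types for the IPPPPPP IPPPPPP ... structure used above."""
    return ["I" if i % (p_per_group + 1) == 0 else "P"
            for i in range(num_frames)]

def code_sequence(frames, intra_code, predict, p_per_group=6):
    """Closed-loop coding: each P-frame is predicted from the decoder's
    reconstruction of the previous frame, so encoder and decoder stay
    in step."""
    recon = []
    types = gop_schedule(len(frames), p_per_group)
    for frame, ftype in zip(frames, types):
        if ftype == "I":
            recon.append(intra_code(frame))
        else:
            recon.append(predict(recon[-1], frame))
    return types, recon
```

Predicting from the previous reconstruction rather than the previous original frame is what prevents prediction drift between the encoder and the decoder.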
A block diagram of the resulting codec is shown in Fig. 13. Since the 3D motion field is accurate, DCT and Huffman coding of the prediction errors produce near-lossless compression of the visualised images (Table 6). The small coding error is due to the accurate knowledge of the observer-initiated translation and
rotation parameters. By comparison of Tables 5 and 6 we conclude that the proposed method performs slightly better than MPEG-2 [15]. In the demonstration of the RACE II project DISTIMA, the evaluators failed to detect a significant qualitative difference between the two methods. Furthermore, the proposed method does not require motion estimation and hence is considerably simpler than MPEG-2 encoding/decoding. Finally, unlike MPEG, this coding method provides full depth map information to the receiver.

5.3 Transmission for Stereoscopic Viewing of Images

Depth information is needed at the receiver site if stereoscopic viewing of the data is desired (Section 4). It is also useful in applications where the image data are to be stored and subsequently interactively manipulated in a 3D manner (rotation, 3D translation) at the receiver. For reasons of comparison, the original frame 2 of the "Brain" image, along with the images reconstructed using the 3D object coder and the MPEG-2 method, are shown in Figures 14(a,b,c). The performance of the general purpose stereoscopic coder was also tested on the "Brain" sequence. Tables 7 and 8 present the PSNR of the reconstructed left image and the number of bytes that have to be sent to the decoder, with and without coding of the prediction error images respectively. The reconstructed frame 2 of the visualised image sequence is depicted in Figure 14d. As noted, the general-purpose object-based coder produces results inferior to those of both the special-purpose object-based coder and MPEG. Compared with MPEG, however, the general-purpose object-based coder has the advantage of fully transmitting depth information, which, as noted, permits stereo viewing and 3D storage of the sequence and may otherwise be needed in medical use. With all the above methods, disparity may be estimated from depth, and disparity compensation may be used to generate the right image sequence from the left image sequence.
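A sketch of this disparity-compensated synthesis of the right view from the left view and the depth map, using equations (20) and (21); nearest-pixel sampling and a zero fill for unmatched pixels are simplifying assumptions (in the paper, such residual areas are handled by the transmitted error image).

```python
import numpy as np

def disparity_from_depth(Z, b, f):
    """Eq. (20): d = b*f / z. Background pixels (zero depth) are given
    zero disparity here."""
    d = np.zeros_like(Z, dtype=float)
    visible = Z > 0
    d[visible] = b * f / Z[visible]
    return d

def synthesize_right(I_left, d):
    """Eq. (21): I_r(x, y) = I_l(x + d, y). Samples falling outside the
    left image are left at zero."""
    h, w = I_left.shape
    I_right = np.zeros_like(I_left)
    for y in range(h):
        for x in range(w):
            xs = x + int(round(d[y, x]))
            if 0 <= xs < w:
                I_right[y, x] = I_left[y, xs]
    return I_right
```

For a frontal surface at constant depth the whole image simply shifts by a constant disparity, which is the expected behaviour of the parallel-camera geometry assumed by (20).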
For a better-quality stereo image, the error image resulting from the inaccuracy of (21) and from the presence in the right view of areas occluded in the left view must be transmitted. This results in a modest overhead in bit rate, as shown in Table 9.

6 CONCLUSIONS

In telemedicine applications, and also for efficient storage and retrieval of 3D images in Picture Archiving and Communication Systems, coding and compression methods are necessary for the implementation of several data storage and transfer modes. These include methods for:

- Coding of the whole 3D (volume) data set.
- Coding of the 3D object surface for visualisation at the receiver.
- Visualisation at the transmitter and coding of the visualised images.

For these needs, in summary, the paper discusses:

(a) For visualisation at the decoder site, transmission of surface information using
- the 3D contour method of Section 2.2, or
- the depth-map snapshot method of Section 2.3, or
- the depth-map snapshot method of Section 2.4, using wire-mesh modelling of the depth maps.

(b) If rendering is performed at the encoder site, transmission of the resulting visualisation via the novel object-based coder of Section 3. Although this coder additionally supplies full depth map information to the receiver, it was seen to perform slightly better than MPEG-2.

(c) For stereo viewing of images produced either at the encoder or the decoder site, construction of the right image from the left-channel images, as proposed in Section 4. In fact, since the proposed monoscopic coders require depth map transmission, the disparity field may be obtained and hence a stereo view may be composed from the monoscopic view by disparity compensation. In this way, an adequate stereo presentation may be obtained without additional computational cost or bit-rate increase. For a better-quality stereo view, the disparity compensation error image must be transmitted, at a modest overhead in bit rate.

7 Acknowledgment

This material is based on work supported by the RACE 2045-DISTIMA and the ACTS 092-PANORAMA projects.
References

[1] P. Roos and M. A. Viergever, "Reversible 3D Decorrelation of Medical Images," IEEE Trans. on Medical Imaging, vol. 12, pp. 413-420, Sep. 1993.
[2] H. Musmann, P. Pirsch, and H. Grallert, "Advances in Picture Coding," Proc. IEEE, vol. 73, pp. 523-548, Apr. 1985.
[3] A. Netravali and B. Haskell, Digital Pictures: Representation and Compression. New Jersey: Plenum, 1988.
[4] J. Udupa and G. Herman, 3D Imaging in Medicine. Boca Raton, FL: CRC Press, 1991.
[5] S. H. Oguz, O. N. Gerek, and A. E. Cetin, "Motion Compensated Prediction Based Algorithm for Medical Image Sequence Compression," Signal Processing: Image Communication, vol. 7, no. 1, 1995.
[6] L. Kondis, C. Chrysas, H. Sahinoglou, and M. G. Strintzis, "3D Medical Data Compression," in Proc. Int'l Workshop on Image Processing (Budapest), June.
[7] H. Sahinoglou, S. Malassiotis, and M. G. Strintzis, "Lossless Coding and Visualization of 3D Medical Data with Lossy Preview Capability," Bioimaging, to appear.
[8] A. Kaufman, A Tutorial on Volume Visualization. Los Alamitos, CA: IEEE Computer Society Press, 1991.
[9] J. Udupa, H. Hung, and K. Chuang, "Surface and Volume Rendering in 3D Imaging: A Comparison," Journal of Digital Imaging, vol. 4, pp. 196-214, 1991.
[10] J. Udupa, R. Goncalves, K. Iyer, S. Narendula, D. Odhner, S. Samarasekera, and S. Sharma, "3DVIEWNIX: An open, transportable software system for the visualization and analysis of multidimensional, multimodality, multiparametric images," in SPIE Proceedings, vol. 1987, pp. 47-58, 1993.
[11] R. Thoma and M. Bierling, "Motion Compensating Interpolation Considering Covered and Uncovered Background," Signal Processing: Image Communication, vol. 1, no. 1, pp. 191-212, 1989.
[12] J. Weng, N. Ahuja, and T. Huang, "Matching Two Perspective Views," IEEE Trans. Pattern Anal. and Mach. Intell., vol. 14, pp. 806-825, Aug. 1992.
[13] J. A. Saghri and H. Freeman, "Computer processing of line drawing images," IEEE Trans. PAMI, pp. 533-539.
[14] I. Witten, R. Neal, and J. Cleary, "Arithmetic Coding for Data Compression," Communications of the ACM, vol. 30, pp. 520-540, June 1987.
[15] MPEG, "MPEG Video Simulation Model Three," Tech. Rep. ISO/IEC JTC1/SC2/WG11 N0010, MPEG 90/041, July 1990.
[16] M. G. Strintzis, "Optimal Biorthogonal Wavelet Bases for Signal Representation," IEEE Transactions on Signal Processing, vol. 44, no. 6, June 1996.
[17] S. Efstratiadis, D. Tzovaras, and M. G. Strintzis, "Hierarchical Partition Priority Wavelet Image Compression," IEEE Transactions on Image Processing, vol. 5, no. 7, July 1996.
[18] J. Weng, T. Huang, and N. Ahuja, "Motion and Structure from Two Perspective Views: Algorithms, Error Analysis and Error Estimation," IEEE Trans. Pattern Anal. and Mach. Intell., vol. 11, pp. 451-476, May 1989.
[19] S. Malassiotis and M. G. Strintzis, "Joint Motion/Disparity MAP Estimation for Stereo Image Sequences," IEE Proceedings: Vision, Image and Signal Processing, accepted for publication.
[20] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Object-Based Coding of Stereo Image Sequences Using Joint 3D Motion/Disparity Compensation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, Apr. 1997, to appear.
[21] N. Diehl, "Object-Oriented Motion Estimation and Segmentation in Image Sequences," Signal Processing: Image Communication, pp. 23-56, Feb. 1991.
[22] H. G. Musmann, M. Hoetter, and J. Ostermann, "Object-Oriented Analysis-Synthesis Coding of Moving Images," Signal Processing: Image Communication, pp. 117-138, Oct. 1989.
[23] J. Dugelay and D. Pele, "Motion and Disparity Analysis of a Stereoscopic Sequence: Application to 3DTV Coding," Signal Processing, pp. 1295-1298, Oct.
[24] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, "Stereo Image Sequence Coding Based on 3D Motion Estimation and Compensation," Signal Processing: Image Communication, vol. 7, pp. 129-145, Aug. 1995.
[25] B. Gudmundson and M. Rand, "Incremental Generation of Projections of CT Volumes," in Proc. 1st Conf. on Visualization in Biomedical Computing, N. F. Ezquerra, Ed., 1990.
Table 1: Depth Map Coding, "Head": bit rate versus PSNR performance of the motion-compensation method for depth-map coding, for (a) the depth map and (b) the visualisation of the reconstructed depth-map information. Columns: depth-map number, bit rate (bits/pixel), depth-map PSNR (dB), PSNR of the corresponding visualised image (dB).

Table 2: Depth Map Coding + DCT Error Coding, "Head": bit rate versus PSNR performance of the motion-compensation method for depth-map coding, for (a) the depth map and (b) the visualisation of the reconstructed depth-map information. Columns: depth-map number, motion-compensation bit rate (bits/pixel), error-encoding bit rate (bits/pixel), total bit rate (bits/pixel), depth-map PSNR (dB), PSNR of the corresponding visualised image (dB).

Table 3: Wire Mesh Modeling of the Depth + DCT Error Coding, "Head": bit rate versus PSNR performance of the wire-mesh modeling method for depth-map coding, for (a) the depth map and (b) the visualisation of the reconstructed depth-map information. Columns: same as Table 2.

Table 4: Visualised Image Coding using the special 3D object codec, "Brain": bit rate versus PSNR performance of the 3D motion-compensation method for visualised-image coding, using the reconstructed depth maps (with DCT coding of the depth-map error). Columns: frame number, bit rate (bits/pixel), PSNR (dB).

Table 5: Visualised Image Coding using MPEG-2, "Brain": bit rate versus PSNR performance of the MPEG-2 method for visualised-image coding. Columns: frame number, MPEG bit rate (bits/pixel), visualised-image PSNR (dB).

Table 6: Visualised Image Coding using the special 3D object codec, "Brain": bit rate versus PSNR performance of the 3D motion-compensation method for visualised-image coding, followed by DCT and Huffman coding of the error image and the error depth map. Columns: frame number, motion-compensation bit rate (bits/pixel), error-encoding bit rate (bits/pixel), total bit rate (bits/pixel), visualised-image PSNR (dB).

Table 7: Coding using the general-purpose coder, "Brain": bit rate versus PSNR performance for visualised-image coding using the general-purpose object-based coder. Columns: frame number, bit rate (bits/pixel), PSNR (dB).

Table 8: Coding using the general-purpose coder, "Brain": bit rate versus PSNR performance for visualised-image coding using the general-purpose coder followed by DCT and Huffman coding of the error images. Columns: frame number, general-purpose-coder bit rate (bits/pixel), error-encoding bit rate (bits/pixel), total bit rate (bits/pixel), visualised-image PSNR (dB).

Table 9: Coding Using Disparity Compensation + Error Coding, "Brain": bit rate versus PSNR performance for stereoscopic viewing of the visualised image sequence, followed by DCT and Huffman coding of the error images. Columns: same as Table 8.
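The PSNR values reported in Tables 1-9 can be computed with the standard definition, sketched below. This is an illustration only; an 8-bit peak value of 255 is assumed, since the paper's exact convention is not stated in this excerpt.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float('inf')  # identical images: lossless reconstruction
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a uniform error of 1 grey level on an 8-bit image.
a = np.zeros((16, 16), dtype=np.uint8)
b = a + 1
print(round(psnr(a, b), 2))  # 10*log10(255^2 / 1) ≈ 48.13
```

Note that the dual bit-rate/PSNR reporting in the tables reflects the usual rate-distortion trade-off: the error-coding columns buy higher PSNR at the cost of additional bits per pixel.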
Figure 6: Performance of the contour-following modeling technique: original (a) and reconstructed (b) rendered views.

Figure 7: Performance of the contour-following modeling technique: original (a) and reconstructed (b) rendered views.

Figure 8: (a,b) Depth maps of the front view; (c,d) rendered views.
Figure 9: Block diagram of the proposed depth-map coding scheme. Encoder: object boundary extraction with arithmetic coding; 3D motion compensation of the previous depth map using the desired motion parameters; DCT and Huffman coding of the prediction error. Decoder: IDCT and Huffman decoding; 3D motion compensation of the previous depth map; summation to form the depth-map estimate.
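The 3D motion compensation block of Figure 9 predicts the current depth map by applying the observer-specified rotation and translation to the surface points of the previous one. A minimal sketch follows, assuming orthographic projection, nearest-pixel resampling, and a simple z-buffer for collisions (the paper's projection model and interpolation are not specified in this excerpt, so all three are illustrative assumptions):

```python
import numpy as np

def motion_compensate_depth(prev_depth, R, t):
    """Predict the current depth map: lift each pixel (x, y) with depth Z to
    the 3D point (x, y, Z), apply the known rotation R and translation t,
    and re-project orthographically.  A z-buffer keeps the nearest point
    (smallest depth) where several points land on the same pixel; pixels
    receiving no point remain inf (holes, to be filled by error coding)."""
    h, w = prev_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                    prev_depth.ravel()])
    moved = R @ pts + t.reshape(3, 1)
    pred = np.full((h, w), np.inf)
    xi = np.round(moved[0]).astype(int)
    yi = np.round(moved[1]).astype(int)
    zi = moved[2]
    for x, y, z in zip(xi, yi, zi):
        if 0 <= x < w and 0 <= y < h and z < pred[y, x]:
            pred[y, x] = z
    return pred

# Pure translation by (1, 0, 5): the map shifts right one pixel and its
# depths grow by 5; the vacated left column becomes a hole.
d = np.arange(12, dtype=float).reshape(3, 4) + 10.0
pred = motion_compensate_depth(d, np.eye(3), np.array([1.0, 0.0, 5.0]))
```

Because the motion parameters are supplied by the observer rather than estimated, the prediction is exact up to resampling and occlusion effects, which is why only a small residual needs DCT and Huffman coding.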
Figure 10: (a) Original depth map, frame 2; (b) reconstructed depth map using the depth-map coding scheme; (c) visualised image corresponding to the original depth map; (d) visualised image corresponding to the reconstructed depth map.

Figure 11: Wire mesh adapted to the image "Head".

Figure 12: (a) Wire-mesh-modeled depth map; (b) the corresponding visualised image.
Figure 13: Block diagram of the proposed visualised-image coding scheme. Encoder: object boundary extraction with arithmetic coding; 3D motion compensation of the previous image and of the previous depth map using the desired motion parameters; depth-map coder; DCT and Huffman coding of the prediction error. Decoder: depth-map decoder; IDCT and Huffman decoding; 3D motion compensation; summation to form the current image estimate.

Figure 14: (a) Original visualised image, frame 2; (b) reconstructed image using the special 3D object coder; (c) reconstructed image using the MPEG-2 coder; (d) reconstructed image using the general-purpose object-based coder.
More informationColor Segmentation Based Depth Image Filtering
Color Segmentation Based Depth Image Filtering Michael Schmeing and Xiaoyi Jiang Department of Computer Science, University of Münster Einsteinstraße 62, 48149 Münster, Germany, {m.schmeing xjiang}@uni-muenster.de
More informationInformation, Entropy, and Coding
Chapter 8 Information, Entropy, and Coding 8. The Need for Data Compression To motivate the material in this chapter, we first consider various data sources and some estimates for the amount of data associated
More informationCopyright 2008 IEEE. Reprinted from IEEE Transactions on Multimedia 10, no. 8 (December 2008): 1671-1686.
Copyright 2008 IEEE. Reprinted from IEEE Transactions on Multimedia 10, no. 8 (December 2008): 1671-1686. This material is posted here with permission of the IEEE. Such permission of the IEEE does not
More informationGeo-referenced 3D Video as visualization and measurement tool for Cultural Heritage
Geo-referenced 3D Video as visualization and measurement tool for Cultural Heritage Lazaros Sechidis, Vassilios Tsioukas, Petros Patias The Aristotle University of Thessaloniki, Department of Cadastre
More informationMultihypothesis Prediction using Decoder Side Motion Vector Derivation in Inter Frame Video Coding
Multihypothesis Prediction using Decoder Side Motion Vector Derivation in Inter Frame Video Coding Steffen Kamp, Johannes Ballé, and Mathias Wien Institut für Nachrichtentechnik, RWTH Aachen University,
More informationIntroduction to Medical Image Compression Using Wavelet Transform
National Taiwan University Graduate Institute of Communication Engineering Time Frequency Analysis and Wavelet Transform Term Paper Introduction to Medical Image Compression Using Wavelet Transform 李 自
More informationApplication of Virtual Instrumentation for Sensor Network Monitoring
Application of Virtual Instrumentation for Sensor etwor Monitoring COSTATI VOLOSECU VICTOR MALITA Department of Automatics and Applied Informatics Politehnica University of Timisoara Bd. V. Parvan nr.
More informationANALYSIS OF THE EFFECTIVENESS IN IMAGE COMPRESSION FOR CLOUD STORAGE FOR VARIOUS IMAGE FORMATS
ANALYSIS OF THE EFFECTIVENESS IN IMAGE COMPRESSION FOR CLOUD STORAGE FOR VARIOUS IMAGE FORMATS Dasaradha Ramaiah K. 1 and T. Venugopal 2 1 IT Department, BVRIT, Hyderabad, India 2 CSE Department, JNTUH,
More informationOscillations of the Sending Window in Compound TCP
Oscillations of the Sending Window in Compound TCP Alberto Blanc 1, Denis Collange 1, and Konstantin Avrachenkov 2 1 Orange Labs, 905 rue Albert Einstein, 06921 Sophia Antipolis, France 2 I.N.R.I.A. 2004
More informationTelevision Control by Hand Gestures
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Television Control by Hand Gestures William T. Freeman, Craig D. Weissman TR94-24 December 1994 Abstract We study how a viewer can control
More informationHow To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode
More informationCompression techniques
Compression techniques David Bařina February 22, 2013 David Bařina Compression techniques February 22, 2013 1 / 37 Contents 1 Terminology 2 Simple techniques 3 Entropy coding 4 Dictionary methods 5 Conclusion
More informationCompression of parametric surfaces for efficient 3D model coding
1 Compression of parametric surfaces for efficient 3D model coding Diego Santa-Cruz a, Touradj Ebrahimi b Signal Processing Laboratory Swiss Federal Institute of Technology, Lausanne ABSTRACT In the field
More informationAccelerating Wavelet-Based Video Coding on Graphics Hardware
Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing
More informationWireless Ultrasound Video Transmission for Stroke Risk Assessment: Quality Metrics and System Design
Wireless Ultrasound Video Transmission for Stroke Risk Assessment: Quality Metrics and System Design A. Panayides 1, M.S. Pattichis 2, C. S. Pattichis 1, C. P. Loizou 3, M. Pantziaris 4 1 A.Panayides and
More informationCHAPTER 2 LITERATURE REVIEW
11 CHAPTER 2 LITERATURE REVIEW 2.1 INTRODUCTION Image compression is mainly used to reduce storage space, transmission time and bandwidth requirements. In the subsequent sections of this chapter, general
More informationColour Image Encryption and Decryption by using Scan Approach
Colour Image Encryption and Decryption by using Scan Approach, Rinkee Gupta,Master of Engineering Scholar, Email: guptarinki.14@gmail.com Jaipal Bisht, Asst. Professor Radharaman Institute Of Technology
More informationMPEG-1 and MPEG-2 Digital Video Coding Standards
Please note that the page has been produced based on text and image material from a book in [sik] and may be subject to copyright restrictions from McGraw Hill Publishing Company. MPEG-1 and MPEG-2 Digital
More informationFull Interactive Functions in MPEG-based Video on Demand Systems
Full nteractive Functions in MPEG-based Video on Demand Systems Kostas Psannis Marios Hadjinicolaou Dept of Electronic & Computer Engineering Brunel University Uxbridge, Middlesex, UB8 3PH,UK Argy Krikelis
More informationLecture 2 Linear functions and examples
EE263 Autumn 2007-08 Stephen Boyd Lecture 2 Linear functions and examples linear equations and functions engineering examples interpretations 2 1 Linear equations consider system of linear equations y
More information