IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATION, VOL. XX, NO. Y, MONTH Low power data transfer and storage exploration for


Low power data transfer and storage exploration for H.263 video decoder system

Lode Nachtergaele, Francky Catthoor, Bhanu Kapoor, Stefan Janssens and Dennis Moolenaar

Abstract: We describe a power exploration methodology for data-dominated applications, using an H.263 video decoding demonstrator application. The starting point for our exploration is a C specification of the video decoder, available in the public domain from Telenor Research. We have transformed the data transfer scheme in the specification and have optimised the distributed memory organisation. This results in a memory architecture with significantly reduced power consumption. For the worst-case mode using Predicted (P) frames, memory power consumption is reduced by a factor of 7 compared to the reference design. For the worst-case mode using Predicted and Bidirectional (PB) frames, memory power consumption is reduced by a factor of 9. To achieve these results, we make use of our formalised high-level memory management methodology, partly supported in our ATOMIUM environment.

Keywords: Videophone systems, Logic design, Very-large-scale integration

I. Introduction

The video coding algorithm of Draft Recommendation H.263 is based on motion-compensated hybrid predictive and transform coding, with improvements to fit bit rates below 64 kbit/s. It is a complex and relevant example of a data-dominated application. A hardware realisation of such a decoder has to be power efficient in order to reduce the size of the chip package in which it is embedded, or of the battery if it is used in a mobile application. It is by now well known that any future complex chip realisation has to take power reduction into account [1]. Our previous research has clearly shown that the dominant power contribution in data-dominated designs lies in the data transfer and storage of multi-dimensional array signals and other complex data types [2], [3].
In this paper, we exploit this feature to achieve large savings in system power without having to worry about the detailed data-path, foreground registers and controller architecture. The main contributions of this paper are the evaluation of the applicability and effectiveness of our power-oriented methodology for data-dominated applications [4], [5], [3] (see Fig. 1), a study of the effect of the possible optimisations, and the application of the most promising alternatives, in the correct sequence, to the H.263 decoder algorithm. In addition, we have substantiated our earlier claims [2] that the cost of the background storage and related transfers is dominant during system exploration. This will be shown in Section VI by investigating the power in a representative data-path in H.263, including its corresponding local memories. In the rest of this paper, we concentrate on the main storage (memory) and transfer related parts of the H.263 decoder architecture. This exploration is based on the power model described in Section III. The final results for the different steps are illustrated in Fig. 12. A brief version of this paper has been published in [6].

This research was partly sponsored by a co-operation with Texas Instruments Incorporated. L. Nachtergaele and F. Catthoor are with IMEC, Kapeldreef 75, B-3001 Heverlee, Belgium. F. Catthoor is also Professor at the Katholieke Universiteit Leuven, Belgium. B. Kapoor was a resident at IMEC from the Corporate R&D labs of Texas Instruments Incorporated, Dallas, Texas. S. Janssens was a student from Erasmus Hogeschool and is now with IMEC. D. Moolenaar was a student from Delft Univ. of Technology and is now with IMEC.

The numerous pointers and variables in the C code used in the reference implementation have been removed by rewriting the specification into a mixed applicative-procedural DFL description [7].
As a result, more indices and some extra signal copies and accesses are present in the code, but the dependencies are much more transparent. This allows a systematic identification of the sources for potential optimisation. Moreover, this step is essential for applying a number of data storage and transfer related analysis and exploration/optimisation techniques, which are collected in our high-level memory management methodology/script, partly supported by the prototype ATOMIUM environment [4].

Our strategy to obtain area and power figures is based on selecting the worst-case parameters and modes in the H.263 specification. This is valid for computing the maximal power budget and for finding the component sizes, which mainly affect area, but not directly for the average power consumption. Still, we believe that the maximal power consumption gives a good view of the relative importance of the different components in the power budget and of the savings which can be obtained. To obtain a good view of the absolute average power consumption, accurate statistics on the occurrence of the different cases are required. In the sequel, we only give some relative indication of this.

The following major algorithmic transformations and memory organisation optimisations have been performed on the DFL specification, targeting mainly the power budget related to the accesses to/from the frame memories:

1. First, the code was pruned to retain the operations relevant to the overall complexity of the description with respect to the number of cycles, area and power consumption. This boils down to explicitly keeping the relevant storage and accesses of the arrays that store the picture information, while hiding the details of arithmetic operations in function calls. As a result, the potential overhead of transfers and storage in the applicative writing style is removed when the description is interpreted effectively.
2. Several data-flow transformations have been performed. The methodology for carrying out these transformations and their effects are described in [8]. One of the major transformations removes all the border accesses used in the H.263 decoder, as discussed in subsection V-A.
3. Advanced transformations on the global function hierarchy and loop nests have been performed. These transformations have a significant effect and are partly discussed in subsection V-B. They are also essential to enable the further exploration steps.
4. In order to further exploit locality of access and data reuse, extra memory hierarchy levels have been incorporated (see subsection V-C). For the P mode, this step has been especially effective in the "overlapped-block motion compensation" (OBMC) mode, which has the largest power consumption. We only show the principle involved in this optimisation, as depicted in Fig. 6. For the B pictures and for the combination with the computation, we show more details.
5. Finally, we have performed the actual memory allocation and in-place mapping to determine the detailed memory organisation for the frame memories and some of the smaller intermediate memories. This step is discussed in subsection V-D. It has a large effect on the required area, which is reduced by almost a factor of 2, with only a very limited increase in the power budget.

Fig. 1. ATOMIUM script for storage/communication optimisation in the specification to be used for simulation and hardware/software synthesis. (Stages shown: pruning/analysis, data-flow and loop transformations, memory hierarchy, flow-graph balancing, allocation/assignment, in-place optimisation, address optimisation and address hardware generation, producing a netlist of memories and address logic; side flows for formal verification, C-code generation and simulation, and data-path/control synthesis.)

II. H.263 video decoding

H.263 is a draft recommendation for video coding for narrow telecommunication channels at < 64 kbit/s [9]. The coding/decoding is a block-based algorithm that exploits spatial and temporal redundancy. Three standard video formats are used in conjunction with H.263: QCIF, Sub-QCIF and CIF. A QCIF picture has 176 × 144 pixels, represented by 9 × 11 macroblocks. Each macroblock has six blocks of 8 × 8 pixels (four luminance blocks and two chrominance blocks), due to the 4:2:0 decimation of the chrominance values.

The picture that serves as the reference for prediction is called the P-picture. A future P-picture is predicted from the past P-picture; this is called forward P prediction. Interpolation between past and future P-pictures yields bidirectional B-pictures (see Fig. 2). A PB-frame consists of two pictures: a P-picture, which is predicted from the last decoded P-picture, and a B-picture, which is predicted from the last decoded P-picture and the P-picture currently being decoded. Parts of the B-picture may be bidirectionally predicted from the past and future P-pictures. For PB-frames, the intra (I) coding mode implies that the P-blocks are intra coded and the B-blocks are inter coded, with prediction as for an inter block. A decoder can be in one of three modes: I, P or PB mode. Two extensions are orthogonal to the P and PB modes: the unrestricted motion vector extension allows motion vectors to point outside the frame, whereas in overlapped block motion compensation (OBMC), 4 extra motion vectors are used to compensate motion. When we refer in this paper to the P or PB mode, we assume that both extensions are in use.
Hence, the P and PB modes refer to the two most energy-consuming modes.

Fig. 2. Forward P, forward B and backward B predictions (past P-picture, B-picture and future P-picture along the time axis).

III. Power model

For data-intensive applications such as video decoding, data transfers dominate the power consumption. Therefore, the primary design goal is to reduce the memory transfers between the large frame memories and the data-paths. The cost of a data transfer is a function of the memory size, the memory type and the access frequency F_real. F_real is defined as the real number of accesses per second, not the clock frequency: when there is a clock tick and the memory is not accessed, the memory is assumed to be in power-down mode. This assumption holds for most modern low-power RAMs [10]. The memory itself is characterised by the number of ports, words and bits, and by the aspect ratio of the layout. For the power exploration, we make use of an accurate but proprietary power model from Texas Instruments. In this paper, only the number of transfers per frame/picture (directly related to F_real) is discussed.

IV. The reference design

To obtain an acceptable reference, we have counted the number of transfers to the arrays that hold the past P, future P and B pictures in the Telenor C implementation [11]. These numbers depend on the mode of reconstruction. The flow of data using all extensions is depicted in Fig. 3 using thin lines. The order of computation of the pictures P T-1, Pext T-1, Pnew T-1, P T, B1 T, B2 T and B T is shown in the figure. The dashed lines indicate that pictures Pnew T-1 and P T are stored in one array signal, whereas the pictures B1 T, B2 T and B T are stored in another. The rectangles with a bold border are the final pictures after decoding. The thick line indicates that oldframe and the array holding Pnew T-1 and P T are interchanged after each decoded frame; in the C code, this is done by swapping the two pointers. This reflects that main memory is not wasted in the C implementation, because this also affects the simulation speed. The corresponding abstract organisation for the continuous P mode is shown in Fig. 4.

Fig. 4. Reference memory organisation and worst-case number of transfers while decoding 1 PB-frame. (Shown: data-paths with register file; edgeframe in a 54,912-word × 9-bit 1-port SRAM; two 38,016-word × 9-bit 1-port SRAMs in a shared memory space; one read/write connection labelled 76,032.)

TABLE I. Frame memory transfers per picture in the reference C code (worst-case and average counts per mode, for the old/new frame memory, edgeframe and a third frame memory).

Fig. 3. Data flow for decoding PB-frames. (Shown: P T-1 in oldframe, Add Border, Pext T in edgeframe, Decoding, Forward P, Forward B, Pnew T, B1 T, in-place switch, P T, Backward B, B2 T and B T.)

Table I lists the worst-case and average numbers of transfers to the frame memories per picture. The worst-case numbers are obtained analytically, not by simulation. This means that whenever code is executed conditionally, the conditions are assumed to branch to the most energy-consuming option. For example, it is assumed that every macroblock is motion compensated, which is clearly a worst-case assumption. Mode 1 uses prediction with overlapped motion compensation and unrestricted motion vectors. In Mode 2, bidirectional prediction is also included, introducing the extra transfers to the array holding the B-pictures. An acceptable value for the average case was obtained by counting the accesses while simulating the decoding of the video stream suz. This stream contains 75 QCIF frames, which corresponds to 2.5 seconds of real-time video. The length of the stream depends on the encoding options and is listed in Table II together with the compression ratio.

The C code used as a reference is optimised to run as fast as possible on a given workstation; it is not optimised for efficient implementation. But it is the typical documentation that implementation groups start from. Mostly, the algorithm and the data structures are mapped directly onto a block diagram, and each block is then optimised locally and implemented efficiently. This is why most video decoders have a large external memory (with high bandwidth) that holds 3 complete images. Also, the memory interface typically becomes a big component of the design.
When compared to similar state-of-the-art video decoders [12], [13], [14], [15], [16], [17], [18], [19], which also include memory for three pictures, we believe that the numbers of accesses to these memories will be comparable if the bidirectional mode is considered. If the bidirectional mode is not used, the accesses will be comparable to those corresponding to the P-picture.

TABLE II. Number of 8-bit words in test stream suz and the compression factor. Mode d stands for "unrestricted motion vectors", mode f for "overlapped block motion compensation (OBMC)" and mode g for "bidirectional prediction". (Columns: Mode, Length, Compression; rows: d, f, fd, g, gd, gf, gfd.)

V. System exploration for power

We now give a summary of each of the main optimisations listed in Section I. They have been applied starting from the initial mixed applicative and procedural DFL description of the video decoder, using the high-level memory management methodology/script partly supported by the prototype ATOMIUM environment [4].

A. Removal of the border

In order to accommodate unrestricted motion vectors, a complete border consisting of 44 macroblocks is added to the oldframe. It is not just filled with zeroes but with real data copies, in a non-trivial way [9]. To simplify the control flow in the original C code, these data are duplicated in the frame signals (cf. edgeframe in Fig. 3) prior to the actual image manipulations, resulting in storage and transfer overhead for both reading and writing. This requires extra pixels to be stored. To reduce this overhead, the dependences on the border data can be checked by (manifest) conditions on the position of the pixels to be read. Then, instead of storing and accessing duplicate data, the original pixels at the boundary rows/columns of the image frame are read. These guarding conditions have to be implemented in the controller and steer the data-path; usually, some local buffering is then also necessary. Several stages of optimisation are possible here, ranging from a simple context-independent caching of the border data (apparently the choice in most industrial designs) up to heavily optimised context-dependent checking with reduced local buffering.
All of these alternatives make the storage for the extra borders superfluous, but only the latter option removes all redundant picture accesses. If we assume that on average this reduction is about a factor of 8¹, we obtain an extra reduction in read accesses of about …; this is, however, data-dependent. In terms of power consumption, our detailed models show a saving of between 24% and 27% from the combined effect of fewer transfers and a reduced frame size. The gain in power comes at the price of increased code complexity and a larger controller, though. Still, as the power consumed in the controller is quite small, the trade-off for power is clearly in favour of transforming all the border accesses. The resulting data flow without the border is depicted in Fig. 5.

¹ This is realistic because the motion vectors are in practice relatively small and not uniformly distributed between 0 and 15. If we assume a uniform range of (-7, 7) for the motion, the number of pixels residing outside the oldframe border is on average about 1/8 of the pixels inside.

B. Loop and function restructuring to combine backward and forward P and B predictions, and the inverse transform

In the Telenor C code, decoding a PB-frame starts with decoding the incoming bit stream, which results in a P and a B macroblock containing differential errors in the frequency domain, together with motion information (Task 1 in Fig. 5). Next, the forward P and B predictions are performed based on the motion information (Tasks 2 and 3). This yields a forward-predicted B and P block, which are directly stored in the pictures B1 T and Pnew T, respectively. Then, the decoded P macroblock is transformed to the spatial domain by an inverse transform (Task 4). This P macroblock is added to the macroblock read back from picture Pnew T, and the sum is stored in picture P T (Task 5). This picture, together with the macroblock stored in B1 T, is needed for the backward B prediction (Task 6). The result is stored in picture B2 T. This picture is also corrected with differential errors, similarly to the P-picture (Tasks 7 and 8).

Fig. 5. Data flow after removal of the border and ordering of the main tasks while inter decoding a PB-frame. (Shown: P T-1 in oldframe; tasks 1 Decoding, 2 Forward B, 3 Forward P, 4 and 5 (P correction), 6 Backward B, 7 and 8 (B correction); pictures Pnew T, B1 T, P T, B2 T and B T.)

The figure also illustrates that, instead of producing a P and a B picture just once, the original description reads and writes the pictures several times. More precisely, since B1 T, B2 T and B T are stored in the same array, every pixel in it is written three times and read twice. Pnew T and P T share another array, hence this picture memory is written twice and read once. Probably, the reason for this was to reduce the algorithmic description effort for the system designers.

As an illustrative example, we now explain how global loop transformations and complex restructuring of the code hierarchy create more locality of access, based on the pseudo code for Tasks 2 and 3 in Fig. 5. The code for Forward P (Task 3) is:

    if ("Advanced prediction mode") {
        for (comp = 0; comp < 4; comp++) {
            P[T][0] = recon_comp_obmc(P[T-1][0], ..., comp);
        }
        P[T][1] = recon_comp(P[T-1][1], ...);
        P[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

In this code, the first index of signal P indicates the previous ([T-1]) or the current ([T]) prediction. The second index selects the luminance information Y (index [0]) or the chrominance information Cr (index [1]) and Cb (index [2]). The code for Forward B (Task 2) is:

    if ("Advanced prediction mode") {
        if ("Overlapped motion compensation") {
            B[T][0] = recon_comp(P[T-1][0], ..., comp);
        } else {
            ...
        }
        B[T][1] = recon_comp(P[T-1][1], ...);
        B[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

Note that in this code the recon_comp function is issued once, instead of the four calls to recon_comp_obmc in the comp loop of Task 3. When Tasks 2 and 3 are merged, we get:

    if ("Advanced prediction mode") {
        for (comp = 0; comp < 4; comp++) {
            P[T][0] = recon_comp_obmc(P[T-1][0], ..., comp);
            if (mode == MODE_INTER4V) {
                B[T][0] = recon_comp(P[T-1][0], ..., comp);
            } else {
                ...
            }
        }
        P[T][1] = recon_comp(P[T-1][1], ...);
        B[T][1] = recon_comp(P[T-1][1], ...);
        P[T][2] = recon_comp(P[T-1][2], ...);
        B[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

The recon_comp, recon_comp_new and recon_comp_obmc functions perform different kinds of motion compensation depending on the motion vectors. Moreover, they are not embedded in the same loop scopes. However, with complex code restructuring it is possible to combine them. This class of optimisations is crucial because it enables further optimisation of the memory hierarchy, which is discussed hereafter in subsection V-C.

C. Memory hierarchy related optimisations

This step involves data-flow transformations which introduce extra transfers between the different memory levels and which are used mainly to reduce the power cost. In particular, temporary values, to be assigned to a "lower" level, are added wherever a signal in a "higher" level is read more than once. The duplicate reads are then performed on the lower-level temporary signal. The same can happen in the other direction for writes. If a signal assigned to a higher level is composed of several contributions, it does not make sense to always update the final result in the higher-level memory. Instead, it is usually better to compose the contributions consecutively (or in a close ordering) in a lower level (or in several levels in more complex situations), and then to transfer the final result directly to the higher level. The principle of this buffering process on the macroblock accesses is shown in Fig. 6.

Fig. 6. Principle of 3 × 3 (macro-)block buffering between old/edgeframe and the motion compensation routines, which act on the central block to be stored in P T. (Shown: P T-1 in oldframe, 3 × 3 buffer, reconstruction, P T.)

In Fig. 5, the reconstructed macroblocks are first written to Pnew T-1. After reconstruction, a correction is performed with the differential errors resulting from the transform. This process is shown in the following pseudo code:

    for (macroblocknr = 1 to 99) {
        (Pblock[][], Bblock[][]) = Decoding();
        (reconf_pblock[][], reconf_bblock[][]) = Forward_B&P_prediction(P[T-1]);
        "Store reconf_pblock[][] in Pnew[T]";
        _Pblock[][] = (Pblock[][]);    /* inverse transform of Pblock */
        "Add _Pblock[][] to Pnew[T] and store in P[T]";
    }

This can be rewritten as:

    for (macroblocknr = 1 to 99) {
        (Pblock[][], Bblock[][]) = Decoding();
        (recon_pblock[][], recon_bblock[][]) = Forward_B&P_prediction(buffer);
        _Pblock[][] = (Pblock[][]);    /* inverse transform of Pblock */
        new_pblock[][] = recon_pblock[][] + _Pblock[][];
        "Store new_pblock[][] in P[T]";
    }

In this code, a buffer called buffer is created. The task Forward_B&P_prediction now reads several times from buffer instead of from picture P T-1, which results in large power savings. Also, an extra block called new_pblock is introduced, so extra copies from this block to picture P T are necessary. Since these extra copies are situated at a lower level of the memory hierarchy, the global power consumption due to the memory transfers is still reduced.

We now apply this principle to the decoding of a PB-frame, as depicted in Fig. 5. Instead of storing the forward-predicted macroblock in Pnew T, the result is stored in a buffer called reconf_pblock. This buffer is corrected with the inverse-transformed differential errors, called _Pblock, to yield the final forward P prediction new_pblock. This final block, together with the forward-predicted B macroblock, called reconf_bblock, and the motion information, is required for the backward reconstruction. Instead of reading from B1 T and P T, the backward reconstruction is based on the buffers reconf_bblock and new_pblock. The result is stored in buffer reconb_bblock instead of B2 T. As for the P macroblock, it is corrected with the differential errors in _Bblock to yield the final B prediction block new_bblock. Extra transfers are introduced to move the final block to the picture B T, stored at the highest level.

The resulting data flow when decoding a PB-frame after introducing the extra memory hierarchy is shown in Fig. 7. The pictures with a bold border are stored at the "highest" level of the hierarchy, which corresponds to the memory with the biggest transfer cost. The other, smaller buffers, such as buffer, Pblock, Bblock, reconf_pblock, reconf_bblock, new_pblock, reconb_bblock and new_bblock, are stored at "lower" levels.
In addition to this, many other similar optimisations have been performed for the different decoder modes (especially in the "overlapped block motion compensation" mode).

Fig. 7. Modified data flow after introducing memory hierarchy. (Shown: P T-1 in oldframe, buffer, 1 Decoding, 2 Forward P&B, Pblock, Bblock, _Pblock, _Bblock, reconf_pblock, reconf_bblock, new_pblock, Backward B, reconb_bblock, new_bblock, P T and B T.)

D. In-place storage of past and future P-pictures

In Fig. 8 (Left), the light grey area covers the portion of oldframe that is still needed for reconstruction. In Fig. 8 (Middle), the grey area covers the pixels that have already been calculated. Array signal oldframe and the array holding the new picture can be stored in-place if the shaded area in Fig. 8 (Right) is stored in a buffer. Decoding the macroblock in row y and column x uses data stored in the blocks with coordinates (y ± 1, x ± 1) in oldframe. The worst-case dependence, i.e. the one that needs the most memory, corresponds to the dependence in the following pseudo code:

    for (y = 1; y <= 9; y++) {
        for (x = 1; x <= 11; x++) {
            Read block (y-1, x-1) from P[T-1];
            Predict block (y, x);
            Write block (y, x) in P[T];
        }
    }

Subtracting the consumption address $(y-1) \cdot 11 + x - 1$ from the production address $y \cdot 11 + x$,

$$[\,y \cdot 11 + x\,] - [\,(y-1) \cdot 11 + x - 1\,] = 11 + 1 = 12,$$

yields the number of blocks in the diagonally shaded intersection of Fig. 8 (Right). When a first-in-first-out (FIFO) buffer of 12 + 1 macroblocks is introduced, picture P T-1 and picture P T can be stored in-place in only one picture, called old/new frame:

    for (y = 1; y <= 9; y++) {
        for (x = 1; x <= 11; x++) {
            Read block (y-1, x-1) from old/new frame;
            Predict block (y, x);
            Pop block from buffer and store at position (y-1, x-1);
            Push block (y, x) in the buffer;
        }
    }

The buffer mechanism can be implemented by calculating the block addresses modulo 13 [20]. This results in a snake-like operation of the buffer, as illustrated in Fig. 9. The resulting data flow is depicted in Fig. 10, where the 13 macroblocks are shown in the pipeline of the snake. Implementing this data flow, taking into account extra possibilities for memory hierarchy optimisations, leads to the detailed organisation depicted in Fig. 11. This in-place optimisation does not affect the number of background transfers, but it significantly reduces the total size of the background memories, which results in a smaller area cost. The combined picture is only 13 macroblocks larger than one of the two pictures required initially.

Fig. 8. Putting oldframe and the new frame in-place (panels: Left, Middle, Right).

Fig. 9. Principle of the buffer update mechanism.

Fig. 10. Data flow after in-place storage of the past and future P-pictures.

Fig. 11. Detailed memory organisation of the in-place old/new frame.

E. Relative impact of the different exploration steps

Fig. 12 gives an overview of the relative power consumption for each optimisation stage for the PB mode. This is the power consumed by the picture memories when decoding bidirectional B frames with unrestricted motion vectors and overlapped motion compensation. The power consumption is normalised with respect to that of the reference design, and the power figures are based on worst-case assumptions. The bar chart shows that when all optimisations discussed in this paper are applied, the power consumption is reduced by a factor of 9.

Similar optimisations to those reported in this article have been applied to the H.263 decoder running in the P mode. They reduced the worst-case power consumption by a factor of 7. The main difference with the optimisations for the PB mode is the absence of optimisations related to bidirectional coding.

The optimisations described in this paper have been partially applied to the public domain C code from Telenor Research [11]. Simulation of the resulting C code while decoding stream suz for all the different decoding modes shows that the average power consumption of the memories is reduced to 57%. Note that in these simulations, the extra transfers due to the extra layer of hierarchy are taken into account.

Fig. 12. Relative power in continuous PB mode for each optimisation stage of H.263 frame access (bars: H263 reference (PB), Remove border, Combine back/forward, Combine, Local cache, OBMC, Interpolation buffer, Inplace).

VI. Power consumption of the inverse transform

A DFL specification of the inverse transform algorithm [21] was simulated and verified using Mentor's DSP Station. This specification was synthesised using our data-path synthesis tool Dolphin [22]. Dolphin synthesis resulted in a VHDL netlist, which was mapped to the TI TGC2000 library using Synopsys' Design Analyzer and converted to a Verilog netlist. A net capacitance file for the design was generated using Synopsys' Design Analyzer tool. The Verilog netlist was simulated for toggle counts using Cadence's Verilog-XL simulator. The average power consumption of the data-path was then computed from the net capacitance and toggle count files. The power consumption of the memory unit in the inverse transform is computed with the power model described in Section III.

Table III lists the average power consumption of the inverse transform for the 3 video formats used in conjunction with H.263. The computation uses a frame rate of 30 frames per second to derive the smallest possible frequency of operation for the data-path and memory units. This module is the most arithmetic-dominated one in the entire H.263 specification. Still, it has been shown that the power for a direct realisation with commercial logic synthesis and gate-array circuits is about 2 orders of magnitude smaller than the power in the combined unoptimised frame accesses. So, initially ignoring this arithmetic in the system exploration is justified.

TABLE III. Inverse transform power consumption for the three video formats (columns: Format, Pt, Pm, fm, Pd, fd; rows: QCIF, Sub-QCIF, CIF). Pt = normalised average power for the inverse transform, Pm = normalised average memory power at frequency fm, fm = smallest frequency (in MHz) at which the memory can be operated, Pd = normalised average data-path power in mW at frequency fd, fd = smallest frequency (in MHz) at which the data-path can be operated.

VII. Conclusion

We believe that the results described in this paper clearly substantiate the validity of the proposed high-level memory management methodology for data-dominated applications like the H.263 video decoder. They show the very promising power reductions which can be obtained by system-level exploration, i.e. up to a factor of 9 in maximal power in the worst-case mode. The same methodology has also been applied to an MPEG-2 [23] video decoder, a medical back-projector [24] and a segment protocol processor of the common adaptation layer of ATM [24]. Significant savings have been obtained there as well. In the future, we will also explore the possibilities of these optimisations on a mixed software-hardware platform, as provided e.g. by the TI cDSP approach, which supports a single-chip heterogeneous design consisting of embedded cores, sea-of-gates logic and embedded memories.

Acknowledgements: We gratefully acknowledge the discussions with our colleagues and especially the contributions of E. De Greef, M. Eyckmans, P. Six and S. Wuytack. This research was partly sponsored by Texas Instruments Incorporated, Dallas, Texas.

References

[1] R.-H. Yan and L. Terman (eds.), "Special issue on low power electronics," Proceedings of the IEEE, vol. 83, no. 4, pp. 495–700, April
[2] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. De Man, "Global communication and memory optimizing transformations for low power signal processing systems," in VLSI Signal Processing VII, J. Rabaey, P. M. Chau, and J. Eldon, Eds., New York, October 1994, IEEE Workshop on VLSI Signal Processing, pp. 178–187, IEEE Press.
[3] Sven Wuytack, Francky Catthoor, Lode Nachtergaele, and Hugo De Man, "Power exploration for data dominated video applications," in International Symposium on Low Power Electronics and Design, Monterey, California, August 1996, pp. 359-364.
[4] Lode Nachtergaele, Francky Catthoor, Florin Balasa, Frank Franssen, Eddy De Greef, Hans Samsom, and Hugo De Man, "Optimization of memory organization and hierarchy for decreased size and power in video and image processing systems," in Records of the 1995 IEEE International Workshop on Memory Technology, Design and Testing, San Jose, California, August 1995, pp. 82-87.
[5] Eddy De Greef, Francky Catthoor, and Hugo De Man, "Memory organization for video algorithms on programmable signal processors," in Computer Design: VLSI in Computers & Processors, IEEE, October 1995, pp. 552-557.
[6] Lode Nachtergaele, Francky Catthoor, Bhanu Kapoor, Stefan Janssens, and Dennis Moolenaar, "Low power storage exploration for H.263 video decoder," in VLSI Signal Processing, November.
[7] P. N. Hilfinger, J. Rabaey, D. Genin, C. Scheers, and H. De Man, "DSP specification using the Silage language," in Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Albuquerque, NM, April 1990, pp. 1057-1060.
[8] F. Catthoor, M. Janssen, L. Nachtergaele, and H. De Man, "System-level data-flow transformations for power reduction in image and video processing," in Proceedings of the International Conference on Electronics, Circuits and Systems, Rhodos, Greece, October 1996, IEEE, pp. 1025-1028.
[9] Karel Rijkse, "Video coding for narrow telecommunication channels at < 64 kbit/s," Tech. Rep., Telenor R & D.
[10] Kiyoo Itoh, Katsuro Sasaki, and Yoshinobu Nakagome, "Trends in low-power RAM circuit technologies," Proceedings of the IEEE, vol. 83, no. 4, pp. 524-543, April.
[11] Digital Video Coding at Telenor R & D, "Telenor's H.263 software, version 1.3," February 1995, software/.
[12] Aldo Cugnini and Richard Shen, "MPEG-2 video decoder for the digital HDTV Grand Alliance system," IEEE Transactions on Consumer Electronics, vol. 41, no. 3, pp. 748-753, August.
[13] T. Demura et al., "A single-chip MPEG2 video decoder LSI," in International Solid-State Circuits Conference, IEEE, February 1994, pp. 72-73.
[14] D. Galbi et al., "An MPEG-1 audio/video decoder with run-length compressed antialiased video overlays," in International Solid-State Circuits Conference, IEEE, February 1995, pp. 289-287.
[15] GEC Plessey Semiconductor, "An overview of the H.261 video compression standard and its implementation in the GPS chipset," October.
[16] Michel Harrand, Michel Henry, Philippe Chaisemartin, Paul Mougeat, Yves Durand, Alain Tournier, Robin Wilson, Jean-Claude Herluison, Jean-Claude Langchambon, Jean-Luc Bauer, Michel Runtz, and Joseph Bulone, "A single chip videophone encoder/decoder," in International Solid-State Circuits Conference, IEEE, February 1995, pp. 292-293.
[17] Toshihiro Masaki, Yasuo Morimoto, Takao Onoye, and Isao Shirakawa, "VLSI implementation of inverse discrete cosine transformer and motion compensator for MPEG2 HDTV video decoding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 5, pp. 387-395, October.
[18] M. Toyokura et al., "A video DSP with a macroblock-level pipeline and a SIMD type vector-pipeline architecture for MPEG2 codec," in International Solid-State Circuits Conference, IEEE, February 1994, pp. 74-75.
[19] Shinobu Ueda, Y. Kiyose, Y. Kishida, S. Sotoda, M. Kawabata, T. Furukawa, and S. Kawabe, "Development of an MPEG2 decoder for magneto-optical disk video players," IEEE Transactions on Consumer Electronics, vol. 41, no. 3, pp. 521-527, August.
[20] J. Vanhoof, K. van Rompaey, I. Bolsens, G. Goossens, and H. De Man, High-Level Synthesis for Real-Time Digital Signal Processing, Kluwer Academic Publishers, Boston.
[21] W.-H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Transactions on Communications, pp. 1004-1009, September.
[22] P. Schaumont, B. Van Thournout, I. Bolsens, and H. De Man, "Synthesis of pipelined DSP accelerators with dynamic scheduling," in Proceedings of the 8th International Symposium on System-Level Synthesis, Cannes, France, September 1995, ACM/IEEE, pp. 72-77.
[23] D. Moolenaar, "System specification and storage exploration for two video compression standards," M.S. thesis, Delft University, Delft, The Netherlands, May 1996, ftp://ftp.imec.be/pub/vsdm/reports/video codec optim/ MPEG2 code optim.ps.gz.
[24] F. Catthoor, L. Nachtergaele, and S. Wuytack, "Optimizing data transfers and memory for low power," accepted for publication in ASIC & EDA magazine, ftp://ftp.imec.be/pub/vsdm/reports/system lev power opt/ fc-asic eda96.ps.gz, 1997.

Lode Nachtergaele has been a member of the Multimedia Image Compression Systems (MICS) group since '96. This group is part of the application group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro Electronics Center (IMEC). His current aim is to further distill an operational methodology that improves design times of embedded multimedia systems. The resulting design flow is reinjected as a stepping stone in future application challenges. Ing. Nachtergaele received his degree of Industrial Engineer in 1989 from the Katholieke Industriele Hogeschool, Oostende, Belgium. In the same year he joined IMEC, starting his career in the group that worked on the Cathedral-II silicon compiler. There he was involved in the development of the Silage simulator S2C. In '92, he joined the System Exploration for Memory and Power (SEMP) group. Together with his colleagues, he worked on the ATOMIUM methodology, partly supported with prototype tools.

Francky Catthoor received the engineering degree and a Ph.D. in electrical engineering from the Katholieke Universiteit Leuven, Belgium, in 1982 and 1987, respectively. From September 1983 till June 1987 he was a researcher in the area of VLSI design methodologies for Digital Signal Processing, with Prof. Hugo De Man and Prof. Joos Vandewalle as Ph.D. thesis advisors. Since 1987, he has headed several research domains in the area of high-level and system synthesis techniques and architectural methodologies, all within the VLSI Systems & Design Methodology (VSDM) division at the Inter-university Micro-Electronics Center (IMEC), Heverlee, Belgium. He is an assistant professor at the EE department of the K.U.Leuven. His current research activities belong to the field of architecture design methods and system-level exploration for power and area, mainly oriented towards memory management and global data transfer optimization. The major target application domains are real-time signal and data processing algorithms in image, video and end-user telecom applications, and data-structure-dominated modules in telecom networks. Both customized architectures and programmable multimedia processors are targeted. In 1986 he received the Young Scientist Award from the Marconi International Fellowship. Since 1995 he has been an associate editor for the IEEE Trans. on VLSI Systems, and since 1996 also for the Journal of VLSI Signal Processing.

Stefan Janssens has been with the System Exploration for Memory and Power (SEMP) group since '96. This group is part of the design technology group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro-Electronics Center (IMEC). He currently focuses on the application and evaluation of the ATOMIUM and ADOPT methodologies in industrial applications. Ing. Janssens received his degree of Industrial Engineer in 1996 and joined IMEC in the same year; before that, he was involved with IMEC for his thesis.

Dennis Moolenaar has been with IMEC's Wireless Systems group since '96. This group is part of the application group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro Electronics Center (IMEC). The mission of this group is to do research on future telecom systems on silicon. Ir. Moolenaar's current aim is to investigate the integration of multi-processor systems on a single chip. At the moment he is implementing a custom low-power multi-processor architecture for a DECT/GSM/DCS1800 multi-mode terminal. His interests are in processor architectures, multi-processor systems and low-power design. Ir. Moolenaar joined IMEC in '96. Before that, he was involved with IMEC as a student for his internship and master's thesis.

Bhanu Kapoor received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, India. He received his M.S. and Ph.D. degrees in Computer Science from Southern Methodist University, Dallas, Texas, in 1990 and 1994, respectively. He is with the Corporate R&D labs of Texas Instruments Incorporated. His main research interests are in the areas of high-performance and low-power VLSI design and CAD tools, with an emphasis on algorithms and architectures for DSP applications. He is a member of the IEEE and the ACM.


More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

Floating Point Fused Add-Subtract and Fused Dot-Product Units

Floating Point Fused Add-Subtract and Fused Dot-Product Units Floating Point Fused Add-Subtract and Fused Dot-Product Units S. Kishor [1], S. P. Prakash [2] PG Scholar (VLSI DESIGN), Department of ECE Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu,

More information

High Definition Display System Based on Digital Micromirror Device

High Definition Display System Based on Digital Micromirror Device High Definition Display System Based on Digital Micromirror Device Robert J. Gove, Vishal Markandey, Stephen W. Marshall, Donald B. Doherty, Gary Sextro, Mary DuVal Digital Imaging, Texas Instruments Inc.

More information

VLSI BASED COLOR INTERPOLATION ALGORITHM FOR REAL TIME IMAGE APPLICATIONS

VLSI BASED COLOR INTERPOLATION ALGORITHM FOR REAL TIME IMAGE APPLICATIONS VOL. 10, NO. 7, APRIL 2015 ISSN 1819-6608 VLSI BASED COLOR INTERPOLATION ALGORITHM FOR REAL TIME IMAGE APPLICATIONS Sudalai Utchimahali C. 1 and Rajakumar G. 2 1 M.E VLSI Design, Francis Xavier Engineering

More information

Video Encryption Exploiting Non-Standard 3D Data Arrangements. Stefan A. Kramatsch, Herbert Stögner, and Andreas Uhl uhl@cosy.sbg.ac.

Video Encryption Exploiting Non-Standard 3D Data Arrangements. Stefan A. Kramatsch, Herbert Stögner, and Andreas Uhl uhl@cosy.sbg.ac. Video Encryption Exploiting Non-Standard 3D Data Arrangements Stefan A. Kramatsch, Herbert Stögner, and Andreas Uhl uhl@cosy.sbg.ac.at Andreas Uhl 1 Carinthia Tech Institute & Salzburg University Outline

More information

H.264 Based Video Conferencing Solution

H.264 Based Video Conferencing Solution H.264 Based Video Conferencing Solution Overview and TMS320DM642 Digital Media Platform Implementation White Paper UB Video Inc. Suite 400, 1788 west 5 th Avenue Vancouver, British Columbia, Canada V6J

More information

Networking Remote-Controlled Moving Image Monitoring System

Networking Remote-Controlled Moving Image Monitoring System Networking Remote-Controlled Moving Image Monitoring System First Prize Networking Remote-Controlled Moving Image Monitoring System Institution: Participants: Instructor: National Chung Hsing University

More information

The Emerging Trends in Electrical and Computer Engineering

The Emerging Trends in Electrical and Computer Engineering 18-200 Fall 2006 The Emerging Trends in Electrical and Computer Engineering Hosting instructor: Prof. Jimmy Zhu; Time: Thursdays 3:30-4:20pm; Location: DH 2210 Date Lecturer Lecture Contents L01 08/31

More information

BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions

BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions Insight, Analysis, and Advice on Signal Processing Technology BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions Steve Ammon Berkeley Design Technology, Inc.

More information

Accelerating Wavelet-Based Video Coding on Graphics Hardware

Accelerating Wavelet-Based Video Coding on Graphics Hardware Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing

More information

Automatic Telephone Answering Machine With Image Processing

Automatic Telephone Answering Machine With Image Processing Automatic Telephone Answering Machine With Image Processing, Disha Malik 2 1 M.Tech Scholar, Department of ECE, Jayoti Vidyapeeth Women s University, Rajasthan, INDIA, richasinghkgi@gmail.com 2 M. Tech

More information

IMPROVING QUALITY OF VIDEOS IN VIDEO STREAMING USING FRAMEWORK IN THE CLOUD

IMPROVING QUALITY OF VIDEOS IN VIDEO STREAMING USING FRAMEWORK IN THE CLOUD IMPROVING QUALITY OF VIDEOS IN VIDEO STREAMING USING FRAMEWORK IN THE CLOUD R.Dhanya 1, Mr. G.R.Anantha Raman 2 1. Department of Computer Science and Engineering, Adhiyamaan college of Engineering(Hosur).

More information

Communications Systems Laboratory. Department of Electrical Engineering. University of Virginia. Charlottesville, VA 22903

Communications Systems Laboratory. Department of Electrical Engineering. University of Virginia. Charlottesville, VA 22903 Turbo Trellis Coded Modulation W. J. Blackert y and S. G. Wilson Communications Systems Laboratory Department of Electrical Engineering University of Virginia Charlottesville, VA 22903 Abstract Turbo codes

More information

A Dynamic Link Allocation Router

A Dynamic Link Allocation Router A Dynamic Link Allocation Router Wei Song and Doug Edwards School of Computer Science, the University of Manchester Oxford Road, Manchester M13 9PL, UK {songw, doug}@cs.man.ac.uk Abstract The connection

More information

AN EVALUATION OF MODEL-BASED SOFTWARE SYNTHESIS FROM SIMULINK MODELS FOR EMBEDDED VIDEO APPLICATIONS

AN EVALUATION OF MODEL-BASED SOFTWARE SYNTHESIS FROM SIMULINK MODELS FOR EMBEDDED VIDEO APPLICATIONS International Journal of Software Engineering and Knowledge Engineering World Scientific Publishing Company AN EVALUATION OF MODEL-BASED SOFTWARE SYNTHESIS FROM SIMULINK MODELS FOR EMBEDDED VIDEO APPLICATIONS

More information

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1} An Efficient RNS to Binary Converter Using the oduli Set {n + 1, n, n 1} Kazeem Alagbe Gbolagade 1,, ember, IEEE and Sorin Dan Cotofana 1, Senior ember IEEE, 1. Computer Engineering Laboratory, Delft University

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

Design and Implementation of 64-Bit RISC Processor for Industry Automation

Design and Implementation of 64-Bit RISC Processor for Industry Automation , pp.427-434 http://dx.doi.org/10.14257/ijunesst.2015.8.1.37 Design and Implementation of 64-Bit RISC Processor for Industry Automation P. Devi Pradeep 1 and D.Srinivasa Rao 2 1,2 Assistant Professor,

More information

AsicBoost A Speedup for Bitcoin Mining

AsicBoost A Speedup for Bitcoin Mining AsicBoost A Speedup for Bitcoin Mining Dr. Timo Hanke March 31, 2016 (rev. 5) Abstract. AsicBoost is a method to speed up Bitcoin mining by a factor of approximately 20%. The performance gain is achieved

More information

Oscillations of the Sending Window in Compound TCP

Oscillations of the Sending Window in Compound TCP Oscillations of the Sending Window in Compound TCP Alberto Blanc 1, Denis Collange 1, and Konstantin Avrachenkov 2 1 Orange Labs, 905 rue Albert Einstein, 06921 Sophia Antipolis, France 2 I.N.R.I.A. 2004

More information

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Fast Arithmetic Coding (FastAC) Implementations

Fast Arithmetic Coding (FastAC) Implementations Fast Arithmetic Coding (FastAC) Implementations Amir Said 1 Introduction This document describes our fast implementations of arithmetic coding, which achieve optimal compression and higher throughput by

More information

For Articulation Purpose Only

For Articulation Purpose Only E305 Digital Audio and Video (4 Modular Credits) This document addresses the content related abilities, with reference to the module. Abilities of thinking, learning, problem solving, team work, communication,

More information

Software Synthesis from Dataflow Models for G and LabVIEW

Software Synthesis from Dataflow Models for G and LabVIEW Presented at the Thirty-second Annual Asilomar Conference on Signals, Systems, and Computers. Pacific Grove, California, U.S.A., November 1998 Software Synthesis from Dataflow Models for G and LabVIEW

More information

Spezielle Anwendungen des VLSI Entwurfs Applied VLSI design (IEF170)

Spezielle Anwendungen des VLSI Entwurfs Applied VLSI design (IEF170) Spezielle Anwendungen des VLSI Entwurfs Applied VLSI design (IEF170) Course and contest Intermediate meeting 3 Prof. Dirk Timmermann, Claas Cornelius, Hagen Sämrow, Andreas Tockhorn, Philipp Gorski, Martin

More information

Sample Project List. Software Reverse Engineering

Sample Project List. Software Reverse Engineering Sample Project List Software Reverse Engineering Automotive Computing Electronic power steering Embedded flash memory Inkjet printer software Laptop computers Laptop computers PC application software Software

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency

More information

Performance Evaluation of VoIP Services using Different CODECs over a UMTS Network

Performance Evaluation of VoIP Services using Different CODECs over a UMTS Network Performance Evaluation of VoIP Services using Different CODECs over a UMTS Network Jianguo Cao School of Electrical and Computer Engineering RMIT University Melbourne, VIC 3000 Australia Email: j.cao@student.rmit.edu.au

More information

VON BRAUN LABS. Issue #1 WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS VON BRAUN LABS. State Machine Technology

VON BRAUN LABS. Issue #1 WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS VON BRAUN LABS. State Machine Technology VON BRAUN LABS WE PROVIDE COMPLETE SOLUTIONS WWW.VONBRAUNLABS.COM Issue #1 VON BRAUN LABS WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS State Machine Technology IoT Solutions Learn

More information