IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATION, VOL. XX, NO. Y, MONTH
Low power data transfer and storage exploration for H.263 video decoder system

Lode Nachtergaele, Francky Catthoor, Bhanu Kapoor, Stefan Janssens and Dennis Moolenaar

Abstract: We describe a power exploration methodology for data-dominated applications, using an H.263 video decoder as a demonstrator application. The starting point for our exploration is a C specification of the video decoder, available in the public domain from Telenor Research. We have transformed the data transfer scheme in the specification and have optimised the distributed memory organisation. This results in a memory architecture with significantly reduced power consumption. For the worst-case mode using Predicted (P) frames, memory power consumption is reduced by a factor of 7 compared to the reference design. For the worst-case mode using Predicted and Bidirectional (PB) frames, memory power consumption is reduced by a factor of 9. To achieve these results, we make use of our formalised high-level memory management methodology, partly supported in our ATOMIUM environment.

Keywords: Videophone systems, Logic design, Very-large-scale integration

I. Introduction

The video coding algorithm of Draft Recommendation H.263 is based on motion-compensated hybrid predictive and transform coding, with improvements to fit bit rates of less than 64 kbit/s. It is a complex and relevant example of a data-dominated application. A hardware realisation of such a decoder has to be power efficient in order to reduce the size of the chip package in which it is embedded, or of the battery if it is used in a mobile application. It is well known by now that any future complex chip realisation has to take power reduction into account [1]. Our previous research has clearly shown that the dominant power contribution in data-dominated designs lies in the data transfer and storage of multi-dimensional array signals and other complex data types [2], [3].
In this paper we exploit this feature to achieve large savings in system power without having to worry about the detailed data-path, foreground registers, and controller architecture. The main contributions of this paper are the evaluation of the applicability and effectiveness of our power-oriented methodology for data-dominated applications [4], [5], [3] (see Fig. 1), a study of the effect of the possible optimisations, and the application of the most promising alternatives in the correct sequence on the H.263 decoder algorithm. In addition, we substantiate our earlier claims [2] that the cost of the background storage and related transfers is dominant during system exploration. This is shown in Section VI by investigating the power in a representative data-path in H.263, including its local memories. In the rest of this paper, we concentrate on the main storage (memory) and transfer related parts of the H.263 decoder architecture. This exploration is based on the power model described in Section III. The final results for the different steps are illustrated in Fig. 12. A brief version of this paper has been published in [6].

This research was partly sponsored by a co-operation with Texas Instruments Incorporated. L. Nachtergaele and F. Catthoor are with IMEC, Kapeldreef 75, B-3001 Heverlee, Belgium. F. Catthoor is also Professor at the Katholieke Universiteit Leuven, Belgium. B. Kapoor was a resident at IMEC from the Corporate R&D labs of Texas Instruments Incorporated, Dallas, Texas. S. Janssens was a student from Erasmus Hogeschool and is now with IMEC. D. Moolenaar was a student from Delft Univ. of Technology and is now with IMEC.

The numerous pointers and variables in the C code of the reference implementation have been removed by rewriting the specification into a mixed applicative-procedural DFL description [7].
As a result, more indices and some extra signal copies and accesses are present in the code, but the dependencies are much more transparent. This allows systematic identification of the sources of potential optimisation. Moreover, this step is essential for applying a number of data storage and transfer related analysis and exploration/optimisation techniques, which are collected in our high-level memory management methodology/script, partly supported by the prototype ATOMIUM environment [4].

Our strategy to obtain area and power figures is based on selecting the worst-case parameters and modes in the H.263 specification. This is valid for computing the maximal power budget and for finding the component sizes, which mainly affect area, but not directly for the average power consumption. Still, we believe that the maximal power consumption gives a good view of the relative importance of the different components in the power budget and of the savings that can be obtained. A good view of the absolute average power consumption would require accurate statistics on the occurrence of the different cases; in the sequel, we only give some relative indication of this.

The following major algorithmic transformations and memory organisation optimisations have been performed on the DFL specification, mainly targeting the power budget related to accesses to and from the frame memories:
1. First, the code was pruned to retain the operations relevant to the overall complexity of the description with respect to the number of cycles, area and power consumption. This boils down to explicitly keeping the relevant storage and accesses of the arrays that hold the picture information, and hiding the details of arithmetic operations in function calls. As a result, the potential overhead of transfers and storage in the applicative writing style is removed when
interpreted effectively.
2. Several data-flow transformations have been performed. The methodology for carrying out these transformations and their effects is described in [8]. One of the major transformations removes all the border accesses used in the H.263 decoder, as discussed in Subsection V-A.
3. Advanced transformations on the global function hierarchy and loop nests have been performed. These transformations have a significant effect and are partly discussed in Subsection V-B. They are also essential to enable the subsequent exploration steps.
4. In order to further exploit locality of access and data reuse, extra memory hierarchy levels have been introduced (see Subsection V-C). For the P mode, this step has been especially effective in the overlapped-block motion compensation (OBMC) mode, which has the largest power consumption. We only show the principle of this optimisation, as depicted in Fig. 6. For the B pictures and for the combination with computation, we show more details.
5. Finally, we have performed the actual memory allocation and in-place mapping to determine the detailed memory organisation for the frame memories and some of the smaller intermediate memories. This step is discussed in Section V-D. It has a large effect on the required area, which is reduced by almost a factor of 2, with only a very limited increase in the power budget.

[Fig. 1. ATOMIUM script for storage/communication optimisation in the specification to be used for simulation and hardware/software synthesis. Stages: system specification, pruning/analysis, data-flow and loop transformations, memory hierarchy, flow-graph balancing, allocation/assignment of signals to ports and memories, in-place optimisation, address optimisation and address hardware generation; supporting steps include formal verification, C-code generation and simulation, and data-path/control synthesis.]

II. H.263 video decoding

H.263 is a draft recommendation for video coding for narrow telecommunication channels at < 64 kbit/s [9]. The coding/decoding is a block-based algorithm that exploits spatial and temporal redundancy. Three standard video formats are used in conjunction with H.263: QCIF, Sub-QCIF, and CIF. A QCIF picture has 176 × 144 pixels, represented by 9 × 11 macroblocks. Each macroblock has six blocks of 8 × 8 pixels (four luminance and two chrominance blocks), due to the 4:2:0 decimation of the chrominance values. The picture that serves as the reference for prediction is called the P-picture. From the past P-picture, a future P-picture is predicted; this is called forward P prediction. Interpolation between past and future P-pictures yields bidirectional B-pictures (see Fig. 2). A PB-frame consists of two pictures: a P-picture, which is predicted from the last decoded P-picture, and a B-picture, which is predicted from the last decoded P-picture and the P-picture currently being decoded. Parts of the B-picture may be bidirectionally predicted from the past and future P-pictures. For PB-frames, the intra (I) coding mode implies that the P-blocks are intra coded, while the B-blocks are inter coded with prediction as for an inter block. A decoder can be in one of three modes: I, P, or PB. Two extensions are orthogonal to the P and PB modes: the unrestricted motion vector extension allows motion vectors to point outside the frame, whereas overlapped block motion compensation (OBMC) uses 4 extra motion vectors to compensate motion. When we refer in this paper to the P or PB mode, we assume that both extensions are in use.
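The picture dimensions above fix the size of every frame memory discussed in the rest of the paper. As a small worked example (our own illustration, not part of the Telenor code; the names `Format`, `FORMATS`, `macroblocks` and `pixels_per_picture` are hypothetical), the following C sketch derives the macroblock count and the number of 8-bit pixel values per 4:2:0 picture, using the standard luminance sizes of Sub-QCIF (128 × 96), QCIF and CIF (352 × 288).

```c
#include <assert.h>

/* Luminance dimensions of the three picture formats used with H.263. */
typedef struct { const char *name; int width, height; } Format;

const Format FORMATS[] = {
    {"Sub-QCIF", 128, 96},
    {"QCIF", 176, 144},
    {"CIF", 352, 288},
};

/* Number of 16x16 macroblocks (QCIF: 9 rows x 11 columns = 99). */
int macroblocks(Format f) {
    assert(f.width % 16 == 0 && f.height % 16 == 0);
    return (f.width / 16) * (f.height / 16);
}

/* 4:2:0 sampling: each chrominance plane (Cr, Cb) has half the width and
 * half the height of the luminance plane, so a macroblock carries four
 * 8x8 luminance blocks plus two 8x8 chrominance blocks = 384 values. */
long pixels_per_picture(Format f) {
    long luma = (long)f.width * f.height;
    long chroma = 2L * (f.width / 2) * (f.height / 2);
    return luma + chroma;
}
```

For QCIF this yields 99 macroblocks of 384 values each, i.e. 38,016 values per picture, which matches the 38,016-word frame memories of the reference organisation in Fig. 4.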
With both extensions enabled, the P and PB modes are the two most energy-consuming modes.

[Fig. 2. Forward P, forward B and backward B predictions between the past P-picture, the B-picture and the future P-picture (horizontal axis: time).]

III. Power model

For data-intensive applications such as video decoding, data transfers dominate the power consumption. Therefore, the primary design goal is to reduce memory transfers
between the large frame memories and the data-paths. The cost of a data transfer is a function of the memory size, the memory type, and the access frequency F_real. F_real is defined as the real number of accesses per second, not the clock frequency: when there is a clock tick and the memory is not accessed, the memory is assumed to be in power-down mode. This assumption holds for most modern low-power RAMs [10]. The memory itself is characterised by its number of ports, words and bits, and by the aspect ratio of the layout. We make use of an accurate but proprietary power model from Texas Instruments for the power exploration. In this paper, only the number of transfers per frame/picture (directly related to F_real) is discussed.

IV. The reference design

To obtain an acceptable reference, we have counted the number of transfers to the arrays that hold the past P, future P, and B pictures in the Telenor C implementation [11]. These numbers depend on the mode of reconstruction. The flow of data using all extensions is depicted in Fig. 3 using thin lines. The order of computation of the pictures P_{T-1}, Pext_{T-1}, Pnew_{T-1}, P_T, B1_T, B2_T and B_T is shown in the figure. The dashed lines indicate that the pictures Pnew_{T-1} and P_T are stored in the array signal newframe, whereas the pictures B1_T, B2_T and B_T are stored in the B-picture array. The rectangles with a bold border are the final pictures after decoding. The thick line indicates that oldframe and newframe are interchanged after each decoded frame; in the C code, this is done by swapping the pointers to oldframe and newframe. This reflects that the C implementation avoids wasting main memory, since memory usage also affects simulation speed. The corresponding abstract organisation for the continuous P mode is shown in Fig. 4.

[Fig. 4. Reference memory organisation and worst-case number of transfers while decoding 1 PB-frame: data-paths with a register file and buffers, accessing 9-bit single-port SRAMs for edgeframe (54,912 words) and two frame memories of 38,016 words each, one of them a shared memory space.]

[TABLE I. Frame memory transfers per picture in the reference C code: worst-case and average numbers per mode, for old/newframe, edgeframe and the B-picture array.]

[Fig. 3. Data flow for decoding PB-frames. Nodes: P_{T-1} (oldframe), Add Border, Pext_T (edgeframe), Decoding, Forward P, Forward B, in-place switch, Pnew_T, B1_T, P_T, Backward B, B2_T and B_T.]

Table I lists the worst-case and average numbers of transfers to the frame memories per picture. The worst-case numbers are obtained analytically, not by simulation. This means that whenever code is executed conditionally, the conditions are assumed to branch to the most energy-consuming option. For example, it is assumed that every macroblock is motion compensated, which is clearly a worst-case assumption. Mode 1 uses prediction with overlapped motion compensation and unrestricted motion vectors. In Mode 2, bidirectional prediction is also included, which introduces the extra transfers to the B-picture array. An acceptable value for the average case was obtained by counting the accesses while simulating the decoding of the video stream suz. This stream contains 75 QCIF frames (which corresponds to 2.5 seconds of real-time video). The length of the stream depends on the encoding options and is listed in Table II together with the compression ratio.

The C code used as a reference is optimised to run as fast as possible on a given workstation; it is not optimised for efficient implementation. It is, however, the typical documentation that implementation groups start from. Mostly, the algorithm and its data structures are mapped directly onto a block diagram, and each block is then optimised locally and implemented efficiently. This is why most video decoders have a large external memory (with high bandwidth) that holds 3 complete images. Also, the memory interface typically becomes a big component of the design.
When compared to similar state-of-the-art video decoders [12], [13], [14], [15], [16], [17], [18], [19], which also include memory for three pictures, we believe that the access numbers to these memories will be comparable if the bidirectional mode is considered. If the bidirectional mode is not used, the accesses will be comparable to those corresponding to the P-picture.
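The power model of Section III can be illustrated, in a strongly simplified form, by the bookkeeping that turns transfer counts into F_real. The sketch below assumes a model that is linear in F_real, with one hypothetical energy-per-access value per memory standing in for the proprietary Texas Instruments model; the type `MemoryUse` and the functions `f_real` and `memory_power` are our own names.

```c
#include <assert.h>

/* One memory and its number of transfers per decoded picture. */
typedef struct {
    const char *name;
    double transfers_per_picture;  /* reads + writes per picture */
    double energy_per_access;      /* hypothetical placeholder, arbitrary units */
} MemoryUse;

/* F_real: real number of accesses per second, not the clock frequency. */
double f_real(double transfers_per_picture, double pictures_per_second) {
    assert(pictures_per_second > 0.0);
    return transfers_per_picture * pictures_per_second;
}

/* Linear model (an assumption): power = sum of energy/access * F_real.
 * Clock ticks without an access cost nothing, reflecting the assumption
 * that an idle RAM is in power-down mode. */
double memory_power(const MemoryUse *mems, int n, double pictures_per_second) {
    double p = 0.0;
    for (int i = 0; i < n; i++)
        p += mems[i].energy_per_access *
             f_real(mems[i].transfers_per_picture, pictures_per_second);
    return p;
}
```

Since the energy values are placeholders, only the ratio between two configurations evaluated with the same values is meaningful, which mirrors how power is reported relative to the reference design in this paper. For example, at 30 pictures per second a memory accessed 38,016 times per picture has F_real = 1,140,480 accesses per second.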
[TABLE II. Number of 8-bit words in test stream suz and the compression factor, for the modes d, f, fd, g, gd, gf and gfd. Mode d stands for "unrestricted motion vectors", mode f for "overlapped block motion compensation (OBMC)" and mode g for "bidirectional prediction". Columns: Length, Compression.]

V. System exploration for power

We now summarise each of the main optimisations listed in Section I. They have been applied starting from the initial mixed applicative and procedural DFL description of the video decoder, using the high-level memory management methodology/script partly supported by the prototype ATOMIUM environment [4].

A. Removal of the border

To accommodate unrestricted motion vectors, a complete border consisting of 44 macroblocks is added to oldframe. It is not just filled with zeroes but with real data copies in a non-trivial way [9]. To simplify the control flow in the original C code, these data are duplicated in the frame signals (cf. edgeframe in Fig. 3) prior to the actual image manipulations, resulting in storage and transfer overhead for both reading and writing. This requires an extra 16,896 pixels (44 macroblocks of 384 pixels) to be stored. To reduce this overhead, the dependencies on the border data can be checked by (manifest) conditions on the position of the pixels to be read. Instead of storing and accessing duplicate data, the original pixels at the boundary rows/columns of the image frame are then read. These guarding conditions have to be implemented in the controller and steer the data-path; usually some local buffering is then also necessary. Several stages of optimisation are possible here, ranging from a simple context-independent caching of the border data (which is apparently selected in most industrial designs) up to heavily optimised context-dependent checking with reduced local buffering.
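The guarded-read idea can be sketched as follows. The sketch assumes H.263's unrestricted-motion-vector rule that a reference pixel outside the picture takes the value of the nearest edge pixel, so clamping the read coordinates returns the same values as a stored, edge-replicated border such as edgeframe. All function names are hypothetical; `border_overhead` reproduces the extra storage of the QCIF edgeframe under the assumption of a 16-pixel luminance border and an 8-pixel chrominance border (one macroblock wide).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Guarded read of pixel (x, y) in a w x h plane: coordinates outside the
 * picture are clamped to the nearest edge pixel, so no border is stored. */
unsigned char read_clamped(const unsigned char *plane, int w, int h, int x, int y) {
    if (x < 0) x = 0; else if (x >= w) x = w - 1;
    if (y < 0) y = 0; else if (y >= h) y = h - 1;
    return plane[(size_t)y * w + x];
}

/* Reference-style alternative: copy the plane into a buffer with a border
 * of b pixels on every side, filled by edge replication. Caller frees. */
unsigned char *make_bordered(const unsigned char *plane, int w, int h, int b) {
    int bw = w + 2 * b, bh = h + 2 * b;
    unsigned char *out = malloc((size_t)bw * (size_t)bh);
    if (out == NULL) return NULL;
    /* Copy each picture row and replicate its first/last pixel sideways. */
    for (int y = 0; y < h; y++) {
        unsigned char *row = out + (size_t)(y + b) * bw;
        memcpy(row + b, plane + (size_t)y * w, (size_t)w);
        memset(row, plane[(size_t)y * w], (size_t)b);
        memset(row + b + w, plane[(size_t)y * w + w - 1], (size_t)b);
    }
    /* Replicate the first and last extended rows upwards and downwards. */
    for (int y = 0; y < b; y++) {
        memcpy(out + (size_t)y * bw, out + (size_t)b * bw, (size_t)bw);
        memcpy(out + (size_t)(b + h + y) * bw, out + (size_t)(b + h - 1) * bw, (size_t)bw);
    }
    return out;
}

/* Returns 1 if both variants agree for every access within b pixels of
 * the picture, 0 otherwise (or if allocation fails). */
int variants_agree(const unsigned char *plane, int w, int h, int b) {
    unsigned char *bordered = make_bordered(plane, w, h, b);
    if (bordered == NULL) return 0;
    int bw = w + 2 * b, ok = 1;
    for (int y = -b; y < h + b && ok; y++)
        for (int x = -b; x < w + b && ok; x++)
            if (bordered[(size_t)(y + b) * bw + (x + b)] != read_clamped(plane, w, h, x, y))
                ok = 0;
    free(bordered);
    return ok;
}

/* Extra pixels stored for one bordered 4:2:0 picture of w x h luminance
 * pixels, with a 16-pixel luminance and an 8-pixel chrominance border. */
long border_overhead(int w, int h) {
    long luma = (long)(w + 32) * (h + 32) - (long)w * h;
    long chroma = 2L * ((long)(w / 2 + 16) * (h / 2 + 16) - (long)(w / 2) * (h / 2));
    return luma + chroma;
}
```

In hardware terms, the comparisons in `read_clamped` play the role of the guarding conditions in the controller, while the bordered copy corresponds to the extra frame storage and the extra write transfers that the transformation removes.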
All of these alternatives make the storage for the extra borders superfluous, but only the last one removes all redundant picture accesses. If we assume that on average this reduction is about a factor of 8¹, we obtain an extra reduction in read accesses; this is, however, data-dependent. In terms of power consumption, our detailed models show a saving of between 24% and 27% from the combined effect of fewer transfers and a reduced frame size. The gain in power comes at the price of increased code complexity and a larger controller. Still, as the power consumed in the controller is quite small, the trade-off for power clearly favours transforming all the border accesses. The resulting data flow without the border is depicted in Fig. 5.

¹ This is realistic because the motion vectors are in practice relatively small and not uniformly distributed between 0 and 15. If we assume a uniform range of (-7, 7) for the motion, the number of pixels residing outside the oldframe border is on average about 1/8 of the pixels inside.

B. Loop and function restructuring to combine backward and forward P and B predictions, and IDCT

In the Telenor C code, decoding a PB-frame starts with decoding the incoming bit stream, which results in a P and a B macroblock containing differential errors in the frequency domain, plus motion information (Task 1 in Fig. 5). Next, the forward P and B predictions are performed based on the motion information (Tasks 2 and 3). This yields a forward-predicted B and P block, which are directly stored in the pictures B1_T and Pnew_T, respectively. Then, the decoded P macroblock is transformed to the spatial domain by means of an IDCT (Task 4). This P macroblock is added to the macroblock read back from picture Pnew_T, and the sum is stored in picture P_T (Task 5). This picture, together with the macroblock stored in B1_T, is needed for the backward B prediction (Task 6). The result is stored in picture B2_T. This picture is also corrected with differential errors, as for the P-picture (Tasks 7 and 8).

[Fig. 5. Data flow after removal of the border and ordering of the main tasks (1 to 8) while inter decoding a PB-frame. Pictures: oldframe, Pnew_T, B1_T, P_T, B2_T and B_T; tasks include Decoding, Forward P, Forward B and Backward B.]

The figure also illustrates that, instead of producing a P and a B picture just once, the original description reads and writes the pictures several times. More precisely, since B1_T, B2_T and B_T are stored in the B-picture array, every pixel in it is written three times and read twice. Pnew_T and P_T are stored in newframe, hence this picture memory is written twice and read once. The reason for this was probably to simplify the algorithmic description effort for the system designers.

As an illustrative example, we now explain how global loop transformations and complex restructuring of the code hierarchy create more locality of access, based on the pseudo code for Tasks 2 and 3 in Fig. 5. The code for Forward P (Task 3) is:
    if "Advanced prediction mode" {
        for (comp = 0; comp < 4; comp++) {
            P[T][0] = recon_comp_obmc(P[T-1][0], ..., comp);
        }
        P[T][1] = recon_comp(P[T-1][1], ...);
        P[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

In this code, the first index of signal P indicates the previous ([T-1]) or the current ([T]) prediction. The second index selects the luminance information Y (index [0]) or the chrominance information Cr (index [1]) and Cb (index [2]). The code for Forward B (Task 2) is:

    if "Advanced prediction mode" {
        if "Overlapped motion compensation" {
            B[T][0] = recon_comp(P[T-1][0], ..., comp);
        } else {
            ...
        }
        B[T][1] = recon_comp(P[T-1][1], ...);
        B[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

Note that in this code the recon_comp function is issued once, instead of the four calls to recon_comp_obmc in the comp loop. When merging Tasks 2 and 3, we get:

    if "Advanced prediction mode" {
        for (comp = 0; comp < 4; comp++) {
            P[T][0] = recon_comp_obmc(P[T-1][0], ..., comp);
            if (mode == MODE_INTER4V) {
                B[T][0] = recon_comp(P[T-1][0], ..., comp);
            } else {
                ...
            }
        }
        P[T][1] = recon_comp(P[T-1][1], ...);
        B[T][1] = recon_comp(P[T-1][1], ...);
        P[T][2] = recon_comp(P[T-1][2], ...);
        B[T][2] = recon_comp(P[T-1][2], ...);
    } else {
        ...
    }

The recon_comp, recon_comp_new and recon_comp_obmc functions perform different kinds of motion compensation depending on the motion vectors. Moreover, they are not embedded in the same loop scopes. However, with complex code restructuring it is possible to combine them. This class of optimisations is crucial because it enables further optimisation of the memory hierarchy, which is discussed in Subsection V-C.

C. Memory hierarchy related optimisations

This step involves data-flow transformations that introduce extra transfers between the different memory levels and are used mainly to reduce the power cost. In particular, temporary values, to be assigned to a "lower" level, are added wherever a signal in a "higher" level is read more than once. The duplicate reads are then performed on the lower-level temporary signal. The same can happen in the other direction for writes: if a signal assigned to a higher level is composed of several contributions, it does not make sense to always update the final result in the higher-level memory. Instead, it is usually better to compose the contributions consecutively (or in a close ordering) in a lower level (or in several levels in more complex situations) and then transfer the final result directly to the higher level. The principle of this buffering process on macroblock accesses is shown in Fig. 6.

[Fig. 6. Principle of 3 × 3 (macro-)block buffering between oldframe/edgeframe (P_{T-1}) and the motion compensation (reconstruction) routines, which act on the central block to be stored in newframe (P_T).]

In Fig. 5, the reconstructed macroblocks are first written to Pnew_T. After reconstruction, a correction is performed with the differential errors resulting from the IDCT. This process is shown in the following pseudo code:

    for (macroblocknr = 1 to 99) {
        (Pblock[][], Bblock[][]) = Decoding();
        (reconf_pblock[][], reconf_bblock[][]) = Forward_B&P_prediction(P[T-1]);
        "Store reconf_pblock[][] in Pnew[T]";
        _Pblock[][] = IDCT(Pblock[][]);
        "Add _Pblock[][] to Pnew[T] and store in P[T]";
    }

This can be rewritten as:

    for (macroblocknr = 1 to 99) {
        (Pblock[][], Bblock[][]) = Decoding();
        (reconf_pblock[][], reconf_bblock[][]) = Forward_B&P_prediction(buffer);
        _Pblock[][] = IDCT(Pblock[][]);
        new_pblock[][] = reconf_pblock[][] + _Pblock[][];
        "Store new_pblock[][] in P[T]";
    }
In this code, a buffer called buffer is created. The task Forward_B&P_prediction now reads several times from buffer instead of from picture P_{T-1}, which results in large power savings. An extra block, called new_pblock, is also introduced, so extra copies from this block to picture P_T are necessary. Since these extra copies are situated at a lower level of the memory hierarchy, the global power consumption due to memory transfers is still reduced.

We now apply this principle to the decoding of a PB-frame, as depicted in Fig. 5. Instead of storing the forward-predicted macroblock in Pnew_T, the result is stored in a buffer called reconf_pblock. This buffer is corrected with the differential errors that result from the IDCT, called _Pblock, to yield the final forward P prediction new_pblock. This final block, together with the forward-predicted B macroblock, called reconf_bblock, and the motion information, is required for the backward reconstruction. Instead of reading from B1_T and P_T, the backward reconstruction is based on the buffers reconf_bblock and new_pblock. The result is stored in the buffer reconb_bblock instead of B2_T. As for the P macroblock, it is corrected with the differential errors in _Bblock to yield the final B prediction block new_bblock. Extra transfers are introduced to copy the final block to the picture B_T stored at the highest level.

The resulting data flow when decoding a PB-frame after introducing the extra memory hierarchy is shown in Fig. 7. The pictures with a bold border are stored at the "highest" level of the hierarchy, which corresponds to the memory with the largest transfer cost. The other, smaller buffers, such as buffer, Pblock, Bblock, reconf_pblock, reconf_bblock, new_pblock, reconb_bblock and new_bblock, are stored at "lower" levels.
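The effect of moving the correction step to a lower memory level can be made concrete by counting accesses for one P macroblock. The sketch below is our own illustration with hypothetical names (`Counters`, `clip255`, `correct_via_frame`, `correct_via_buffer`); it assumes that reconstructed values are clipped to 0..255 and models the frame memory holding Pnew_T and P_T as a plain array. Both variants produce the same block, but the original order needs three frame-memory accesses per pixel (write Pnew_T, read it back, write P_T), in line with "written twice and read once", while the buffered order needs only the final write.

```c
#include <assert.h>

#define MB_PIXELS 384 /* 4 luminance + 2 chrominance 8x8 blocks (4:2:0) */

/* Access counters: frame memory ("highest" level) versus small
 * macroblock buffers ("lower" level). */
typedef struct { long background; long foreground; } Counters;

/* Assumed clipping of reconstructed values to the 8-bit pixel range. */
static unsigned char clip255(int v) { return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Original order: store the forward prediction in Pnew_T, read it back,
 * add the IDCT output and store the result in P_T. Pnew_T and P_T share
 * the same frame memory, modelled here by frame_mb. */
void correct_via_frame(unsigned char *frame_mb, const unsigned char *reconf_pblock,
                       const short *idct_pblock, Counters *c) {
    assert(frame_mb && reconf_pblock && idct_pblock && c);
    for (int i = 0; i < MB_PIXELS; i++) {
        frame_mb[i] = reconf_pblock[i];          /* write Pnew_T */
        c->background++;
    }
    for (int i = 0; i < MB_PIXELS; i++) {
        int v = frame_mb[i] + idct_pblock[i];    /* read Pnew_T back */
        frame_mb[i] = clip255(v);                /* write P_T */
        c->background += 2;
    }
}

/* Buffered order: the sum is formed in new_pblock at the lower level and
 * only the final block is copied to the frame memory. */
void correct_via_buffer(unsigned char *frame_mb, const unsigned char *reconf_pblock,
                        const short *idct_pblock, Counters *c) {
    assert(frame_mb && reconf_pblock && idct_pblock && c);
    unsigned char new_pblock[MB_PIXELS];
    for (int i = 0; i < MB_PIXELS; i++) {
        new_pblock[i] = clip255(reconf_pblock[i] + idct_pblock[i]);
        c->foreground += 2;                      /* read reconf_pblock, write new_pblock */
    }
    for (int i = 0; i < MB_PIXELS; i++) {
        frame_mb[i] = new_pblock[i];             /* copy the final block to P_T */
        c->foreground++;                         /* read new_pblock */
        c->background++;                         /* write P_T */
    }
}
```

The buffered version adds foreground accesses to new_pblock; the saving relies on these being much cheaper than frame-memory accesses, as argued above.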
In addition to these PB-frame buffers, many other similar optimisations have been performed for the different decoder modes (especially in the overlapped-block motion compensation mode).

[Fig. 7. Modified data flow after introducing memory hierarchy, with the tasks Decoding, Forward P&B and Backward B, the lower-level blocks buffer, Pblock, Bblock, reconf_pblock, reconf_bblock, _Pblock, new_pblock, _Bblock, reconb_bblock and new_bblock, and the pictures P_{T-1} (oldframe), P_T and B_T.]

D. In-place storage of past and future P-pictures

In Fig. 8 (Left), the light gray area covers the portion of oldframe that is still needed for reconstruction. In Fig. 8 (Middle), the gray area covers pixels that have already been calculated. The array signals oldframe and newframe can be stored in-place if the shaded area in Fig. 8 (Right) is stored in a buffer. Decoding the macroblock in row y and column x uses data stored in the blocks with coordinates (y ± 1, x ± 1) in oldframe. The worst-case dependence, i.e. the one that needs the most memory, corresponds to the dependence in the following pseudo code:

    for (y = 1; y <= 9; y++) {
        for (x = 1; x <= 11; x++) {
            Read block (y-1,x-1) from P[T-1];
            Predict block (y,x);
            Write block (y,x) in P[T];
        }
    }

Subtracting the consumption address $11(y-1) + (x-1)$ from the production address $11y + x$,

$$[11y + x] - [11(y-1) + (x-1)] = 11 + 1 = 12,$$

yields the number of blocks in the diagonally shaded intersection of Fig. 8 (Right). When introducing a first-in-first-out (FIFO) buffer of 12 + 1 = 13 macroblocks, pictures P_{T-1} and P_T can be stored in-place in a single picture called old/newframe:

    for (y = 1; y <= 9; y++) {
        for (x = 1; x <= 11; x++) {
            Read block (y-1,x-1) from old/newframe;
            Predict block (y,x);
            Pop block from buffer and store at position (y-1,x-1);
            Push block (y,x) in the buffer;
        }
    }

The buffer mechanism can be implemented by calculating the block addresses modulo 13 [20]. This results in a snake-like operation of the buffer, as illustrated in Fig. 9. The resulting data flow is depicted in Fig. 10, where the 13 macroblocks are shown in the pipeline of the snake. Implementing this data flow, taking into account extra memory hierarchy optimisations, leads to the detailed organisation depicted in Fig. 11. This in-place optimisation does not affect the number of background transfers, but it significantly reduces the total size of the background memories, which results in a smaller area cost. The combined picture is only 13 macroblocks larger than one of the two pictures required initially.

[Fig. 8. Putting oldframe and newframe in-place. Left: oldframe; Middle: newframe; Right: old/newframe.]

E. Relative impact of the different exploration steps

Fig. 12 gives an overview of the relative power consumption for each optimisation stage for the PB mode. This
is the power consumed by the picture memories when decoding bidirectional B frames with unrestricted motion vectors and overlapped motion compensation. The power consumption is normalised with respect to that of the reference design, and the power figures are based on worst-case assumptions. The bar chart shows that when all optimisations discussed in this paper are applied, the power consumption is reduced by a factor of 9.

Similar optimisations have been applied to the H.263 decoder running in the P mode. They reduced the worst-case power consumption by a factor of 7. The main difference with the optimisations for the PB mode is the absence of optimisations related to bidirectional coding.

The optimisations described in this paper have been partially applied to the public domain C code from Telenor Research [11]. Simulation of the resulting C code while decoding stream suz in all the different decoding modes shows that the average power consumption of the memories is reduced to 57%. Note that in these simulations, the extra transfers due to the extra layer of hierarchy are taken into account.

[Fig. 9. Principle of the buffer update mechanism (labels: new frame, new active, block (y-1,x-1) at T-1, block (y,x) at T, block (y,x) at T-1).]

[Fig. 10. Data flow after in-place storage of oldframe and newframe, with header decoding, Decoding, Forward P&B and Backward B operating on old/newframe and producing B_T.]

[Fig. 11. Detailed memory organisation of in-place old/newframe.]

[Fig. 12. Relative power in continuous PB mode for each optimisation stage of H.263 frame access. Bars: H263 reference (PB), Remove border, Combine back/forward, Combine, Local cache, OBMC, Interpolation buffer, Inplace.]

VI. Power consumption of IDCT

A DFL specification of the IDCT algorithm [21] was simulated and verified using Mentor's DSP Station. This specification was synthesised using our data-path synthesis tool Dolphin [22], which resulted in a VHDL netlist. This netlist was mapped to the TI TGC2000 library using Synopsys' Design Analyzer and converted to a Verilog netlist. A net capacitance file for the design was generated using Synopsys' Design Analyzer tool. The Verilog netlist was simulated for toggle counts using Cadence's Verilog-XL simulator. The average power consumption of the data-path was then computed from the net capacitance and toggle count files. The power consumption of the memory unit in the IDCT is computed with the power model described in Section III. Table III lists the average power consumption of the IDCT for the 3 video formats used in conjunction with H.263. The computation uses a frame rate of 30 frames per second to derive the smallest possible frequency of operation for the data-path and memory units. This module is the most arithmetic-dominated one in the entire H.263 specification. Still, it has been shown that the power for a direct realisation with commercial logic synthesis and gate array circuits is about 2 orders of magnitude smaller than the power in the combined unoptimised frame accesses. Initially ignoring this arithmetic in the system exploration is therefore justified.

VII. Conclusion

We believe that the results described in this paper clearly substantiate the validity of the proposed high-level memory
management methodology for data-dominated applications such as the H.263 video decoder. They show the very promising power reductions that can be obtained by system-level exploration, i.e. a reduction of the maximal power by up to a factor of 9 in the worst-case mode. The same methodology has also been applied to an MPEG-2 [23] video decoder, a medical back-projector [24] and a segment protocol processor of the common adaptation layer of ATM [24], where significant savings have also been obtained. In the future, we will also explore the possibilities of these optimisations on a mixed software-hardware platform, as provided e.g. by the TI cDSP approach, which supports a single-chip heterogeneous design consisting of embedded cores, sea-of-gates logic and embedded memories.

[TABLE III. IDCT power consumption for the three video formats (QCIF, Sub-QCIF, CIF). Pt = normalised average power for the IDCT; Pm = normalised average memory power at frequency fm; fm = smallest frequency (in MHz) at which the memory can be operated; Pd = normalised average data-path power (in mW) at frequency fd; fd = smallest frequency (in MHz) at which the data-path can be operated.]

Acknowledgements: We gratefully acknowledge the discussions with our colleagues and especially the contributions of E. De Greef, M. Eyckmans, P. Six and S. Wuytack. This research was partly sponsored by Texas Instruments Incorporated, Dallas, Texas.

References

[1] R.-H. Yan and L. Terman (eds.), "Special issue on Low Power Electronics," Proceedings of the IEEE, vol. 83, no. 4, pp. 495-700, April
[2] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. De Man, "Global Communication and Memory Optimizing Transformations for Low Power Signal Processing Systems," in VLSI Signal Processing VII, Jan Rabaey, Paul M. Chau, and John Eldon, Eds., New York, October 1994, IEEE workshop on VLSI signal processing, pp. 178-187, IEEE Press.
[3] Sven Wuytack, Francky Catthoor, Lode Nachtergaele, and Hugo De Man, "Power exploration for data dominated video applications," in International Symposium on Low Power Electronics and Design, Monterey, California, August 1996, pp. 359-364.
[4] Lode Nachtergaele, Francky Catthoor, Florin Balasa, Frank Franssen, Eddy De Greef, Hans Samsom, and Hugo De Man, "Optimization of memory organization and hierarchy for decreased size and power in video and image processing systems," in Records of the 1995 IEEE International Workshop on Memory Technology, Design and Testing, San Jose, California, August 1995, pp. 82-87.
[5] Eddy De Greef, Francky Catthoor, and Hugo De Man, "Memory organization for video algorithms on programmable signal processors," in Computer Design: VLSI in Computers & Processors, IEEE, October 1995, pp. 552-557.
[6] Lode Nachtergaele, Francky Catthoor, Bhanu Kapoor, Stefan Janssens, and Dennis Moolenaar, "Low power storage exploration for H.263 video decoder," in VLSI Signal Processing, November.
[7] P. N. Hilfinger, J. Rabaey, D. Genin, C. Scheers, and H. De Man, "DSP specification using the Silage language," in Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Albuquerque, NM, April 1990, pp. 1057-1060.
[8] F. Catthoor, M. Janssen, L. Nachtergaele, and H. De Man, "System-level data-flow transformations for power reduction in image and video processing," in Proceedings of the International Conference on Electronics, Circuits and Systems, Rhodos, Greece, October 1996, IEEE, pp. 1025-1028.
[9] Karel Rijkse, "Video coding for narrow telecommunication channels at < 64 kbit/s," Tech. Rep., Telenor R&D.
[10] Kiyoo Itoh, Katsuro Sasaki, and Yoshinobu Nakagome, "Trends in low-power RAM circuit technologies," Proceedings of the IEEE, vol. 83, no. 4, pp. 524-543, April 1995.
[11] Digital Video Coding at Telenor R&D, "Telenor's H.263 software, version 1.3," February 1995, software/.
[12] Aldo Cugnini and Richard Shen, "MPEG-2 video decoder for the digital HDTV Grand Alliance system," IEEE Transactions on Consumer Electronics, vol. 41, no. 3, pp. 748-753, August 1995.
[13] T. Demura et al., "A single-chip MPEG2 video decoder LSI," in International Solid-State Circuits Conference, IEEE, February 1994, pp. 72-73.
[14] D. Galbi et al., "An MPEG-1 audio/video decoder with run-length compressed antialiased video overlays," in International Solid-State Circuits Conference, IEEE, February 1995, pp. 289-287.
[15] GEC Plessey Semiconductor, "An overview of the H.261 video compression standard and its implementation in the GPS chipset," October.
[16] Michel Harrand, Michel Henry, Philippe Chaisemartin, Paul Mougeat, Yves Durand, Alain Tournier, Robin Wilson, Jean-Claude Herluison, Jean-Claude Langchambon, Jean-Luc Bauer, Michel Runtz, and Joseph Bulone, "A single chip videophone encoder/decoder," in International Solid-State Circuits Conference, IEEE, February 1995, pp. 292-293.
[17] Toshihiro Masaki, Yasuo Morimoto, Takao Onoye, and Isao Shirakawa, "VLSI implementation of inverse discrete cosine transformer and motion compensator for MPEG2 HDTV video decoding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 5, pp. 387-395, October 1995.
[18] M. Toyokura et al., "A video DSP with a macroblock-level pipeline and a SIMD type vector-pipeline architecture for MPEG2 codec," in International Solid-State Circuits Conference, IEEE, February 1994, pp. 74-75.
[19] Shinobu Ueda, Y. Kiyose, Y. Kishida, S. Sotoda, M. Kawabata, T. Furukawa, and S. Kawabe, "Development of an MPEG2 decoder for magneto-optical disk video players," IEEE Transactions on Consumer Electronics, vol. 41, no. 3, pp. 521-527, August 1995.
[20] J. Vanhoof, K. van Rompaey, I. Bolsens, G. Goossens, and H. De Man, High-Level Synthesis for Real-Time Digital Signal Processing, Kluwer Academic Publishers, Boston.
[21] W.-H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Transactions on Communications, pp. 1004-1009, September 1977.
[22] P. Schaumont, B. Van Thournout, I. Bolsens, and H. De Man, "Synthesis of pipelined DSP accelerators with dynamic scheduling," in Proceedings of the 8th International Symposium on System-Level Synthesis, Cannes, France, September 1995, ACM/IEEE, pp. 72-77.
[23] D. Moolenaar, "System specification and storage exploration for two video compression standards," M.S. thesis, Delft University, Delft, The Netherlands, May 1996, ftp://ftp.imec.be/pub/vsdm/reports/video codec optim/ MPEG2 code optim.ps.gz.
[24] F. Catthoor, L. Nachtergaele, and S. Wuytack, "Optimizing data transfers and memory for low power," accepted for publication in ASIC & EDA magazine, ftp://ftp.imec.be/pub/vsdm/reports/system lev power opt/ fc-asic eda96.ps.gz, 1997.
Lode Nachtergaele has been a member of the Multimedia Image Compression Systems (MICS) group since 1996. This group is part of the application group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro-Electronics Center (IMEC). His current aim is to further distill an operational methodology that shortens the design time of embedded multimedia systems. The resulting design flow is reinjected as a stepping stone for future application challenges. Ing. Nachtergaele received his degree of Industrial Engineer in 1989 from the Katholieke Industriële Hogeschool, Oostende, Belgium. In the same year he joined IMEC, starting his career in the group that worked on the Cathedral-II silicon compiler, where he was involved in the development of the Silage simulator S2C. In 1992 he joined the System Exploration for Memory and Power (SEMP) group, where, together with his colleagues, he worked on the ATOMIUM methodology, partly supported with prototype tools.
Francky Catthoor received the engineering degree and a Ph.D. in electrical engineering from the Katholieke Universiteit Leuven, Belgium, in 1982 and 1987, respectively. From September 1983 until June 1987 he was a researcher in the area of VLSI design methodologies for digital signal processing, with Prof. Hugo De Man and Prof. Joos Vandewalle as Ph.D. thesis advisors. Since 1987 he has headed several research domains in the area of high-level and system synthesis techniques and architectural methodologies, all within the VLSI Systems & Design Methodology (VSDM) division at the Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium. He is an assistant professor at the EE department of K.U.Leuven. His current research activities belong to the field of architecture design methods and system-level exploration for power and area, mainly oriented towards memory management and global data transfer optimization.
The major target application domains are real-time signal and data processing algorithms in image, video and end-user telecom applications, and data-structure-dominated modules in telecom networks. Both customized architectures and programmable multimedia processors are targeted. In 1986 he received the Young Scientist Award from the Marconi International Fellowship. Since 1995 he has been an associate editor for the IEEE Transactions on VLSI Systems, and since 1996 also for the Journal of VLSI Signal Processing.
Stefan Janssens has been with the System Exploration for Memory and Power (SEMP) group since 1996. This group is part of the design technology group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro-Electronics Center (IMEC). He is currently focussed on the application and evaluation of the ATOMIUM and ADOPT methodologies in industrial applications. Ing. Janssens received his degree of Industrial Engineer in 1996 and joined IMEC in the same year; before that, he was involved with IMEC for his thesis.
Dennis Moolenaar has been with IMEC's Wireless Systems group since 1996. This group is part of the application group of the VLSI Systems & Design Methodology (VSDM) division of the Interuniversity Micro-Electronics Center (IMEC). The mission of this group is to research future telecom systems on silicon. Ir. Moolenaar's current aim is to investigate the integration of multiprocessor systems on a single chip. At the moment he is implementing a custom low-power multiprocessor architecture for a DECT/GSM/DCS1800 multi-mode terminal. His interests are in processor architectures, multiprocessor systems and low-power design. Ir. Moolenaar joined IMEC in 1996. Before that, he was involved with IMEC as a student for his internship and master's thesis.
Bhanu Kapoor received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, India. He received his M.S. and Ph.D.
degrees in Computer Science from Southern Methodist University, Dallas, Texas, in 1990 and 1994, respectively. He has been with the Corporate R&D labs of Texas Instruments Incorporated. His main research interests are in high-performance and low-power VLSI design and CAD tools, with an emphasis on algorithms and architectures for DSP applications. He is a member of the IEEE and the ACM.
Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD
More informationParallelization of video compressing with FFmpeg and OpenMP in supercomputing environment
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 231 237 doi: 10.14794/ICAI.9.2014.1.231 Parallelization of video compressing
More informationİSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
More informationVHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU
VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU Martin Straka Doctoral Degree Programme (1), FIT BUT E-mail: strakam@fit.vutbr.cz Supervised by: Zdeněk Kotásek E-mail: kotasek@fit.vutbr.cz
More informationCentral Processing Unit
Chapter 4 Central Processing Unit 1. CPU organization and operation flowchart 1.1. General concepts The primary function of the Central Processing Unit is to execute sequences of instructions representing
More informationHow To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)
FPGA IMPLEMENTATION OF 4D-PARITY BASED DATA CODING TECHNIQUE Vijay Tawar 1, Rajani Gupta 2 1 Student, KNPCST, Hoshangabad Road, Misrod, Bhopal, Pin no.462047 2 Head of Department (EC), KNPCST, Hoshangabad
More informationVON BRAUN LABS. Issue #1 WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS VON BRAUN LABS. State Machine Technology
VON BRAUN LABS WE PROVIDE COMPLETE SOLUTIONS WWW.VONBRAUNLABS.COM Issue #1 VON BRAUN LABS WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS State Machine Technology IoT Solutions Learn
More informationImplementation and Design of AES S-Box on FPGA
International Journal of Research in Engineering and Science (IJRES) ISSN (Online): 232-9364, ISSN (Print): 232-9356 Volume 3 Issue ǁ Jan. 25 ǁ PP.9-4 Implementation and Design of AES S-Box on FPGA Chandrasekhar
More informationISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7
ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems
More informationSerial port interface for microcontroller embedded into integrated power meter
Serial port interface for microcontroller embedded into integrated power meter Mr. Borisav Jovanović, Prof. dr. Predrag Petković, Prof. dr. Milunka Damnjanović, Faculty of Electronic Engineering Nis, Serbia
More informationManaging large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT
More informationNon-Data Aided Carrier Offset Compensation for SDR Implementation
Non-Data Aided Carrier Offset Compensation for SDR Implementation Anders Riis Jensen 1, Niels Terp Kjeldgaard Jørgensen 1 Kim Laugesen 1, Yannick Le Moullec 1,2 1 Department of Electronic Systems, 2 Center
More informationWe are presenting a wavelet based video conferencing system. Openphone. Dirac Wavelet based video codec
Investigating Wavelet Based Video Conferencing System Team Members: o AhtshamAli Ali o Adnan Ahmed (in Newzealand for grad studies) o Adil Nazir (starting MS at LUMS now) o Waseem Khan o Farah Parvaiz
More information