Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization

Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization Abstrat Parameter variations ause high yield losses due to their large impat on iruit delay. In this paper, we propose the use of so-alled soft-edge flip-flops as an effetive way to mitigate these yield losses. Soft-edge flip-flops have a small window of transpareny (ranging from.25-3 FO4) instead of a hard edge, allowing limited yle stealing on ritial paths, and thus ompensating for delay variations. By enabling time borrowing, soft-edge flip-flops allow random delay variations to average out aross multiple logi stages. In addition, they address small amounts of delay imbalane between logi stages, further maximizing the frequeny of operation. We develop a library of softedge flip-flops with varying amounts of softness. We show that the power and area overhead of soft-edge flip-flops grows diretly with the amount of softness. We then propose a statistially aware flip-flop assignment algorithm that maximizes the gain in timing yield while minimizing the inurred power overhead. Experimental results on a wide range of benhmark iruits show that the proposed approah improves the mean delay by 1.9-22.3% while simultaneously reduing the standard deviation of delay by 1.9-24.1% while inreasing power by a small amount (.3-2.8%). 1 INTRODUCTION Modern CMOS proesses suffer from large variations in transistor parameters suh as length and threshold voltage [1]. As the iruit delay has a strong dependene on these parameters, it shows a signifiant spread due to proess variations. This an result in substantial yield losses due to timing failures in the iruit, as nominally sub-ritial paths may now beome ritial. Saling trends point to an overall inrease in the variations with tehnology saling. This inrease an be attributed to a number of fators, suh as inreasing diffiulty of manufaturing ontrol, inrease in atomi sale randomness like variation in dopant profile of the transistor hannel, and introdution of new systemati variation generating mehanisms [2]. Hene, there is a need for extensive design and modelling in order to address the variations. A signifiant amount of researh has foussed on proess variation aware analysis. One suh approah is orner based stati timing methodology [3]. However, traditional orner based design approah mostly leads to overly pessimisti guardbanding [4]. Moreover, while global variations an be approximated by onsidering appropriate orner ases, it does not provide a statistial way of modelling the variations aross dies. Also, as variations are inreasing, number of proess orners to be onsidered for true auray is beoming too large for omputational effiieny. A seond approah, namely, statistial stati timing analysis (SSTA) has emerged as an alternative analysis approah. While muh work has been done on analysis using SSTA [5, 15], there has been a limited work to aount for variability in iruit Vivek Joshi, David Blaauw, Dennis Sylvester University of Mihigan, Ann Arbor, MI vivekj@umih.edu optimization[6, 7, 8, 13, 16]. In [7], the authors extend deterministi optimization using gate sizing approah to the statistial domain, while in [9], authors used geometri programming to address the problem. Most of the optimization in this area uses gate sizing as a solution. Retiming and so-alled useful lok skew assignment has been onsidered as an effetive way to balane out the path delays and maximize the frequeny of operation. In the past, this has been formulated as a linear programming problem [1]. However, with the exeption of [11], these solutions are deterministi in nature and do not onsider the effet of proess variations. Useful skew assignment also results in a large number of paths pushed to the very edge of satisfying the timing requirements. Hene, even a slight variation in delay on these ritial paths an ause a timing based failure. This makes useful skew assignment more effetive in improving nominal delay than for improving yield. Furthermore, useful skew assignment is typially performed on an individual flip-flop basis, whih is diffiult in pratie. In the results of this paper, we show that performing useful skew assignment for lusters of flip-flops an address lok skew, but is ineffetive for addressing proess variation and iruit imbalane. Another approah to address delay variations in the iruit is to use lath based design. Lathes have no hard boundary and are transparent for half a lok period. This means that there an be variations in the data arrival time and still the orret data would be aptured by the lath. In [12], the authors present lok sheduling for lathes in order to improve yield onsidering the delay variations. However, lath based design has its own limitations. The most important problem with lath based design is the need for two separate loks, whih means signifiant power and area overhead. Furthermore, generating two non-overlapping loks an be diffiult in high performane design. Other problems with lath based design inlude serious hold time violation issues due to the large period of transpareny, and diffiulties in designing with lathes using standard EDA tools. The goal of our work is to use the onept of averaging and yle stealing, while maintaining the onventional flipflop based paradigm. In this paper we propose the use of soft-edge flip-flops as a useful method to address proess variation as well as limited iruit imbalane through time borrowing and averaging aross stages. The key idea is to delay the lok edge of the master lath so as to reate a window of transpareny instead of a hard boundary for apturing the data. As shown in Figure 1, the transpareny window allows yle stealing in order to ompensate for the differene in delays of the two paths resulting from either iruit imbalane or from loal random proess variations. Soft-edge flip-flops utilize the fat that

Setup Setup Hard edge Hard edge Logi FF Logi FF Logi Mean Delay = 15ps Soft Edge FF Logi Mean Delay = 1ps Stage slower than nominally Transparent window aptures delayed signal Time borrowing Setup met Figure 1. Transpareny window ompensating for delay variation. random variations average out along a iruit path, and by allowing averaging aross stages they redue the sensitivity of the design towards proess variations. By reating a window of transpareny, we inrease the hold time of the flipflop. However, as the window is small, the resulting inrease in hold time is not large enough to ause hold time violations. This is verified for all the test iruits as disussed later in the results setion. The softness omes at a ost of power. Hene, we designed a library of soft-edge flip-flops with varying amounts of softness in order to analyze the power overhead involved. A variation aware algorithm was then used to statistially optimize yield with minimum power overhead. When applied to a wide variety of benhmark iruits synthesized, plaed and routed with an industrial library, the resulting improvement in the mean delay ranged from 1.9-22.3%, for a power overhead of about 1.8% on average. The orresponding improvement in µ +3 σ value for the delay distribution ranged from 4.8-23.6%, whih gives an estimate of improvement in timing yield. The yield data was obtained using Monte Carlo simulation. The remainder of this paper is organized as follows. Setion 2 disusses the proposed methodology while Setion 3 desribes the soft-edge flip-flop assignment tehnique. The atual design of the soft-edge flip-flops is disussed in Setion 4. Setions 5 and 6 give the experimental results and onlusions, respetively. 2 PROPOSED METHODOLOGY Figure 2 shows a sample setup to demonstrate the effetiveness of soft-edge flip-flops in addressing proess variations. The setup onsists of two stages of logi(logi1 and Logi2) separated by a soft-edge flip-flop whose softness is varied from to 1ps in steps of 2 ps. Logi1 has a nominal delay of 15 ps, and Logi2 has a nominal delay of 1 ps. Eah logi stage has a delay distribution with a global variation of 5%( σ / µ ) and another 5% of random variation ( σ / µ ). In the absene of softness the ritial delay is the maximum of the two delays. However, on introduing softness in the flip-flop, for ases where the path preeding the flip-flop (Logi1) is ritial, the softness is used to borrow time from the next stage (Logi2) to average out the delays. This leads to the shift in the mean delay as softness is inreased. If skew assignment was done, it would have assigned the flip-flop a skew of 25 ps, so as to balane out the nominal delays. And for eah sample, the ritial delay would have been the maximum of the two delays. With standard D flip-flop, the distribution of ritial delay has a mean of 159.8 ps, and a standard deviation of 68.4 ps. Frequeny Mean Critial Delay (in ps) 165 16 155 15 145 14 135 13 7 6 5 4 3 2 1 2 4 6 8 1 Softness (in ps) 8 9 1 11 12 13 C riti a l D e la y (in p s ) O rigina l W ith S o ftne ss = 8 ps W ith ske w a ss ignm e nt Figure 2. Setup for demonstrating the effetiveness of soft-edge flipflops and the orresponding plots of mean v/s softness and delay distribution for softness of and 8 ps and for skew assignment. As shown in the Figure 2, the mean dereases as softness is inreased and beomes almost onstant after a softness of 8 ps. This is beause almost all the ases involving the path preeding the soft-edge flip-flop being ritial, have been averaged out by time borrowing to the maximum limit allowed by the slak available. Beyond this point, inreasing softness has little effet on the mean. The value of mean for softness of 8 ps is 13.7 ps with a standard deviation of 63.6 ps. This means that the mean of the delay improved by 29.1 ps whih is a fairly large fration of the initial standard deviation (about 42.5%). For the ase of skew assignment, the mean delay is 153.1 for an improvement of only 6.7 ps with a standard deviation of 66.7 ps. While skew assignment obtains a 25ps improvement in nominal delay, the statistial mean of the delay under proess variation is improved by muh less. This is aused by the balaning of delay paths in skew assignment, whih makes the design more sensitive to proess variation and ounter ats the improvement in nominal delay. The delay distribution urves plotted for softness of and 8ps along with the distribution for the ase of skew assignment are shown in Figure 2. Note that skew assignment shifts the delay distribution urve slightly to the left, while the shift is more signifiant in the ase with softness of 8 ps. This shift orresponds to the improvement in mean. The urve for the ase of 8 ps softness is narrower than that for no softness, and this shows up as improvement in the standard deviation. As seen in this example, there is little improvement in mean if we inrease the softness beyond 8 ps. Softness omes at a ost of power overhead, thus, there is a need for an algorithm to intelligently assign softness so as to get best possible improvements in mean for minimum power. We dis-

Frequeny amin Figure 3. Calulation of yield for a given lok frequeny from the delay distribution urve. uss the soft-edge flip-flop assignment tehnique to minimize power overheads in the next setion. In order to analyze the power overhead involved, a library of soft-edge flip-flops was designed. The method to design a soft edge flip-flop is to delay the master s lok. This an be done in different ways, and the designs are desribed in setion 4. 3 SOFT-EDGE FLIP-FLOP ASSIGNMENT TECHNIQUE Soft-edge flip-flops have a power overhead assoiated with them due to the additional delaying elements used to delay the lok to the master lath. The flip-flop assignment problem an thus be formulated as the minimization of power for a given yield onstraint. In theory, for a ustom design, it is possible to onstrut a soft-edge flip-flop with any arbitrary amount of softness, by proper sizing. However, for a typial standard ell based design, there will be a library of suh flip-flops with varying amounts of softness, and so our goal in this ase is to assign flip-flops available in a library. Thus, the design variable whih is the amount of softness, is a disrete variable, while the yield onstraint is statistial in nature. The alulation of yield from the statistial data is illustrated in Figure 3. For a given frequeny of operation, there is a ertain value of ritial delay beyond whih a iruit fails to meet the timing. This value is shown as A in the figure. So for all values of delay between amin and A, the iruit meets the timing requirements and for delays beyond that it fails. The yield in this ase is given by the area of the shaded portion divided by the total area under the urve. Thus, the yield Y is given by equation 1. Clearly, any yield onstraint of the form Y>Y o an be translated into an equivalent onstraint on the ritial delay A for that given lok period expressed as A > A. Now, we an express the flip-flop assignment problem as a simple robust integer linear programming (ILP) problem as shown below: Minimize power Y P = f(x) f(x) C ritia l D e la y A am ax = A amin amax amin fx ( ) dx -------------------------- F n f = 1 k = 1 fx ( ) dx P k Yx ( 1, x 2,, x F ) > Y subjet to: (1) (2) D CK CK CK 1 n1 n n n1 1 Figure 4. Designs for the soft-edge flip-flop. n k = 1 n n1 n Baseline n = 1 k = 1 S k x f = {, 1} 1 Design B Design A where F is the total number of flip-flops in the iruit, and n is the total number of quantization levels of softness that a flip-flop an have (softness is a disrete design variable and an be varied in steps). is the k th quantization level variable assoiated with the f th flip-flop, S k is the softness assoiated with the k th quantization level, and x f is the value of softness for the f th flip-flop. Note that softness assoiated with the first quantization level is, and this orresponds to the standard D flip-flop from the library. P k is the power onsumed by a flip-flop with softness S k. The power for the rest of the iruit remains the same, and the only overhead is due to the assignment of softness to the flip-flops. Constraint (2) is the simple yield onstraint while onstraints (3), (4) and (5) together make sure that eah flip-flop has only one value of softness assigned to it out of the n possible values orresponding to the n levels of quantization. Setup and hold time onstraints are essentially aptured in the yield onstraint, and hene they need not be expressed independently. The yield onstraint makes the problem variation aware, as yield depends on the delay distribution aused due to proess variations. There have been many advanements in effiiently solving robust integer linear programming problems. However, a typial industrial iruit onsists of several thousands of flip-flops and the runtimes for solving the orresponding ILP will most likely be very large. Hene, we use a less omplex greedy heuristi to assign softness, while the ILP is onsidered for possible future extensions. We use a greedy algorithm GREEDY_SOFTNESS( ) to solve this problem of soft-edge flip-flop assignment. The idea is to searh for the most ritial path in the iruit (onsidering proess variations), and inrease the softness of the flip-flop following the path by one step. This is to be n1 n 1 1 QN Q Slave opens Master Closes (3) (4) (5)

repeated till the yield requirements are just met, or when there eases to be any appreiable improvement upon inreasing softness. The seond riterion ensures that we stop assigning further softness if the improvement gained by that step is not appreiable, and that we stop in ase the yield target is not ahievable for the given onstraint. The pseudo ode for GREEDY_SOFTNESS( ) is as follows: GREEDY_SOFTNESS( ) 1 ontinue <-- true 2 while ontinue == true 3 find most ritial path P 4 find urrent yield Y 5 hek <-- false 6 if P ends at a flip-flop f s input 7 then inrease softness of the flip-flop f by one step 8 hek <-- true 9 find new yield Y new 1 if Y new > Y 11 then ontinue <-- false 12 else if (Y - Y new ) is not appreiable 13 then ontinue <-- false 14 if hek == true 15 then derease the softness of flip-flop f by one step As we show later in the results setion, we obtain near maximum delay improvements with small power overheads (less than 3%), by using this greedy algorithm for assigning soft-edge flip-flops. 4 SOFT-EDGE FLIP-FLOP DESIGN Figure 4 shows the design of a soft-edge flip-flop. Like a standard D flip-flop, it is onstruted using bak to bak master/slave lathes. Eah lath onsists of a tristate inverter feeding into the lath followed by a ross-oupled inverter pair. For a standard D flip-flop, at the positive edge of the lok, the tristate feeding the master loses while the tristate forming the feedbak loop opens thereby lathing the value into the master lath. At the same time, the tristate feeding the slave opens, while the feedbak tristate loses, making the slave transparent. This allows the outputs Q and QB to be evaluated based on the value lathed in the master. So, in order to be displayed orretly at the output, the data should arrive sometime before the rising edge of the lok. This time is alled the setup time and it defines the hard edge before whih the data should arrive for a standard D flipflop. In order to reate a window of transpareny (soft edge), we need to delay the lok supplied to the master lath by a small amount. The amount of delay is the period for whih both the master and the slave are transparent together. This is beause when the slave s lok makes a transition ausing it to beome transparent, the tristate feeding the master is also open as its lok is a delayed version of the slave lok. In this way, even though slave is transparent, a hange in data Power (in uw) 44 42 4 38 36 34 32 3 2 4 6 8 1 12 Softness (in ps) Figure 5. Power onsumption for the designed soft-edge flip-flops. value an still hange the state of master whih, in turn, will hange the value of output. As shown in Figure 4, signal 1 whih is fed to the master lath is the delayed version of signal whih ontrols the slave lath. For the period between the two vertial lines, slave has opened while the master has not losed and this is the period of transpareny or the amount of softness. This softness allows time borrowing if the stage following the flip-flop has a slak. As in the original D flip-flop was one inverter delayed version of n, in our new design 1 and n1 (supplied to the master) should be suh that 1 is more delayed than n1. This ensures that our design is as lose to the original as possible, if we math the rise time for the new signals to the original ones. Keeping this in mind, there are two ways of delaying the lok. One is to simply insert two more inverters after as shown in design A. Here, we need to size the inverter hain suh that n1 and n have the same rise time, while and 1 have the same rise time and this should be lose to the rise time of the orresponding signal in the standard D flipflop. Along with the risetime we need to ensure that the delay between n1 (1) and n () is the same as the value of softness required. For higher values of softness, more inverters might be added to the hain of inverters. However, for design A, n1 (1) is two inverter delayed version of n (). This means that there is a lower limit to the softness that we an assign whih is determined by the minimum delay of two inverters. For lower values of softness, we use design B, whih uses pass gates for delaying the signals. In design B as shown in the Figure 4, 1 and n1 are delayed versions of and n respetively, whih have been delayed using pass gates. By appropriate sizing muh lower values of softness are possible. In our designed library, we use design B for getting the softness of 1 ps and 2 ps, while design A is used for all other values. Figure 5 shows the power values for flip-flops with different values of softness. Note that softness of orresponds to the standard D flip-flop. Power has been measured using Hspie, assuming the swithing ativity of the data to be.1. The sudden inrease in power when we go from softness of 5 ps to 6 ps, is due to the fat that two more inverters have to be added to the inverter hain in order to get values of softness greater than 5 ps. A similar inrease an be seen when inreasing the softness from 1ps to 12ps. Table 1 gives the power and area overheads for the library of softedge flip-flops. Area overheads are omputed by laying out the soft-edge flip-flops and omparing it to the area of standard ell. The power and area overheads for the flip-flop with the largest softness (12ps) are 36.8% and 45%. How-

ever, we find that only a small fration of flip-flops need this level of softness. Hene, the power overhead was found to be a very small perentage (.3-2.8%) of the overall iruit power, as refleted in the experimental results whih are disussed in the next setion. 5 EXPERIMENTAL RESULTS The proposed soft-edge flip-flop assignment tehnique was implemented based on an industrial.13µ m CMOS tehnology. All the test iruits were synthesized, plaed and routed using ommerial tools. The test iruits ranged in size from 119-36544 gates. The physial plaement details were used to luster the flip-flops for the purpose of lustering based skew assignment. We ompare our results against skew assignment for luster sizes of 3 (large), 15 (moderate), and 5 (small). Based on industrial estimates [14], we use a swithing ativity of.1 for the primary inputs of the iruit, while measuring power. In this work, we onsider hannel length as the soure of variability. We onsider inter-die, spatially orrelated intra-die as well as random omponents of variation (5% σ / µ due to eah omponent). Table 2 summarizes the main results for the proposed approah and ompares them with lustering based skew assignment for ISCAS89 benhmark iruits and two DSP iruit implementations ( Viterbi1 and Viterbi2 ). The first three olumns give the name and size of eah test iruit. The fourth and fifth olumns give the mean and standard deviation of delay for the original test iruit without any optimization. The next three pairs of olumns give the perentage improvement in mean and standard deviation of delay, by using lustering based skew assignment tehnique for luster sizes of 3, 15 and 5. The next pair of olumns gives the perentage improvement in mean and standard deviation by using soft-edge flip-flops. Note that this is the best possible improvement whih an be ahieved by using soft edge flipflops. The last two olumns provide an estimate of how this improvement ompares with that for skew assignment tehnique for a luster size of 5. Essentially, it is the ratio of the improvement ahieved by using soft-edge flip-flops, to that ahieved by using lustering based skew assignment for luster size of 5. A luster size of 5 is diffiult to ahieve. This is beause typially at least a dozen flip-flops are driven by a single lok regenerator. The results learly show that our approah gives signifiantly better improvements in mean and standard deviation even when ompared to skew assignment for a luster size of 5, whih is diffiult to ahieve. We get improvements of up to 22.6% (8.9% on average) in the mean and up to 24.1% (1% on average) in the standard deviation of the delay, over the original iruit. As ompared to the improvement using skew assignment with a luster size of 5, our approah gives improvements of up to 8.8X (2.6X on average) in the mean and up to 9.9X (2.6X on average) in the standard deviation of the iruit delay. Figure 6 shows sample delay distribution urves of one of the larger iruits (Viterbi1) for the original iruit without any optimization, for the ase of skew assignment (luster size =5), and for the ase of using soft-edge flip-flops. As shown in Table 2, the mean delay improves by 2.3% for skew Table 1. Area and power overheads for soft-edge flip-flops. Softness power overhead area overhead 1 6.8 18 2 8.2 18 3 1.2 18 4 12.9 18 5 16. 18 6 24.1 32 7 26.2 32 8 27.2 32 9 28.3 32 1 29.5 32 12 36.8 45 assignment, and by 6.1% for the ase in whih soft-edge flipflops are used. This hange is depited in the shifting of the urves towards left. The shift is less for skew assignment (lesser improvement), and a greater shift is observed for softedge flip-flops. Also, the urves beome narrower as we move from the original iruit to skew assignment and to soft-edge flip-flop assignment. This orresponds to the improvement in the standard deviation by using the two approahes. Table 3 shows results for soft-edge flip-flop assignment using the greedy algorithm. Columns two and three give improvement ahieved in mean and express it as a fration of the best possible improvement that we an get using softedge flip-flop assignment (reported in Table 2). The next olumn gives the improvement in µ +3 σ value for the delay distribution, whih provides an estimate for the improvement in timing yield. It shows improvement of 11.9% on average. Last two olumns give the perentage power overhead and the perentage of flip-flops that were assigned some softness. Power overhead is omputed based on the extra power used for making the flip-flops soft, as ompared to the power for the original iruit, assuming a swithing ativity of.1 for the primary inputs. On an average, we get.97 of the best possible improvement in mean by using the greedy algorithm for a power overhead of 1.8%. The results show that we get very lose to the best possible improvement by using greedy algorithm, with a small power overhead. For a more omplex Frequeny 1 9 Original 8 Skew assignment 7 softness assignment 6 5 4 3 2 1 6 7 8 9 Critial Delay (in ps) Figure 6. Delay distributions for the iruit Viterbi1- for the original, skew assignment and softness assignment ases.

Table 2. Improvement in mean delay for several iruits by soft-edge flip-flop assignment as ompared to skewing. Delay (ps) No. of gates No. of flipflops Values for Original Ciruit skew assignment (Cluster Size = 3) skew assignment (Cluster Size = 15) Skew assignment (Cluster Size = 5) Softness softness ompared to that by lustering (size=5) µ σ µ σ µ σ µ σ µ σ µ σ s298 119 14 345.5 15.2.%.%.%.% 1.6% 2.8% 14.3% 6.3% 8.8X 2.2X s344 16 15 567.5 3.8.%.%.%.% 5.6% 3.5% 17.1% 11.1% 2.9X 3.2X s349 161 15 573.5 31.3.%.%.%.% 7.4% 4.4% 17.2% 14.% 2.3X 3.1X s386 159 6 351.5 15.6.6% 3.2%.6% 3.2%.6% 3.2% 2.6% 4.1% 4.6X 1.2X s4 164 21 49.5 25.4.%.%.%.%.%.% 11.5% 15.6% NA NA s42 218 16 731.4 42.4.%.% 4.8% 7.4% 5.9% 7.5% 6.8% 15.2% 1.2X 2.X s51 211 6 521.5 27.6 1.2%.4% 1.2%.4% 2.%.9% 6.5% 8.7% 3.3X 9.9X s526 194 21 488.5 25.3.%.%.%.% 15.6% 13.3% 22.6% 22.8% 1.4X 1.7X s82 289 5 531.2 28.3 1.1% 4.2% 1.1% 4.2% 1.1% 4.2% 4.1% 5.5% 3.8X 1.3X s832 287 5 537.5 28.7.%.%.%.%.%.% 3.9% 5.1% NA NA s838 446 32 1144.4 71.5 4.4%.9% 4.4%.9% 4.8% 1.1% 6.7% 3.6% 1.4X 3.4X s1196 529 18 644.5 36.3 4.9% 2.8% 6.4% 8.1% 6.8% 1.7% 16.9% 24.1% 2.5X 2.2X s1238 58 18 683.4 39. 3.7% 4.6% 3.7% 4.6% 3.7% 4.6% 9.5% 1.3% 2.5X 2.3X s1423 657 74 1688.5 99.2.3%.1%.4%.3%.5%.3% 3.1% 1.9% 6.8X 5.9X s1488 653 6 579.4 31.7.%.%.%.%.%.% 2.1% 2.7% NA NA s1585 9812 534 1456. 92.5.7%.6% 1.8% 1.9% 1.8% 1.9% 6.% 5.9% 3.4X 3.X s35932 1665 1728 48.4 24. 3.7% 5.1% 3.7% 5.1% 3.7% 5.1% 8.9% 12.2% 2.4X 2.4X s38417 1516 1636 146.6 62.8 1.8% 2.2% 3.1% 4.6% 5.% 6.4% 8.1% 11.4% 1.6X 1.8X Viterbi1 1224 4215 812.3 46.8.3%.3%.3%.3% 2.3% 3.5% 6.1% 6.1% 2.7X 1.8X Viterbi2 36544 5788 1744.3 7..2%.7%.3% 1.7%.5% 2.8% 3.3% 12.8% 6.4X 4.5X Average 1.1% 1.2% 1.6% 2.1% 3.4% 3.8% 8.9% 1.% 2.6X 2.6X approah, the small gains are likely to be overweighed by the runtime overheads. Figure 7 shows the plot of power overhead versus improvement in mean for the iruit Viterbi1. The point orresponding to the improvement obtained by the greedy algorithm has been irled in the figure. Note that beyond the irled point, inrease in power is more signifiant as ompared to the further improvement in mean that we an obtain. Figure 8 shows the orresponding distribution of softness after running the greedy algorithm on the iruit Viterbi1. Maximum softness used in this ase is 8 ps. Soft-edge flip-flops have a window of transpareny instead of a hard edge. So, it is neessary to hek for possible hold time violations. Figure 9 shows the short path slaks for all test iruits after soft-edge flip-flop assignment. Short path slak is the amount of time by whih the minimum path delay meets the hold time violation onstraint. As shown in the figure, the hold time onstraints are met omfortably, and there is substantial slak for the iruit to meet the onstraints after variations. This is expeted beause the window of transpareny is very small even for the largest value of softness (about 3FO4), and so the resulting inrease in holdtime is not large enough to ause hold time violations. 6 CONCLUSIONS In this paper, we proposed the use of assigning soft-edge flip-flops as a proess variation tolerant tehnique for fine grained balaning of a iruit, in order to improve the timing yield. Soft edge flip-flops allow time borrowing and averaging aross stages, making the design less sensitive to proess variations by taking advantage of the fat that random variations average out. We onstruted a library of soft-edge flipflop variants, with softness of up to 12 ps (about 3FO4). As soft-edge flip-flops have a power overhead assoiated with them, we used a statistially aware greedy algorithm to intelligently assign these flip-flops in order to minimize the power overhead. Experimental results show that as ompared to lustering based skew assignment tehnique, our approah provides improvements of up to 22.3% (8.6% on average) in the mean and up to 24.1% (1% on average) in the standard deviation of the delay, with a very small power overhead (.3-2.8%). power overhead 1.4 1.2 1.8.6.4.2.2.4.6.8 1 1.2 Normalized improvement in mean Figure 7. Power overhead versus improvement in mean delay for the iruit Viterbi1. Cirled point orresponds to the improvement obtained by greedy algorithm.

Frequeny 25 2 15 1 5 ACKNOWLEDGMENT The authors aknowledge the support of Gigasale Systems Researh Fous Center, one of the five researh enters funded under the Fous Center Researh Program, a Semiondutor Researh Corporation program. This work was also supported in part by the NSF and STARC. REFERENCES 1 2 3 4 5 6 7 8 9 1 12 Softness (in ps) Figure 8. Distribution of softness for the iruit Viterbi1. [1] S. Nassif, Delay variability: soures, impats and trends, Pro. of ISSCC, pp. 368-369, 2. [2] Y. Taur et al., CMOS saling into nanometer regime, Pro. of the IEEE, no. 4, pp. 486-54, 1997. [3] S.R. Nassif, A.J. Strojwas and S.W. Diretor, A methodology for worstase analysis of integrated iruits, IEEE Trans. Computer-Aided Design, vol. CAD-5, pp.14-113, Jan. 1986. [4] C. Visweswariah, Death, taxes and failing hips, Pro of DAC, pp. 343-347, June 23. [5] H. Chang and S. Sapatnekar, Statistial timing analysis onsidering spatial orrelations using a single pert-like traversal, Pro. of ICCAD, pp. 621-625, 23. [6] E. Jaobs and M. Berkelaar, Gate sizing using a statistial delay model, Pro. of DAC, pp. 283-29, 2. [7] P. Seung et al., Novel sizing algorithm for yield improvement under proess variation in nanometer tehnology, Pro. of DAC, pp. 454-459, 24. [8] M. Mani and M. Orshansky, A new statistial optimization algorithm for gate sizing, Pro. of ICCD, pp. 272-277,24. [9] J. Singh, V. Nookala, Z. Q. Luo, and S. Sapatnekar, Robust gate sizing by geometri programming, Pro. of DAC, pp. 315 32, 25. [1]X. Liu, M.C. Papaefthymiou, and E.G. Friedman, Maximizing performane by retiming and lok skew sheduling, Pro. of DAC, pp. 231-236, 1999. [11]V. Nawale and T.W. Chen, Optimal useful skew sheduling in the presene of variations using robust ILP formulation, Pro. of ICCAD, pp 27-32, 26. Table 3. Results for softness assignment using greedy algorithm Ciruit improvement in Improvemen t as a fration of best possible improveme nt in µ improvemen µ +3 σ t Perenta ge overhead in power Perenta ge of flip-flops assigned softness s298 13.8%.96 16.5% 1.2% 64.3% s344 16.9%.98 17.5% 1.8% 2.% s349 16.5%.96 19.7% 2.2% 33.3% s386 2.4%.96 7.% 1.9% 5.% s4 11.4%.98 13.3% 1.8% 23.8% s42 6.4%.94 13.2% 2.% 25.% s51 6.2%.96 1.9%.9% 33.3% s526 22.3%.99 23.6% 2.8% 66.7% s82 3.9%.94 9.6% 2.6% 8.% s832 3.7%.95 8.7% 2.6% 8.% s838 6.4%.95 11.1% 2.1% 6.% s1196 16.6%.98 19.3% 1.1% 27.8% s1238 9.3%.98 1.9% 2.7% 27.8% s1423 3.1%.98 4.8% 2.6% 23.% s1488 1.9%.93 8.6%.3% 16.7% s1585 5.9%.99 7.2% 2.7% 32.2% s35932 8.7%.98 11.5% 1.2% 9.3% s38417 8.%.99 9.7% 1% 9.5% Viterbi1 6.%.98 7.7% 1.1% 11.4% Viterbi2 3.3%.98 6.1% 1.1% 9.6% Average 8.6%.97 11.9% 1.8% 35.2% [12]A. P. Hurst and R. K. Brayton, Lath based design under proess variation, IWLS 26. [13]A. Srivastava, D. Sylvester and D. Blaauw, Statistial optimization of leakage power onsidering proess variations using dual V th and sizing, Pro. of DAC, pp. 773-778, 24. [14]N. Magen, A. Kolodny, U. Weiser, N. Shamir, Interonnet power dissipation in a miroproessor, SLIP 24. [15]H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah, Parameterized blok-based statistial timing analysis with non-gaussian parameters, nonlinear delay funtions, Pro. of DAC, pp. 71-76, 25. [16]K.Chopra, S. Shah, A. Srivastava, D. Blaauw and D. Sylvester, Parametri yield maximization using gate sizing based on effiient statistial power and delay gradient omputation, Pro. of ICCAD, pp. 123-128, 25. 2 18 Short path slak (in ps) 16 14 12 1 8 6 4 2 s298 s344 s349 s386 s4 s42 s444 s51 s526 s82 s832 s838 s1196 s1238 s1423 s1488 s1585 s38417 s35932 Viterbi1 Viterbi2 Figure 9. Short path slak after soft-edge flip-flop assignment.