On the Cost of Factoring RSA-1024

Adi Shamir and Eran Tromer
Weizmann Institute of Science

Abstract

As many cryptographic schemes rely on the hardness of integer factorization, exploration of the concrete costs of factoring large integers is of considerable interest. Most research has focused on PC-based implementations of factoring algorithms; these have successfully factored 530-bit integers, but practically cannot scale much further. Recent works have placed the bottleneck at the sieving step of the Number Field Sieve algorithm. We present a new implementation of this step, based on a custom-built hardware device that achieves a very high level of parallelism for free. The design combines algorithmic and technological aspects: by devising algorithms that take advantage of certain tradeoffs in chip manufacturing technology, efficiency is increased by many orders of magnitude compared to previous proposals. Using this hypothetical device (and ignoring the initial R&D costs), it appears possible to break a 1024-bit RSA key in one year using a device whose cost is about $10M (previous predictions were in the trillions of dollars).

1 Introduction

The security of many cryptographic schemes and protocols depends on the hardness of finding the factors of large integers drawn from an appropriate distribution. The best known algorithm for factoring large integers is the Number Field Sieve (NFS), whose time and space complexities are subexponential in the size of the composite. (See [10] for the seminal works and [17] for an introduction; the subtask we discuss is defined in Section 2.1.) However, little is known about the real complexity of this problem. The evident confidence in the hardness of factoring comes from observing that despite enormous interest, no efficient factoring algorithm has been found.

To determine what key sizes are appropriate for a given application, one needs concrete estimates for the cost of factoring integers of various sizes. Predicting these costs has proved notoriously difficult, for two reasons. First, the performance of modern factoring algorithms is not understood very well: their complexity analysis is often asymptotic and heuristic, and leaves large uncertainty factors. Second, even when the exact algorithmic complexity is known, it is hard to estimate the concrete cost of a suitable hypothetical large-scale computational effort using current technology; it is even harder to predict what this cost would be at the end of the key's planned lifetime, perhaps a decade or two into the future.

Due to these difficulties, common practice is to rely on extrapolations from past factorization experiments. Many such experiments have been performed and published; for example, the successful factorization of a 512-bit RSA key in 1999 [5] clearly indicated the insecurity of such keys for many applications, and prompted a transition to 1024-bit keys (often necessitating software or hardware upgrades). (Earlier extrapolations indeed warned of this prospect.)
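For reference, the subexponential complexity mentioned above is conventionally written in L-notation; the following heuristic NFS running-time estimate is standard (it is not spelled out in this text and is included here only for concreteness):

    % Heuristic asymptotic running time of the Number Field Sieve
    % for factoring an integer N (see [10], [17]):
    L_N\left[\tfrac{1}{3},\, (64/9)^{1/3}\right]
        = \exp\left( \left( (64/9)^{1/3} + o(1) \right)
                     (\ln N)^{1/3} (\ln\ln N)^{2/3} \right)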
The current factorization record, obtained nearly four years later in March 2003, stands at 530 bits [1]. (Better results were obtained for composites of a special form, using algorithms which are not applicable to RSA keys.) From this data, and in light of the subexponential complexity of the algorithm used, it seems reasonable to surmise that factoring 1024-bit RSA keys, which are currently in common use, should remain infeasible for well over a decade.

However, the above does not reflect a fundamental economy-of-scale consideration. The published experiments have employed hundreds of workstations and Cray supercomputers, but they have always used general-purpose computer hardware. When the workload is sufficiently high (either because the composites are large or because there are many of them to factor), it becomes more efficient to construct and employ custom-built hardware dedicated to the task. Direct hardware implementation of algorithms is considerably more efficient than software implementation, and makes it possible to eliminate the expensive yet irrelevant peripheral hardware found in general-purpose computers. An example of this approach is the EFF DES Cracker [7], built in 1998 at a cost of $210,000 and capable of breaking a DES key in an expected time of 4.5 days using search units packed into 1,536 custom-built gate array chips. Indeed, its equipment cost per unit of throughput was much lower than that of similar experiments that used general-purpose computers.

Custom-built hardware can go beyond efficient implementation of standard algorithms: it allows specialized data paths and enormous parallelism, and can even use non-electronic physical phenomena. Taking advantage of these requires new algorithms or adaptation of existing ones. One example is the TWINKLE device [18, 13], which implements the sieving step of the NFS factoring algorithm using a combination of highly parallel electronics and an analog optical adder.

Recently, D. J. Bernstein made an important observation [3] about the major algorithmic steps in the NFS algorithm. These steps have a huge input, which is accessed over and over many times. Thus, traditional PC-based implementations are very inefficient in their use of storage: a huge number of storage bits just sits in memory, waiting for a single processor to access them. Most of the previous work on NFS cost analysis (with the notable exception of [21]) considered only the number of processor instructions, which is misleading because the cost of memory greatly outweighs the cost of the processor. Instead, one should consider the equipment cost per unit of throughput, i.e., the construction cost multiplied by the running time per unit of work. Following this observation, Bernstein presented a new parallel algorithm for the matrix step of the NFS algorithm, based on a mesh-connected array of processors. Intuitively, the idea is to attach a simple processor to each block of memory and execute a distributed algorithm among these processors, to get better utilization of the memory.
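The throughput-cost metric just described can be stated as a formula; this is a direct restatement of the definition in the text:

    % Equipment cost per unit of throughput: a device with construction
    % cost C that performs W units of work in time T has
    \text{throughput cost} = C \cdot \frac{T}{W}
    % Halving the run time at fixed cost, or the cost at fixed run time,
    % both halve the throughput cost.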
With this algorithm, and by changing some adjustable parameters in the NFS algorithm so as to minimize cost per unit of throughput rather than instruction count, Bernstein's approach allows one to factor integers that are 3.01 times longer compared to traditional algorithms (though only 1.17 times longer when the traditional algorithms are re-optimized for throughput cost). Subsequent works [14, 8] evaluated the practicality of Bernstein's algorithm for 1024-bit composites, and suggested improved versions that significantly reduced its cost. With these hypothetical (but detailed) designs, the cost of the matrix step was brought down from trillions of dollars [21] to at most a few dozen million dollars (all figures are for completing the task in 1 year). This left open the issue of the other major step in the Number Field Sieve, namely the sieving step.
For 1024-bit composites it was predicted that sieving would require trillions of dollars [21], and would be impractical even when using the TWINKLE device. ([15] gave a lower bound of about $160M for a one-day effort; this disregarded memory, but is much closer to our results since the new device greatly reduces the amortized cost of memory.) This article discusses a new design for a custom-hardware implementation of the sieving step, which reduces this cost to about $10M. The new device, called TWIRL (short for The Weizmann Institute Relation Locator), can be seen as an extension of the TWINKLE device. However, unlike TWINKLE it does not have optoelectronic components, and can thus be manufactured using standard VLSI technology on silicon wafers. The underlying idea is to use a single copy of the input to solve many subproblems in parallel. Since input storage dominates cost, if the parallelization overhead is kept low then the resulting speedup is obtained essentially for free. Indeed, the main challenge lies in achieving this parallelism efficiently while allowing compact storage of the input. Addressing this involves myriad considerations, ranging from number theory to VLSI technology. The resulting design is sketched in the following sections, and a more detailed description appears in [19].

2 Context

2.1 The Sieving Task

The TWIRL device is specialized to a particular task, namely the sieving task which occurs in the Number Field Sieve (and also in its predecessor, the Quadratic Sieve). This section briefly reviews the sieving problem, with many simplifications.

The inputs of the sieving problem are R (the sieve line width), T (the threshold) and a set of pairs (p_i, r_i), where the p_i are the prime numbers smaller than some factor base bound B. There is, on average, one pair per such prime. Each pair (p_i, r_i) corresponds to an arithmetic progression P_i = { a : a ≡ r_i (mod p_i) }. We are interested in identifying the sieve locations a ∈ {0, ..., R−1} that are members of many progressions P_i with large p_i, namely those a for which

    g(a) > T,  where  g(a) = Σ_{i : a ∈ P_i} log p_i

(logarithms are taken to some small constant base). It is permissible to have small errors in this threshold check; in particular, we round all logarithms to the nearest integer. For each a that exceeds the threshold, we also need to find the set {i : a ∈ P_i} of progressions that contribute to g(a).

We shall concentrate on 1024-bit composites and a particular choice of the adjustable NFS parameters, with R = 1.1 × 10^15 and B = 3.5 × 10^9. We need to perform 2.7 × 10^8 such sieving tasks, called sieve lines, that have different (though related) inputs. (In fact, for each sieve line we need to perform two sieves: a rational sieve and an algebraic sieve (see Section 3.3). The parameters given here correspond to the rational sieve, which is responsible for most (two thirds) of the device's cost.) The numerical values that appear below refer to this specific parameter choice.

2.2 Traditional Sieving

The traditional method of performing the sieving task is a variant of Eratosthenes's algorithm for finding primes. It proceeds as follows. An array of accumulators C[a], for a ∈ {0, ..., R−1}, is initialized to 0. Then, the progressions P_i are considered one by one, and for each P_i the indices a ∈ P_i ∩ {0, ..., R−1} are calculated and the value log p_i is added to every such C[a]. Finally, the array is scanned to find the locations a where C[a] > T. The point is that when looking at a specific P_i its members can be enumerated very efficiently, so the amortized cost of a log p_i contribution is low. When this algorithm is implemented on a PC, we cannot apply it to the full range {0, ..., R−1}, since there would not be enough RAM to store R accumulators.
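As an illustration of the traditional sieving loop just described, here is a minimal Python sketch (added for exposition; the toy values of R, T and the factor base are placeholders, orders of magnitude below the paper's actual parameters):

    import math

    def traditional_sieve(R, T, progressions):
        """Eratosthenes-style sieving over one sieve line of width R.

        progressions: list of (p, r) pairs; progression i contributes
        round(log2(p)) to every accumulator C[a] with a % p == r.
        Returns {a: contributing progression indices} for all a with
        C[a] > T.
        """
        C = [0] * R                        # one accumulator per location
        for p, r in progressions:
            logp = round(math.log2(p))     # logs rounded to nearest integer
            for a in range(r % p, R, p):   # enumerate members efficiently
                C[a] += logp
        return {a: [i for i, (p, r) in enumerate(progressions)
                    if a % p == r % p]
                for a in range(R) if C[a] > T}

    # Toy usage: location 2 lies on four of the five progressions.
    progs = [(3, 2), (5, 2), (7, 3), (11, 2), (13, 2)]
    print(traditional_sieve(R=120, T=9, progressions=progs))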
Because of this memory limitation, the range is instead broken into smaller chunks, each of which is processed as above. However, if the chunk size is not much larger than the progression intervals p_i then most progressions make very few contributions (if any) to each chunk, so the amortized cost per contribution increases. Thus, a large amount of memory is required, both for the accumulators and for storing the input (that is, the list of progressions). As Bernstein [3] observed, this is inherently inefficient because each memory bit is accessed very infrequently.

2.3 Sieving with TWINKLE

An alternative way of performing the sieving was proposed in the TWINKLE device [18, 13], which operates as follows. Each TWINKLE device consists of a wafer containing numerous independent cells, each in charge of a single progression P_i. After initialization the device operates synchronously for R clock cycles, corresponding to the R locations of the sieving range. At clock cycle a, the cell in charge of the progression P_i emits the value log p_i iff a ∈ P_i. The values emitted at each clock cycle are summed to obtain g(a), and if this sum exceeds the threshold T then the location a is reported. This event is announced back to the cells, so that the set of progressions pertaining to a is also reported.

The global summation is done using analog optics: to emit the value log p_i, a cell flashes an internal LED whose intensity is proportional to log p_i. A light sensor above the wafer measures the total light intensity in each clock cycle, and reports a success when this exceeds a given threshold. The cells themselves are implemented by simple registers and ripple adders. To support the optoelectronic operations, TWINKLE uses Gallium Arsenide wafers (alas, these are relatively small, expensive and hard to manufacture compared to silicon wafers, which are readily available). Compared to traditional sieving, TWINKLE exchanges the roles of space and time:

                      Traditional             TWINKLE
    Sieve locations   Space (accumulators)    Time
    Progressions      Time                    Space (cells)

3 TWIRL

3.1 Approach

The TWIRL device follows the time-space reversal of TWINKLE, but increases the throughput by simultaneously processing thousands of sieve locations at each clock cycle. Since this is done with (almost) no duplication of the input, the equipment cost per unit of throughput decreases dramatically. Equivalently, we can say that the cost of storing the huge input is amortized across many parallel processes.

As a first step toward TWIRL, consider an electronic variant of TWINKLE which still operates at a rate of one sieve location per clock cycle, but does so using a pipelined systolic chain of electronic adders. (This variant was considered in [13], but deemed inferior in that context.) Such a device would consist of a long unidirectional bus, 10 bits wide, that connects millions of conditional adders in series. Each conditional adder is in charge of one progression P_i; when activated by an associated timer, it adds the value log p_i to the bus. At time t, the i-th adder handles sieve location t − i. The first value to appear at the end of the pipeline is g(0), followed by g(1), g(2), ..., one per clock cycle. See Fig. 1(a).

The parallelization is obtained by handling the sieve range {0, ..., R−1} in consecutive chunks of length s = 4,096. (This value of s applies to the rational sieve; for the algebraic sieve (see Section 3.3) we use even higher parallelism: s = 32,768.) To do so, the bus is thickened by a factor of s and now contains s logical lines, where each line carries 10-bit numbers. At time t, the i-th stage of the pipeline handles the s sieve locations s(t − i) + 0, ..., s(t − i) + (s − 1).
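To make the pipeline concrete, the following toy Python model (illustrative only; all values are made up) simulates the chain of adders of Fig. 1(a). The s-fold parallel version of Fig. 1(b) conceptually runs s such flows side by side while sharing the single stored copy of the progressions:

    import math

    def adder_chain(R, progressions):
        """Systolic chain of Fig. 1(a): sieve locations flow down a
        unidirectional bus; stage i conditionally adds round(log2(p_i))
        whenever the location passing through belongs to progression i.
        Returns {a: g(a)}."""
        n = len(progressions)
        pipeline = [None] * n          # pipeline[i] = (location, partial sum)
        g = {}
        for t in range(R + n):         # run until the pipeline drains
            if pipeline[n - 1] is not None:
                a, acc = pipeline[n - 1]
                g[a] = acc             # the last stage emits one g(a) per cycle
            for i in range(n - 1, 0, -1):
                pipeline[i] = pipeline[i - 1]
            pipeline[0] = (t, 0) if t < R else None
            for i, (p, r) in enumerate(progressions):
                if pipeline[i] is not None:
                    a, acc = pipeline[i]
                    if a % p == r % p:       # conditional adder fires
                        pipeline[i] = (a, acc + round(math.log2(p)))
        return g

    print(adder_chain(12, [(3, 2), (5, 2), (7, 3)]))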
[Figure 1: Flow of sieve locations through the device in (a) a chain of adders and (b) TWIRL.]

The first s values to appear at the end of the pipeline are g(0), ..., g(s−1); they appear simultaneously, followed by successive disjoint groups of size s, one group per clock cycle. See Fig. 1(b).

We now have to add the log p_i contributions to all s lines in parallel. Obviously, the naive solution of duplicating all the adders s times gains nothing in terms of equipment cost per unit of throughput. If we try to use the TWINKLE-like circuitry without duplication, we encounter difficulties in scheduling and communicating the contributions across the thick bus: the sieve locations flow down the bus (in Fig. 1(b), vertically), and the contributions should somehow travel across the bus (horizontally) and reach an appropriate adder at exactly the right time. Accordingly, we replace the simple TWINKLE-like cells by other units that perform scheduling and routing. Each such unit, called a station, handles some small portion of the progressions; its interface consists of bus input, bus output, clock and some circuitry for loading the inputs. The stations are connected serially in a pipeline, and at the end of the bus (i.e., at the output of the last station) we place a threshold check unit that produces the device output.

While the function of all the stations is identical, we use a heterogeneous architecture that employs three different station designs: the p_i come in a very large range of sizes, and different sizes involve very different design tradeoffs. The progressions are partitioned into stations according to the size of their intervals p_i, and the optimal station design is employed in each case. Due to space limitations, we describe only the most important station design, which is used for the majority of progressions. The other station designs, and additional details, can be found in [19].

3.2 Large primes

For every prime smaller than B = 3.5 × 10^9 there is (on average) one progression. Thus the majority of progressions have intervals p_i that are much larger than s, so they produce log p_i contributions very seldom. For 1024-bit composites there is a huge number (about 1.6 × 10^8) of such progressions; even with TWINKLE's simple emitter cells, we could not fit all of them into a single wafer. The primary consideration is thus to store these progressions as compactly as possible, while maintaining a low cost per contribution.
Indeed, we will succeed in storing these progressions in compact DRAM-type memory using only sequential (and thus very efficient) read/write access. This necessitates additional support logic, but its cost is amortized across many progressions. This efficient storage lets us fit 4 independent 1024-bit TWIRL devices (each of which is hundreds of times faster than TWINKLE) into a single 30cm silicon wafer.

The station design for these progressions (namely, those with p_i > 5.2 × 10^5) is shown in Fig. 2 (after some simplifications). The progressions are partitioned into memory banks, so that each bank contains many progressions. Each progression is stored in one of these memory banks, where at any given time it is represented by an event of the form (t, l), whose meaning is: at time t, send a contribution to bus line l. Each memory bank is connected to a special-purpose processor, which continuously processes these events and sends corresponding emissions of the form "add log p_i to bus line l" to attached delivery lines, which span the bus. Each delivery line acts as a shift register that carries the emissions across the bus. Additionally, at every intersection between a delivery line and a bus line there is a conditional adder; when the emission reaches its destination bus line l, the value log p_i is added to the value that passes through that point of the bus pipeline at that moment. (We use carry-save adders, which are very compact and have low latency; the tradeoff is that the bus lines now use a redundant representation of the sums, which doubles the bit-width of the bus.)

Thus, sieve locations are (logically) flowing down the bus at a constant velocity, and emissions are being sent across the bus at a constant velocity. To ensure that each emission hits its target at the right time, the two perpendicular flows must be perfectly synchronized, which requires a lot of care. However, the benefit is that the cost per contribution is very low: most of the time the contribution is stored very compactly in the form of an event in DRAM; then, for a brief moment it occupies the processor, and finally it occupies a delivery line for the minimum possible duration, namely the amount of time needed to travel across the bus to the destination bus line.

It is the processor's job to ensure accurate scheduling of emissions. (In the full design [19], there is an additional component, called a buffer, which performs fine-tuning and load balancing.) The ideal way to achieve this would be to store the events in a priority queue that is sorted by the emission time. Then, the processor would simply repeat the following loop:

1. Pop the next event (t, l) from the priority queue.
2. Wait until time t and then send an emission to the delivery line, addressed to bus line l.
3. Compute the next event (t', l') of this progression, and push it into the priority queue.

Standard implementations of priority queues (e.g., the heap data structure) are unsuitable for our purposes, due to the passive nature of standard DRAM and its high latency. First, the processor would need to make a logarithmic number of memory accesses at each iteration. Worse yet, these memory accesses occur at unpredictable places, and thus incur a significant random-access overhead. Fortunately, by taking advantage of the unique properties of the sieving problem we can get a good approximation of a priority queue that is highly efficient.
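As a minimal software sketch of the ideal loop above (names and toy parameters are invented; the point of the remaining discussion is precisely that real DRAM cannot support this heap-based access pattern efficiently):

    import heapq
    import math

    def emitter_loop(progressions, s, horizon):
        """Ideal (but DRAM-unfriendly) event scheduler for one memory bank.
        Progression (p, r) fires at sieve locations a = r, r+p, r+2p, ...;
        location a is handled at clock time a // s on bus line a % s."""
        queue = []
        for i, (p, r) in enumerate(progressions):
            heapq.heappush(queue, (r // s, r % s, i))        # first event
        emissions = []
        while queue:
            t, line, i = heapq.heappop(queue)                # step 1: pop
            if t >= horizon:
                continue                                     # past the end
            p, r = progressions[i]
            emissions.append((t, line, round(math.log2(p)))) # step 2: emit
            a_next = t * s + line + p                        # step 3: next
            heapq.heappush(queue, (a_next // s, a_next % s, i))
        return sorted(emissions)

    # Toy usage: two largish progressions, a bus of s = 8 lines.
    print(emitter_loop([(17, 5), (23, 11)], s=8, horizon=10))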
Briefly, the idea behind this approximation is as follows. The events are read sequentially from memory (step 1 above) in a cyclic order, at constant rate. (For simplicity, here we ignore the possibility of collisions.)

[Figure 2: Schematic structure of a (simplified) largish station.]

When the newly calculated event is written back to memory (step 3 above), it is written to a memory address that will be read just before its scheduled emission time. Since both the event time t' and the read schedule are known, this memory address is easily calculated by the processor. In this way, after a short stabilization period the processor always reads imminent events (collisions are handled by adding appropriate slacks), exactly as desired. Each iteration now involves just one sequential-access read operation and one random-access write operation. In addition, it turns out that with an appropriate choice of parameters we can cause the write operations to always occur in a small window of activity, just behind the read head. We may thus view the memory banks as closed rings of various sizes, with an active window twirling around each ring at a constant linear velocity. Each such sliding window is handled by a fast SRAM-based cache, whose content is swapped in and out of DRAM in large blocks. This allows the bulk of events to be held in DRAM. Better yet, now the only interface to the DRAM memory is through the SRAM cache; this allows the elimination of various peripheral circuits that are needed in standard DRAM.
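The write-back address calculation can be sketched as follows (illustrative Python with invented names; the real device computes this in hardware, per memory bank):

    def writeback_address(t_next, ring_size, slack=2):
        """One ring of DRAM cells is scanned cyclically at one cell per
        clock, so cell (t % ring_size) is read at time t. An event due
        for emission at time t_next is therefore written to the cell the
        read head will pass slightly before t_next. This assumes t_next
        falls within the next revolution of the ring (bank sizes are
        chosen so that this holds) and ignores collisions, which the
        device absorbs with slack capacity.
        """
        return (t_next - slack) % ring_size

    # Example: in a ring of 1024 cells, an event due at clock 5000 is
    # stored where the read head will arrive at clock 4998.
    print(writeback_address(5000, ring_size=1024))   # 4998 % 1024 == 902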
3.3 Other Highlights

Other station designs. For progressions with small interval (p_i < 5.2 × 10^5), it is inefficient to continuously shuttle the progression state to and from passive memory. Thus, each progression is handled by an independent active emitter cell that includes an internal counter (similarly to TWINKLE). An emitter serves multiple bus lines, using a variant of the delivery lines described above. Using certain algebraic tricks, these cells can be made very compact. Two such station designs are used: for the progressions with medium-sized intervals, many progressions share the same delivery lines (since emissions are still not very frequent); this requires some coordination logic. For very small intervals, each emitter cell has its own delivery line.

Diaries. Recall that in addition to finding the sieve locations a whose contributions exceed the threshold, we also want to find the sets {i : a ∈ P_i} of relevant progressions. This is accomplished by adding a diary to each processor (it suffices to handle the progressions with large interval). The diary is a memory bank which records every emission sent by the processor and saves it for a few thousand clock cycles (the depth of the bus pipeline). By that time, the corresponding sieve location has reached the end of the bus and the accumulated sum of logarithms g(a) has been checked. If the threshold was exceeded, this is reported to all processors and the corresponding diary entries are recalled and collected. Otherwise, these diary entries are discarded (i.e., their memory is reused).

Cascading the sieves. In the Number Field Sieve we have to perform two sieving tasks in parallel: a rational sieve whose parameters were given above, and an algebraic sieve which is usually more expensive since it has a larger factor base bound B. However, we succeed in greatly reducing the cost of the algebraic sieve by using an even higher parallelization factor for it: s = 32,768. This is made possible by an alteration that greatly reduces the bus width: the algebraic sieve needs only to consider the sieve locations that passed the rational sieve, i.e., about one in 5,000. Thus we connect the input of the algebraic sieve to the output of the rational sieve, and in the algebraic sieve we replace the thick bus and delivery lines by units that consider only the sieve locations that passed the rational sieve. We now have a much narrower bus containing only 32 lines, though each line now carries both a partial sum (as before) and the index of the sieve location to which the sum belongs. Logically, the sieve locations still travel in chunks of size s, so that the regular and predictable timing is preserved. Physically, only the relevant locations (at most 32) in each chunk are present; emissions addressed to the rest are discarded.

Fault tolerance. The issue of fault tolerance is very important, as silicon wafers normally have multiple local faults. When the wafer contains many independent small chips, one usually discards the faulty ones. However, for 1024-bit composites TWIRL is a wafer-scale design and thus must operate in the presence of faults. All large components of TWIRL can be made fault-tolerant by a combination of techniques: routing around faults, post-processing, and re-assigning the work of faulty units to spares. We can also tolerate occasional transient faults, since the sieving task allows a few errors; only the total number of good values matters.

4 Cost

Based on the detailed design, we estimated the cost and performance of the TWIRL device using today's VLSI technology (namely, the 0.13µm process used in many modern memory chips and CPUs). While these estimates are hypothetical, they rely on a detailed analysis and should reasonably reflect the real cost. It should be stressed that the NFS parameters assumed are partially based on heuristic estimates. See [19] for details.

1024-bit composites. Recall that to implement NFS we have to perform two different sieving tasks, a rational sieve and an algebraic sieve, which have different parameters. Here, the rational sieve (whose parameters were given above) dominates the cost. For this sieve, a TWIRL device requires about a quarter of the area of a 30cm silicon wafer, so we can fit 4 of them on a single wafer. Most of the device area is occupied by the largish progressions, and specifically by their DRAM banks. For the algebraic sieve we use a higher parallelization factor, s = 32,768. One algebraic TWIRL device requires a full 30cm wafer, and here too most of the device is occupied by the largish progressions' DRAM banks.
The devices are assembled in clusters that consist of 8 rational TWIRLs (occupying two wafers) and 1 algebraic TWIRL (on a third wafer), where each rational TWIRL has a unidirectional link to the algebraic TWIRL over which it transmits the indices of the sieve locations that passed the rational sieve. A cluster handles a full sieve line in about 3.4 × 10^10 clock cycles, i.e., 33.6 seconds when clocked at 1GHz. The full sieving involves 2.7 × 10^8 sieve lines, which would require about 194 years when using a single cluster (after a heuristic that rules out about a third of the sieve locations). At a cost of about $2.9M (assuming $5,000 per wafer), we can build 194 independent TWIRL clusters that, when run in parallel, would complete the sieving task within 1 year.
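These figures can be reproduced with back-of-the-envelope arithmetic; in the sketch below, the one-third heuristic survival factor and the $5,000 wafer price are taken from the text, and everything else follows from the stated parameters:

    R = 1.1e15      # sieve line width
    H = 2.7e8       # number of sieve lines
    s = 32768       # sieve locations processed per clock by a cluster
    clock = 1e9     # 1 GHz

    line_seconds = R / s / clock                       # time per sieve line
    cluster_years = H * line_seconds * (2 / 3) / (365 * 24 * 3600)
    clusters = round(cluster_years)                    # to finish in 1 year
    cost = clusters * 3 * 5000                         # 3 wafers per cluster

    print(f"{line_seconds:.1f} s/line, {cluster_years:.0f} cluster-years, "
          f"${cost / 1e6:.2f}M")
    # prints roughly: 33.6 s/line, 192 cluster-years, $2.88M
    # (the text's 194 clusters and $2.9M, up to rounding)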
After accounting for the cost of packaging, power supply and cooling systems, adding the cost of PCs for collecting the data and leaving a generous error margin (it is a common rule of thumb to estimate the total cost as twice the silicon cost; to be conservative, we triple it), it appears realistic that all the sieving required for factoring 1024-bit integers can be completed within 1 year by a device that costs about $10M to manufacture. In addition to this per-device cost, there would be an initial NRE cost on the order of $20M (for design, simulation, mask creation, etc.).

512-bit composites. Since 512-bit factorization is well-studied [18, 13, 8] and backed by experimental data [5], it is interesting to compare 512-bit TWIRL to previous designs. We shall use the same 512-bit parameters as in [13, 8], though they are far from optimal for TWIRL. With s = 1,024, we can fit 79 TWIRLs into a single silicon wafer; together, they would handle a sieve line in about 0.0002 seconds (compared to 1.8 seconds for a TWINKLE wafer and 0.36 seconds for a full wafer using the mesh-based design of [8]). Thus, in factoring 512-bit composites the basic TWIRL design is about 1,600 times more cost effective than the best previously published design [8], and several thousand times more cost effective than TWINKLE. Such a wafer full of TWIRLs, which can be manufactured for about $5,000 in large quantities, can complete the sieving for 512-bit composites in under 10 minutes (this is before TWIRL-specific optimizations which would halve the cost, and using the standard but suboptimal parameter choice).

768-bit composites. For 768-bit composites, a single wafer containing 6 TWIRL clusters can complete the sieving in 95 days. This wafer would cost about $5,000 to manufacture, one tenth of the RSA-768 challenge prize [20]. Unfortunately these figures are not easy to verify experimentally, nor do they provide a quick way to gain $45,000, since the initial NRE cost remains $10M-$20M.

5 Conclusions

It has often been claimed that 1024-bit RSA keys are safe for the next 15 to 20 years, since when applying the Number Field Sieve to such composites both the sieving step and the linear algebra step would be infeasible (e.g., [4, 21] and a NIST guideline draft [16]). However, these estimates relied on PC-based implementations. We presented a new design for a custom-built hardware implementation of the sieving step, which relies on algorithms that are highly tuned for the available technology. With appropriate settings of the NFS parameters, this design reduces the cost of sieving to about $10M (plus a one-time cost on the order of $20M). Recent works [14, 9] indicate that for these NFS parameters, the cost of the matrix step is even lower.

Our estimates are hypothetical and rely on numerous approximations; the only way to learn the precise costs involved would be to perform a factorization experiment. However, it is difficult to identify any specific issue that may prevent a sufficiently motivated and well-funded organization from applying the Number Field Sieve to 1024-bit composites within the next few years. This should be taken into account by anyone planning to use a 1024-bit RSA key.
Acknowledgment. This work was inspired by Daniel J. Bernstein's insightful work on the NFS matrix step, and its adaptation to sieving by Willi Geiselmann and Rainer Steinwandt. We thank the latter for interesting discussions of their design and for suggesting an improvement to ours. We are indebted to Arjen K. Lenstra for many insightful discussions, and to Robert D. Silverman, Andrew "bunnie" Huang, Michael Szydlo and Markus Jakobsson for valuable comments and suggestions. Early versions of [12] and the polynomial selection programs of Jens Franke and Thorsten Kleinjung were indispensable in obtaining refined estimates for the NFS parameters.
References

[1] F. Bahr, J. Franke, T. Kleinjung, M. Lochter, M. Böhm, RSA-160, announcement, Apr. 2003, zimmerma/records/rsa160
[2] Daniel J. Bernstein, How to find small factors of integers, manuscript, 2000
[3] Daniel J. Bernstein, Circuits for integer factorization: a proposal, manuscript, 2001
[4] Richard P. Brent, Recent progress and prospects for integer factorisation algorithms, proc. COCOON 2000, LNCS, Springer-Verlag, 2000
[5] S. Cavallar, B. Dodson, A.K. Lenstra, W. Lioen, P.L. Montgomery, B. Murphy, H.J.J. te Riele, et al., Factorization of a 512-bit RSA modulus, proc. Eurocrypt 2000, LNCS, Springer-Verlag, 2000
[6] Don Coppersmith, Modifications to the number field sieve, Journal of Cryptology, 6(3), 1993
[7] Electronic Frontier Foundation, DES Cracker Project
[8] Willi Geiselmann, Rainer Steinwandt, A dedicated sieving hardware, proc. PKC 2003, LNCS, Springer-Verlag, 2002
[9] Willi Geiselmann, Rainer Steinwandt, Hardware to solve sparse systems of linear equations over GF(2), proc. CHES 2003, LNCS, Springer-Verlag, to appear
[10] Arjen K. Lenstra, H.W. Lenstra, Jr. (eds.), The development of the number field sieve, Lecture Notes in Math. 1554, Springer-Verlag, 1993
[11] Arjen K. Lenstra, Bruce Dodson, NFS with four large primes: an explosive experiment, proc. Crypto 95, LNCS, Springer-Verlag, 1995
[12] Arjen K. Lenstra, Bruce Dodson, James Hughes, Wil Kortsmit, Paul Leyland, Factoring estimates for a 1024-bit RSA modulus, proc. Asiacrypt 2003, LNCS, Springer-Verlag, to appear
[13] Arjen K. Lenstra, Adi Shamir, Analysis and optimization of the TWINKLE factoring device, proc. Eurocrypt 2000, LNCS, Springer-Verlag, 2000
[14] Arjen K. Lenstra, Adi Shamir, Jim Tomlinson, Eran Tromer, Analysis of Bernstein's factorization circuit, proc. Asiacrypt 2002, LNCS, Springer-Verlag, 2002
[15] Arjen K. Lenstra, Eric R. Verheul, Selecting cryptographic key sizes, Journal of Cryptology, 14(4), 2001
[16] NIST, Key management guidelines, Part 1: General guidance (draft), Jan. 2003, tkkeymgmt.html
[17] Carl Pomerance, A Tale of Two Sieves, Notices of the AMS, Dec. 1996
[18] Adi Shamir, Factoring large numbers with the TWINKLE device (extended abstract), proc. CHES 99, LNCS, Springer-Verlag, 1999
[19] Adi Shamir, Eran Tromer, Factoring large numbers with the TWIRL device, proc. Crypto 2003, LNCS 2729, Springer-Verlag, 2003
[20] RSA Security, The new RSA factoring challenge, web page, Jan. 2003, challenges/factoring/
[21] Robert D. Silverman, A cost-based security analysis of symmetric and asymmetric key lengths, Bulletin 13, RSA Security, 2000, bulletins/bulletin13.html
