The Benefit of SMT in the Multi-Core Era: Flexibility towards Degrees of Thread-Level Parallelism


Stijn Eyerman    Lieven Eeckhout
Ghent University, Belgium

Abstract

The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts benefit from a few big, high-performance cores, while high active thread counts benefit more from a sea of small, energy-efficient cores. This paper comprehensively studies the trade-offs in multi-core design given dynamically varying active thread counts. We find that, under these workload conditions, a homogeneous multi-core processor, consisting of a few high-performance SMT cores, typically outperforms heterogeneous multi-cores consisting of a mix of big and small cores (without SMT), within the same power budget. We also show that a homogeneous multi-core performs almost as well as a heterogeneous multi-core that also implements SMT, as well as a dynamic multi-core, while being less complex to design and verify. Further, heterogeneous multi-cores that power-gate idle cores yield (only) slightly better energy-efficiency compared to homogeneous multi-cores. The overall conclusion is that the benefit of SMT in the multi-core era is to provide flexibility with respect to the available thread-level parallelism. Consequently, homogeneous multi-cores with big SMT cores are competitive high-performance, energy-efficient design points for workloads with dynamically varying active thread counts.
Categories and Subject Descriptors C.1.4 [Processor Architectures]: Parallel Architectures

Keywords Chip Multi-Core Processor; SMT; Single-ISA Heterogeneous Multi-Core; Thread-Level Parallelism

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ASPLOS, March, Salt Lake City, Utah, USA. Copyright ACM.

1. Introduction

The number of active threads in a processor varies over time, and is often (much) smaller than the number of available hardware thread contexts. This observation has been made across different application domains. Desktop applications exhibit a limited amount of thread-level parallelism, with typically only two to three active threads []. Datacenter servers are often underutilized and seldom operate near their maximum utilization; they operate most of the time between 10 and 50 percent of their maximum utilization level []. Even parallel, multi-threaded applications do not utilize all cores all the time. Threads may be waiting because of synchronization primitives (locks, barriers, etc.) and may yield the processor to avoid active spinning [6]. Finally, in a multi-programmed environment, jobs come and go, and hence, the amount of available thread-level parallelism varies over time. Workloads with dynamically varying active thread counts imply that multi-core chip designs should balance core count and per-core performance. A few high-performance cores are beneficial at low active thread counts, while a sea of energy-efficient cores is preferred at high active thread counts.
The key question is which processor architecture is best able to deal with dynamically varying degrees of thread-level parallelism. A heterogeneous single-ISA multi-core with a few big cores and many small cores [9] might schedule threads onto the big cores when there are few active threads, and only schedule threads onto the small cores when the number of active threads exceeds the number of big cores. A conventional homogeneous multi-core with Simultaneous Multi-Threading (SMT) cores [] might schedule threads across the various cores if there are fewer active threads than cores. Each thread would then have an entire core at its disposal, and only when the number of active threads exceeds the total core count would one engage SMT to improve chip throughput. Ideally, core count and core size should be changed dynamically depending on the number of active threads, and prior work has proposed to fuse small cores into bigger cores as a function of the number of active threads [, 7]. Determining the appropriate processor architecture is not only important in the context of delivering high performance under various workload conditions; it also involves other design concerns such as power/energy, as well as the cost to design and verify the chip, i.e., a heterogeneous or core-fusion processor architecture is likely more costly to design and verify than a homogeneous multi-core.

This paper studies the major multi-core design trade-offs in the face of dynamically varying degrees of available thread-level parallelism. Through a set of comprehensive experiments, we find that a homogeneous multi-core with big SMT cores outperforms heterogeneous designs, under the same power envelope, when there is a varying degree of thread-level parallelism, for both multi-program and multi-threaded workloads. The intuition is that when there are few active threads, they can be scheduled across the available big cores with a few, or even a single, SMT hardware thread context active, and hence achieve good single-thread performance. We also find that a homogeneous multi-core with SMT performs almost as well as a heterogeneous design that also exploits SMT, and that its performance is also close to that of a dynamic multi-core design, in which the configuration (the number of big and small cores) can change dynamically depending on the number of active threads.

The result that the performance of a homogeneous multi-core with big SMT cores is comparable to that of a heterogeneous multi-core design is, we believe, counter-intuitive. It is well known, and confirmed by our experimental results, that a number of small cores achieves better aggregate performance (throughput) than a high-performance SMT core under the same power budget. Hence, it is to be expected that overall performance will be higher for a homogeneous multi-core with many small cores, as well as for a heterogeneous multi-core with a few big cores and many small cores, when there are many active threads in the system.
However, under variable active thread workload conditions, a homogeneous design with big SMT cores is a competitive design point because it can more easily adapt to software diversity, delivering both the best possible chip throughput when there are few active threads and comparable performance when there are many active threads. While we show that a homogeneous multi-core consisting of all big cores with SMT is competitive with a heterogeneous multi-core in terms of performance, the latter has more opportunities to save power by power-gating idle cores. Cores can only be switched off when there are fewer active threads than cores, resulting in fewer power-gating opportunities for configurations with fewer cores. We find, however, that a heterogeneous multi-core has only slightly better energy-efficiency compared to a homogeneous all-big-core configuration under variable thread-level parallelism.

The overall conclusion from this paper is that, although SMT was designed to improve single-core throughput [], the real benefit of SMT in the multi-core era is to provide flexibility with respect to the available thread-level parallelism. Consequently, we find that a homogeneous multi-core with big SMT cores is a competitive high-performance, energy- and cost-efficient design point when the active thread count varies dynamically in the workload.

2. Motivation

2.1 Varying thread-level parallelism

We identify at least four application domains that exhibit varying degrees of available thread-level parallelism during runtime.

Multi-programmed workloads. The most obvious reason for a varying number of active threads is multi-programming. Jobs come and go, and hence the amount of thread-level parallelism varies over time. Jobs are also scheduled out when performing I/O (disk and network activity).

Desktop applications. A recent study by Blake et al. [] quantifies the amount of thread-level parallelism in contemporary desktop applications.
They find the amount of thread-level parallelism to be small, with typically only two to three active threads on average, even after ten years of multi-core processing.

Server workloads. Servers in datacenters operate between 10 and 50 percent of their maximum utilization level most of the time, according to Barroso and Hölzle []. They found the distribution of utilization at a typical server within Google to have a peak around zero utilization and another at moderate utilization. A multi-core server that is underutilized implies that there are only few active threads.

Multi-threaded applications. Even multi-threaded applications may not have as many active threads as there are software threads at all times during the execution. Threads may be waiting because of synchronization due to locks, barriers, etc., and may yield to the operating system to avoid active spinning. Figure 1 quantifies the number of active threads when running the PARSEC benchmarks [] on a twenty-core processor. (We refer to Section 3 for details on the experimental setup.) Some benchmarks have all threads active most of the time (blackscholes, canneal and raytrace), whereas others have all threads active only a small fraction of the time (e.g., ferret, freqmine and swaptions). Some benchmarks have either one or twenty active threads (e.g., bodytrack and swaptions); others have a larger variation in the number of active threads (e.g., dedup, ferret and freqmine). On average across all PARSEC benchmarks running on twenty cores, we find that all threads are active only half of the time, and a substantial fraction of the time only a few threads are active. Note that these numbers are generated for the parallel part of the application, the so-called region of interest (ROI) as it is defined for the PARSEC benchmarks, so the limited number of active threads stems only from inter-thread synchronization during parallel execution, and is not due to other sequential code such as initialization.

Figure 1. Distribution of the number of active threads for the PARSEC benchmarks on a twenty-core processor.

2.2 Multi-core design choices

There exist three major multi-core architectures: symmetric or homogeneous, asymmetric or heterogeneous, and dynamic []. All cores in a homogeneous multi-core have the same organization; examples are the Intel Sandy Bridge CPU [], AMD Opteron [], IBM POWER7 [], etc. Each core typically implements Simultaneous Multi-Threading (SMT), effectively providing a many-thread architecture; e.g., an 8-core processor with four SMT threads per core effectively yields a 32-threaded processor.

A heterogeneous (or asymmetric) multi-core features one or more cores that are more powerful than the others. In the case of a single-ISA heterogeneous multi-core, there are so-called big, high-performance cores and small, energy-efficient cores. NVIDIA's Kal-El [] integrates four performance-tuned cores along with one energy-tuned core, and ARM's big.LITTLE [8] combines a high-performance core with a low-energy core.

A dynamic multi-core is able to combine a number of cores to boost the performance of sequential code sections. Core fusion [, 7] dynamically morphs cores to form a bigger, more powerful core. Thread-level speculation and helper threads [9, 8], in which assist-threads running on other cores help speed up another thread, could also be viewed as a form of dynamic multi-core. Recently, Khubaib et al. [6] proposed MorphCore, a high-performance out-of-order core that can morph into a many-threaded in-order core when the demand for parallelism is high.

2.3 Goal of this paper

Given the background in workloads and the multi-core design space as just described, the following key question arises: How to best design a single-ISA multi-core processor in light of varying degrees of thread-level parallelism in contemporary workloads?
As mentioned in the introduction, all three design options can deal with varying numbers of active threads, one way or the other. A homogeneous multi-core can distribute the active threads across the various cores and only activate SMT when there are more active threads than cores. A heterogeneous multi-core can schedule the active threads on the big cores and only schedule threads on the small cores when there are more active threads than big cores. A dynamic multi-core can form as many cores as there are active threads. However, without a detailed and comprehensive study, it is unclear which multi-core architecture paradigm yields the best performance under varying active thread counts. This paper, to the best of our knowledge, is the first to explore this multi-core design space and comprehensively compare multi-core paradigms in light of variable active thread count. Note that specialized accelerators are not in this paper's scope, as we focus on single-ISA multi-cores.

3. Experimental Setup

3.1 Multi-core design space

To evaluate the various multi-core paradigms in the context of varying thread counts, we use the following experimental setup. We consider three types of cores: a four-wide out-of-order core (big core), a two-wide out-of-order core (medium core), and a two-wide in-order core (small core); see Table 1 for more details about these microarchitectures.

                     Big core        Medium core     Small core
  Frequency          2.66 GHz        2.66 GHz        2.66 GHz
  Type               Out-of-Order    Out-of-Order    In-Order
  Width              4               2               2
  ROB size           128             –               N/A
  Func. units        int, ld/st, mul/div, FP (all core types)
  SMT contexts       up to 6         up to 3         up to 2
  L1 I-cache         32 KB           16 KB           6 KB
  L1 D-cache         32 KB           16 KB           6 KB
  L2 cache           256 KB, 8-way   128 KB          48 KB
  Last-level cache   8 MB, 16-way, shared
  On-chip interconn. 2.66 GHz, full crossbar
  DRAM               8 banks
  Off-chip bus       8 GB/s

Table 1. Big, medium and small core configurations.

We compare all multi-core architectures under the (approximately) same power envelope.
We therefore estimate power consumption using McPAT [] (assuming a modern technology node and aggressive clock gating). The big core consumes roughly twice the power of the medium two-wide out-of-order core on average, and several times the power of the small two-wide in-order core. We conservatively assume that one big core is power-equivalent to two medium cores and five small cores. We validate later in this section that these scaling factors result in approximately equal power consumption, even when the big cores execute six threads through SMT (which leads to higher utilization and therefore higher dynamic power consumption). When evaluating energy efficiency in Section 7, we assume idle cores are power gated.
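The power-equivalence rule just described induces the design space explored in this study. The sketch below is ours, not the authors' tooling: it assumes a budget of four big-core equivalents, which is consistent with the twenty-core and 8-medium-core configurations mentioned elsewhere in the paper, and it reproduces a space of nine designs.

```python
# Enumerate power-equivalent multi-core designs under the stated rule of
# thumb: one big core is power-equivalent to two medium or five small cores.
# BUDGET = 4 big-core equivalents is an assumed value for illustration.
BUDGET = 4
MEDIUM_PER_BIG = 2
SMALL_PER_BIG = 5

def designs(budget=BUDGET):
    """Return (big, medium, small) core-count mixes. Medium and small
    cores are only mixed with big cores, never with each other."""
    mixes = []
    for big in range(budget + 1):
        rest = budget - big
        mixes.append((big, rest * MEDIUM_PER_BIG, 0))     # big + medium mix
        if rest > 0:
            mixes.append((big, 0, rest * SMALL_PER_BIG))  # big + small mix
    return mixes

for big, med, small in designs():
    print(f"{big} big / {med} medium / {small} small")
```

With a budget of four big-core equivalents this yields nine mixes, from four big cores down to twenty small cores.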

We keep total on-chip cache capacity constant when exploring the multi-core design space, in order to focus on the impact of core types and organization, not cache capacity. This implies that we set the private cache size of the medium core two times smaller than that of the big core, and five times smaller for the small core; see also Table 1. (We pick numbers that are powers of two or just in between two powers of two.) The last-level cache (LLC) is shared across all cores, and has the same size for all multi-core configurations (8 MB). The on-chip network is a full crossbar between all cores and the shared LLC. Although not realistic, a full crossbar ensures that the results are not skewed in favor of the few-large-cores configurations, which would experience less contention in the on-chip network compared to a many-small-cores configuration. We use the multi-core simulator Sniper [], enhanced with cycle-level out-of-order and in-order core models, as well as SMT support.

The total chip power budget is equivalent to 4 big cores, or 8 medium cores, or 20 small cores, plus a shared LLC. This allows for 9 possible designs; see Figure 2. (For the heterogeneous designs, we only consider mixes of big cores with medium cores or with small cores; we do not consider mixes of medium and small cores.) In the remainder of the paper, we refer to these designs by their core mix: three homogeneous multi-cores (all big, all medium, all small) and six heterogeneous ones, as indicated in the figure. With SMT enabled, we assume that a big core is able to execute up to six threads, a medium core up to three threads, and a small in-order core up to two threads (using fine-grained multithreading), so that all configurations can run up to 24 threads. The SMT core that we simulate implements static ROB partitioning and a round-robin fetch policy [].
The average total (static plus dynamic) power consumption of the three homogeneous configurations running 24 threads is similar across the three designs (averaged across all homogeneous multi-program workloads, see later), and the power consumption of the heterogeneous configurations falls within the same range. This justifies our claim that all configurations operate more or less under the same power envelope.

Figure 2. The nine power-equivalent multi-core designs considered in this study (B = big core, M = medium core, s = small core).

3.2 Workloads

Multi-program workloads. We consider multi-program workloads using the SPEC CPU2006 benchmarks with their reference inputs. In order to limit the number of simulations, we select representative benchmark-input combinations. The selection is based on the relative performance of the benchmarks on the three core types. We evaluated all SPEC CPU2006 benchmark-input combinations on the three core designs (big, medium and small) and calculated relative performance with respect to the big core. We then picked benchmarks that cover the full performance range, i.e., the benchmarks that have the highest and lowest relative performance, along with in-between benchmarks picked so as to provide good coverage. For each benchmark, we take a single representative simulation point to reduce simulation time [6]. When running a multi-program workload, we stop the simulation when all of the programs have executed at least their full simulation point, restarting programs that reach the end of their simulation point. We summarize multi-program performance using the system throughput (STP) metric [7], or weighted speedup [7], which is a measure of the number of jobs completed per unit of time. For computing STP, we normalize against isolated execution on the big core. When reporting STP numbers averaged across a set of workloads, we use the harmonic mean, because STP is a rate metric (inversely proportional to time). We also calculate the average normalized turnaround time (ANTT [7]) to show the impact of the multi-core design on per-program performance.

We evaluate homogeneous multi-program workloads (multiple copies of the same benchmark) as well as heterogeneous multi-program workloads (different benchmarks co-run). We vary the number of programs from 1 to 24. For the heterogeneous multi-program workloads, we randomly construct two-, three-, four-, etc., up to twenty-four-thread combinations, while making sure that every benchmark is included an equal number of times for all thread counts. Velasquez et al. [] show that this balanced random sampling technique is more representative than fully random sampling. We intentionally limit the number of active threads to 24 to reflect a (realistic) situation with a modest and variable thread count. Given the hardware budget of 4 big cores, this is already a considerable number of threads (6 threads per core). Our results confirm that at a (constantly) large thread count, a design with many small cores is optimal, but in this study we specifically target those workloads that exhibit a variable active thread count. Furthermore, we believe our results are general enough to be projected to larger hardware budgets and thread counts (e.g., 8 large cores and up to 48 threads).

Scheduling also plays an important role in multi-program workload performance. A general principle that we maintain is to first schedule threads on the big core(s) in a heterogeneous design before scheduling on the small cores. Likewise for SMT, we first distribute threads across cores before engaging SMT; e.g., when there are fewer active threads than cores, we run each thread on a separate core, but when there are more active threads than cores, we need to co-run threads on a single core through SMT. A heterogeneous design also implies deciding which thread to execute on which core. Similarly, in the case of SMT, we need to decide which threads to co-run on a core, since different co-runner schedules may have a significant impact on performance [7]. As exploring all possible combinations of program schedules is infeasible because of simulation time considerations, we use offline analysis to determine the best possible schedule. We run each benchmark on each of the different core types in isolation, and use this analysis to steer application-to-core mapping for the heterogeneous design points for best performance. Likewise for SMT, we run all possible two-, three-, etc., up to six-program combinations on the big core (up to four for the medium and two for the small cores), and select the best possible co-schedule. This approach ignores the impact of resource sharing among cores when steering scheduling; however, we do account for resource sharing (shared cache, memory bandwidth, etc.) during detailed simulation of the selected schedules.

Multi-threaded workloads. We also evaluate multi-threaded workloads, using the PARSEC benchmarks []. We vary the number of threads from 4 to 24 in steps of 4. We only include the benchmarks that allow a number of threads that is not fixed to a power of 2.
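The scheduling principle above (big cores first, and threads spread across cores before SMT is engaged) can be sketched as follows. This is our reading of the policy rather than the authors' implementation; in particular, the strictly breadth-first treatment of big and small cores within each SMT level is an assumption.

```python
# Sketch of the thread placement principle described in the text: cores are
# listed big-first, and every core receives its first thread before any
# core receives a second (SMT) thread, and so on, level by level.
def schedule(n_threads, smt_contexts):
    """smt_contexts: per-core SMT capacity, big cores listed first,
    e.g. [6, 6, 6, 6] for four big cores with 6-way SMT.
    Returns the number of threads placed on each core."""
    if n_threads > sum(smt_contexts):
        raise ValueError("more threads than hardware contexts")
    load = [0] * len(smt_contexts)
    placed = 0
    level = 1                       # SMT depth currently being filled
    while placed < n_threads:
        for i, cap in enumerate(smt_contexts):
            if placed == n_threads:
                break
            if load[i] < min(level, cap):
                load[i] += 1
                placed += 1
        level += 1
    return load

# Four big 6-way SMT cores: threads spread out before SMT kicks in.
print(schedule(3, [6, 6, 6, 6]))   # [1, 1, 1, 0]
print(schedule(6, [6, 6, 6, 6]))   # [2, 2, 1, 1]
```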
We use the medium-size input set for all benchmarks, and evaluate the execution time for the parallel part only (the so-called region of interest or ROI) as well as for the whole program (including the sequential initialization and finalization code). We report speedups versus a four-threaded execution on the reference configuration.

4. Multi-Program Workloads

We now evaluate the performance of the nine multi-core designs for multi-program workloads, i.e., workloads consisting of multiple single-threaded programs. (We discuss multi-threaded workloads in the next section.) We first discuss performance as a function of thread count, and subsequently compute aggregate performance under the assumption of various active thread count distributions.

4.1 Performance as a function of thread count

Figure 3 shows average performance for the nine multi-core configurations as a function of the number of threads, from 1 to 24. All designs have SMT enabled in all cores; the non-SMT curves can be reconstructed by leveling off performance as soon as thread count equals core count. The interesting observation is that the homogeneous all-big-core configuration performs well compared to the other homogeneous and heterogeneous designs. Although the heterogeneous designs outperform the all-big-core configuration for some thread counts, it performs well over the full range of thread counts. When thread count is low, it yields the highest performance, and when thread count is high, it yields only slightly lower performance compared to the many-medium-core and many-small-core designs. It is not surprising that the all-big-core configuration performs well for low thread counts: for 4 or fewer threads, each powerful big core has only one thread running.
What is more remarkable is that the all-big-core SMT multi-core also performs relatively well when thread count is high: for example, when there are 24 threads, each big core executes six threads concurrently, but performance is close to that of running 24 threads on 8 medium cores (each three-way SMT) or 24 threads on 20 small cores (4 cores use 2-way SMT, the others execute only one thread). To explain this behavior, Figure 4 shows the same graphs for two homogeneous workloads, picked to illustrate the diversity observed across the various benchmarks; we found the benchmarks to roughly classify into these two categories. Tonto (left graph) shows the intuitively expected behavior: up to 8 threads, performance of the SMT multi-core is better than or similar to the performance of the heterogeneous architectures, but beyond 8 threads, its performance is inferior. Tonto clearly benefits from the higher aggregate execution resources available in the heterogeneous design points, as well as in the homogeneous multi-cores with all medium or all small cores, at high active thread counts. For libquantum (right graph), on the other hand, the multi-core with SMT performs approximately as well as the other design points at high thread counts. What happens here is that as the number of threads increases, more and more pressure is put onto the shared resources (shared last-level cache, memory bandwidth, DRAM banks, etc.), up to the point that performance gets largely dominated by shared resource contention and less by individual core performance. In particular, we observe that, for libquantum, memory access time is several times higher for 24 threads than for one isolated thread, for the configurations at both extremes, due to contention on the memory bus. This tightens the gap and flattens out the performance differences between the various multi-core configurations.
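The throughput numbers above are STP values. A generic sketch of how STP (weighted speedup) and ANTT are computed from per-program performance is shown below; the IPC values are made-up placeholders, not measurements from the paper.

```python
# System throughput (STP, a.k.a. weighted speedup) and average normalized
# turnaround time (ANTT), computed from per-program IPCs. "alone" is the
# program's IPC in isolated execution on the big core, the paper's baseline.
def stp(ipc_shared, ipc_alone):
    """Jobs completed per unit of time, relative to isolated execution."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))

def antt(ipc_shared, ipc_alone):
    """Average per-program slowdown; lower is better."""
    return sum(a / s for s, a in zip(ipc_shared, ipc_alone)) / len(ipc_shared)

def harmonic_mean(values):
    """Used to average STP across workloads, since STP is a rate metric."""
    return len(values) / sum(1.0 / v for v in values)

# Two co-running programs, each at half of its isolated IPC:
print(stp([1.0, 0.8], [2.0, 1.6]))    # -> 1.0
print(antt([1.0, 0.8], [2.0, 1.6]))   # -> 2.0
```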
It is interesting to note that, at high thread counts, the performance of the all-big SMT design relative to the other design points is slightly lower for the homogeneous workloads than for the heterogeneous workloads; compare graphs (a) and (b) in Figure 3. In fact, for heterogeneous workloads and 24 threads, the performance of the all-big SMT design is only slightly more than 7% lower than the maximum, while for homogeneous workloads, it trails the maximum by a somewhat larger margin.

Figure 3. Comparing the performance of the nine multi-core design points for homogeneous and heterogeneous multi-program workloads: (a) homogeneous workloads, (b) heterogeneous workloads.

Figure 4. Performance of the nine multi-core design points for two representative benchmarks (both homogeneous multi-program workloads): (a) tonto and (b) libquantum.

This is due to the fact that heterogeneous workloads consist of mixes of both memory-intensive and compute-intensive benchmarks. Scheduling a memory-intensive benchmark together with compute-intensive benchmarks on one core using SMT enables the memory-intensive benchmark to occupy a larger fraction of the core's private cache (256 KB in our study), as the compute-intensive benchmarks are less demanding of cache space. In the case of a multi-core with many small cores, each core has a small private cache (48 KB in our setup); hence, a memory-intensive benchmark would not get as much cache space. By intelligently scheduling benchmarks to cores and SMT thread contexts, the all-big SMT multi-core is better able to utilize cache space than a multi-core with many small cores and relatively smaller private caches.

For completeness, Figure 5 shows the average normalized turnaround time (ANTT) for the homogeneous workloads as a function of thread count (the results for the heterogeneous workloads are similar). At small thread counts, the all-big-core design results in the lowest per-program execution time (highest per-program performance), because all threads can run on a big core. Per-program execution time increases as thread count goes up, because more threads share a core through SMT, reducing per-program performance.

Figure 5. Comparing the ANTT of the nine multi-core design points for homogeneous multi-program workloads.

For the other extreme configuration, the turnaround time is larger at low thread counts, because of the lower-performing cores, but it remains more stable as thread count increases, due to a smaller degree of sharing. The conclusions are similar to those of the throughput results: at low thread counts, the all-big-core design has the highest throughput and the lowest per-program execution time; at high thread counts, the configurations with more and smaller cores have the highest throughput and the lowest per-program execution time, but the all-big SMT configuration remains close.
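The aggregate comparisons that follow weight each design's performance at a given thread count by the probability of that count occurring. A minimal sketch of that aggregation, with made-up throughput curves rather than simulated ones:

```python
# Distribution-weighted aggregate performance: a design's throughput at
# each active thread count, weighted by how often that count occurs.
def expected_throughput(throughput_by_count, distribution):
    """throughput_by_count[n]: STP with n active threads;
    distribution[n]: probability of observing n active threads."""
    assert abs(sum(distribution.values()) - 1.0) < 1e-9
    return sum(p * throughput_by_count[n] for n, p in distribution.items())

# Uniform distribution over 1..4 threads (illustrative; the study spans a
# wider range) and two made-up design profiles:
uniform = {n: 0.25 for n in range(1, 5)}
big_smt    = {1: 1.0, 2: 1.9, 3: 2.6, 4: 3.0}  # strong at low counts
many_small = {1: 0.5, 2: 1.0, 3: 1.5, 4: 2.0}  # scales, weak per thread
print(expected_throughput(big_smt, uniform))
print(expected_throughput(many_small, uniform))
```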

Figure 6. Average performance assuming a uniform thread count distribution and no SMT.

We conclude this section with our first finding:

Finding #1: A homogeneous multi-core consisting of all big SMT cores yields better performance than a heterogeneous multi-core for a small number of threads (due to the bigger cores) and only slightly worse performance for a large number of threads (because shared resource contention largely dominates performance for workload mixes of memory-intensive applications, and cache capacity can be used more efficiently through intelligent scheduling).

4.2 Thread count distributions

We now compare the multi-core designs under various active thread count distributions, assuming uniform distributions as well as distributions observed in datacenter operations.

4.2.1 Uniform distribution

We begin by assuming a uniform distribution over thread counts, i.e., each thread count (1 to 24 threads) has equal probability.

No SMT. We first assume that none of the cores implements SMT. Figure 6 shows the average performance for all of the multi-core designs without SMT. Each core can execute only one thread at a time, and when there are more threads than cores, multiple threads run on one core sequentially through time-sharing. Clearly, the all-big-core configuration outperforms the other homogeneous configurations (all-medium and all-small). Being able to execute faster at low thread counts is more important than achieving high throughput at high thread counts. This is in line with Amdahl's law: as parallelism increases, the performance of the sequential part (low thread count) dominates the performance of the program as a whole. The most important conclusion is that the optimal design without SMT is a heterogeneous multi-core design, both for homogeneous workloads and for heterogeneous workloads. Hence our second finding:

Finding #2: In the absence of SMT, heterogeneous multi-cores outperform homogeneous multi-cores across varying thread counts.
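The Amdahl's-law intuition invoked above can be quantified with a toy calculation; the serial fraction used here is a made-up illustrative number, not a measured one.

```python
# Amdahl's law: overall speedup is capped by the serial (low thread count)
# part of the workload, so speeding up that part matters most.
def amdahl_speedup(serial_fraction, n_cores):
    """Classic Amdahl's law for n_cores processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Even a 10% serial fraction limits 20 cores to a speedup of about 6.9:
print(round(amdahl_speedup(0.10, 20), 2))   # 6.9
```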
At low thread counts, the big cores in a heterogeneous ulti-core can be used to get high perforance, hoogeneous workloads heterogeneous workloads Figure 7. Average perforance assuing a unifor thread count distribution and SMT in the hoogeneous configurations. noralized throughput.... hoogeneous workloads heterogeneous workloads Figure 8. Average perforance assuing a unifor thread count distribution and SMT in all configurations. while at high thread counts, the larger aount of sall cores can be used to exploit thread-level parallelis. This is in line with recent work that advocates single-isa heterogeneous ulti-core processors [, 9]. SMT in hoogeneous designs. We now assue SMT is ipleented in the hoogeneous designs (, and ), but not the heterogeneous designs. Figure 7 shows average perforance for the various designs. It is interesting to copare this graph against the one in Figure 6, which showed that heterogeneous ulti-cores yield higher perforance than hoogeneous ulti-cores when the nuber of threads varies. Now, through Figure 7, we observe that by adding SMT to the hoogeneous ulti-cores, the design outperfors the other designs. This leads to: Finding #: A hoogeneous ulti-core with big SMT cores outperfors a heterogeneous ulti-core (without SMT) under the sae power budget. Put differently, SMT outperfors heterogeneity as a eans to cope with varying thread counts. The intuition is that, at low thread counts, the design with SMT is able to use all big cores, while the nuber of big cores in the heterogeneous designs is always saller. At high thread counts, a hoogeneous ulti-core with big SMT cores allows for ore concurrent threads ( in total) copared to heterogeneous ulti-cores (at ost in the

design point), yielding higher overall throughput within the same power budget.

Figure 9. Average performance per benchmark assuming a uniform thread count distribution.

Figure 10. Datacenter thread count distribution ((a): frequency versus thread count) and average performance ((b): normalized throughput, with and without SMT) using the datacenter distribution and the mirrored datacenter distribution.

SMT in all designs. Finally, Figure 8 shows average performance when SMT is enabled in all cores of all designs. For homogeneous workloads, the performance of the best heterogeneous configuration is .6% higher than that of the all-big-core design when no configuration implements SMT (Figure 6), but only .6% higher when SMT is enabled in all designs (Figure 8). For heterogeneous workloads, the homogeneous all-big-core design even outperforms the best heterogeneous design by .%. Thus, in other words:

Finding #4: The added benefit of combining heterogeneity and SMT is limited.

It is also interesting to observe that the optimal heterogeneous design shifts towards fewer, larger cores when SMT is added, both for the homogeneous and for the heterogeneous workloads. Hence:

Finding #5: Adding SMT to the heterogeneous designs makes the optimum shift towards fewer and larger cores.

This is in line with the general observation that SMT in larger cores enables flexibility as a function of active thread count.

Per-benchmark results. Figure 9 shows average performance for the various multi-core configurations (SMT enabled in all cores) for each benchmark, assuming a uniform distribution. The results vary across benchmarks: for some benchmarks (calculix, h264ref, hmmer and tonto), the all-big-core design performs worse than the best heterogeneous multi-core, while for others it performs similarly, or even slightly better (libquantum and mcf).
Detailed analysis of the results revealed that the latter category of benchmarks has high memory bandwidth demands, resulting in bandwidth-bound performance numbers at high thread counts. Section 8.2 contains results with a higher memory bandwidth setting.

Datacenter distributions. Figure 10(b) shows average performance across two different thread count distributions, assuming heterogeneous workload mixes. Datacenter is the distribution taken from [2] for CPU utilization in a datacenter, adapted to a workload with a bounded maximum thread count; Figure 10(a) shows the distribution: there is a peak at one thread (low utilization) and one at 7 to 9 threads (intermediate utilization). Mirrored datacenter is the same distribution, mirrored around the center. This means that there now is a peak at the maximum thread count, and one around 6 to 8 threads. We use this distribution to model a more heavily

loaded server park, with a distribution skewed to the higher thread counts.

Figure 11. Average normalized speedup for all PARSEC benchmarks ((a) ROI only, (b) whole program; with and without SMT).

For the datacenter distribution, a heterogeneous configuration is the best performing configuration without SMT, see Figure 10(b). This is as expected: we have a big core for the peak at one thread, and 7 cores in total to cover the peak around 7 threads. Adding SMT again makes the fewer-but-bigger-cores configurations more optimal, with the best performance for the all-big-core configuration. For the mirrored datacenter distribution, the optimum without SMT is a configuration with more cores, because of the peak at the higher thread counts. For the configurations with SMT, another configuration is optimal, with the all-big-core configuration performing only .6% worse.

Finding #6: For distributions that are skewed to fewer threads, the all-big-core configuration with SMT is optimal. For distributions that are skewed towards more active threads, the all-big-core configuration with SMT becomes less optimal, but its performance is very close to the optimum.

5. Multi-Threaded Workloads

As discussed earlier, multi-threaded programs can also have a variable number of active threads. When threads have to wait due to synchronization (e.g., a barrier), they can be scheduled out by the operating system to free resources for other runnable threads. Periods with a low active thread count are critical to performance, since they exhibit little parallelism and are therefore more difficult to speed up [6]. Achieving high performance at low thread counts is therefore likely to be even more crucial for multi-threaded workloads than for multi-programmed workloads. We use the PARSEC benchmarks in this section, and always report the maximum speedup across all possible thread counts. Note that the best-performing thread count does not necessarily equal the total core count, because of interference between threads in shared resources.
We further assume pinned scheduling, which pins threads to cores to improve data locality (as done in modern multi-core schedulers [13]); and we execute serial phases on the big core when reporting whole-program performance results. We limit the discussion in this section to heterogeneous designs with a single big core, as we assume pinned scheduling, which does not enable benefiting from multiple big cores. (We verified that none of the other heterogeneous designs have larger speedups than the ones reported here.)

The results, averaged across all benchmarks, are shown in Figure 11. We split up the results for the ROI only and the whole program, and show the speedups without SMT (i.e., the number of threads equals the number of cores) and with SMT. For the ROI-only results without SMT, the all-medium-core design is the optimal design. This is because most of the applications scale well up to 8 threads, but not beyond. Adding SMT boosts the speedup for the all-big-core design, and makes its speedup very close to that of the all-medium-core design. Overall, the all-big-core design with SMT performs well for benchmarks that have poor parallelism, and performs only slightly worse for programs that scale well. For the whole-program results, the all-big-core design performs best both without and with SMT. It performs best for applications with limited parallelism, and close to optimal for applications that scale better but that have large serial initialization and finalization phases. Without SMT, the heterogeneous designs perform close to the all-big-core configuration: they speed up the serial phases, but on average, the poorly scaling benchmarks achieve better performance on the all-big-core configuration, and this dominates the average. With SMT enabled, the difference between the all-big-core configuration and the heterogeneous configurations is larger, because the all-big-core design with SMT speeds up well-scaling benchmarks more. Figure 12 shows per-benchmark speedups.
For the ROI-only results (top graph), it clearly shows the difference across benchmarks: a many-core configuration is optimal for well-scaling benchmarks, while the all-big-core configuration or a heterogeneous design is optimal for poorly scaling benchmarks. For the whole-program results (bottom), the optimal configuration is the all-big-core design or a heterogeneous design for most of the benchmarks.

Finding #7: SMT is also beneficial for multi-threaded workloads. As for the multi-program workloads, adding SMT lets the optimal design shift to fewer but larger cores. A homogeneous design with big SMT cores outperforms the best heterogeneous design without SMT, and performs close to, and sometimes even slightly better than, the best heterogeneous design with SMT.
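The difference between ROI-only and whole-program results follows Amdahl-style reasoning: serial initialization and finalization phases run on a single (preferably big) core, while the parallel ROI is sped up by the rest of the machine. A minimal sketch, with hypothetical fractions and speedups:

```python
# Sketch: whole-program speedup when serial phases run on one core and the
# parallel region (ROI) is sped up by the remaining cores, in the spirit of
# Amdahl's law. All parameter values below are illustrative.

def whole_program_speedup(serial_frac, serial_speedup, parallel_speedup):
    """serial_frac: fraction of single-core execution time spent in serial
    phases (initialization/finalization); the rest is the parallel ROI."""
    assert 0.0 <= serial_frac <= 1.0
    t = serial_frac / serial_speedup + (1.0 - serial_frac) / parallel_speedup
    return 1.0 / t

if __name__ == "__main__":
    # Serial phases on a baseline core versus on a big core that runs
    # sequential code twice as fast (hypothetical numbers):
    print(whole_program_speedup(0.3, 1.0, 8.0))
    print(whole_program_speedup(0.3, 2.0, 8.0))
```

Speeding up the serial phases (a bigger `serial_speedup`) raises whole-program speedup even when the ROI already scales well, which is why the big core matters for whole-program results.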

Figure 12. Normalized speedup for the individual PARSEC benchmarks ((a) ROI only, (b) whole program).

6. Dynamic Multi-Cores

Dynamic multi-cores are multi-core processors with a dynamic configuration [11, 17]: the core configuration and the number of cores can dynamically vary between many small cores and a few large cores, and in-between heterogeneous configurations. Theoretical studies, such as the one of Hill and Marty [10], show that this type of multi-core is optimal in the context of varying parallelism and varying thread count. Through dynamic adaptation, one or a few big cores can be formed when there is low parallelism, while the configuration is changed to many small cores when there are many active threads. This technique is essentially the inverse of SMT: an SMT core executes a single thread at low thread counts but can execute multiple threads at higher active thread counts; a dynamic multi-core executes threads on independent cores, which can be fused into bigger cores at low active thread counts.

To compare the ability of a homogeneous multi-core with big SMT cores versus a dynamic multi-core to cope with varying active thread counts, we assume an ideal dynamic multi-core that can be morphed without overhead into any of the 9 multi-core configurations considered. This ideal dynamic multi-core chooses the best performing of these configurations at each thread count for each workload. This is an optimistic assumption in favor of dynamic multi-cores, since fusing cores is likely to involve a non-negligible time, area and power overhead. Figure 13 compares dynamic multi-cores (both with and without SMT) against the all-big-core configuration (with SMT) for the homogeneous and heterogeneous multi-program workloads.

Figure 13.
Throughput as a function of the number of threads for the all-big-core configuration with SMT and the dynamic core-fusion configuration with and without SMT ((a) homogeneous, (b) heterogeneous multi-program workloads).

This figure shows that dynamic multi-cores without SMT yield similar or even worse overall performance. Especially for heterogeneous workloads, SMT seems to perform better than a dynamic multi-core design. The reason is that SMT enables better utilization and higher throughput within the same power budget, especially when the programs are complementary in their resource demands. SMT also allows for more fine-grained parallelism: in the dynamic multi-core, a big core can be split up into medium cores or small cores, but an SMT core can also execute intermediate thread counts concurrently while fully utilizing all resources. As a result, the all-big-core SMT line in Figure 13(b) smoothly increases, while the dynamic line (without SMT) shows multiple plateaus with jumps when the configuration changes. A dynamic multi-core that also supports SMT performs the best, but this will probably result in a very complex design and an even more complex scheduling and reconfiguration policy. We thus conclude:

Finding #8: Homogeneous multi-cores with big SMT cores outperform (or are at least competitive with) dynamic multi-cores as a way to cope with variable active thread counts. A combination of both is optimal, but is also the most complex, both with respect to design and run-time scheduling.
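The theoretical argument referenced above, Hill and Marty's Amdahl's-law model for symmetric, asymmetric and dynamic multi-cores, can be sketched under its usual assumptions: a chip budget of n base-core equivalents (BCEs), a core built from r BCEs achieving perf(r) = sqrt(r), and software with parallel fraction f. The parameter values below are illustrative; note that the model ignores SMT, which is exactly the gap this section discusses.

```python
import math

# Sketch of the Hill-and-Marty multicore speedup model (symmetric,
# asymmetric, dynamic) under its standard assumptions. Parameters are
# illustrative.

def perf(r):
    """Single-core performance of a core built from r BCEs."""
    return math.sqrt(r)

def symmetric(f, n, r):
    """n/r identical cores of r BCEs each."""
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric(f, n, r):
    """One big r-BCE core plus (n - r) single-BCE cores."""
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))

def dynamic(f, n, r):
    """Serial phases on an r-BCE core; parallel phases use all n BCEs."""
    return 1.0 / ((1.0 - f) / perf(r) + f / n)

if __name__ == "__main__":
    f, n, r = 0.9, 64, 16
    print("symmetric :", round(symmetric(f, n, r), 2))
    print("asymmetric:", round(asymmetric(f, n, r), 2))
    print("dynamic   :", round(dynamic(f, n, r), 2))
    # For this configuration, heterogeneity beats the symmetric design and
    # the dynamic design beats both, matching the model's conclusion:
    assert symmetric(f, n, r) < asymmetric(f, n, r) < dynamic(f, n, r)
```

Under these assumptions the dynamic design is an upper bound; the text's point is that once thread counts vary and SMT is available, a homogeneous big-SMT-core chip gets close to that bound without the reconfiguration machinery.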

Figure 14. Power consumption as a function of thread count for all configurations, assuming power gating.

7. Energy Efficiency

In the previous sections, we focused on performance under an equal total power budget. However, power gating can be used to turn off idle cores, resulting in lower power consumption at low active thread counts. Especially for the configurations with many medium or small cores, this may result in improved power/energy efficiency compared to the homogeneous configuration with a few big SMT cores.

Power consumption as a function of thread count. Figure 14 shows average power consumption for all configurations (all with SMT enabled in all cores) as a function of thread count when power-gating unused cores (averaged across all homogeneous multi-program workloads). It is interesting to study power consumption along with performance: the all-big-core configuration consumes the most power at low active thread counts while delivering the highest performance; the all-small-core configuration consumes the least power while delivering the poorest performance; on the other hand, at high thread counts, all configurations perform nearly as well while consuming similar levels of power. Figure 14 also shows that activating SMT contexts increases power consumption, due to the increase in resource utilization, but not as much as the increase in power consumption from activating additional cores. Note that the numbers for one thread (leftmost points) do not show the full relative power difference between the big, medium and small cores. This is because the shared L3 cache and the main memory (DRAM) are active all the time, irrespective of the active thread count; these resources consume a constant baseline power.
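The power behavior just described, a fixed uncore baseline, a full-core increment per woken core, and a smaller increment per extra SMT thread on an already-active core, can be captured in a minimal model. All Watt values below are hypothetical stand-ins, not the paper's measured numbers:

```python
# Sketch: chip power versus active thread count with power gating, in the
# spirit of the Figure 14 discussion. All Watt values are hypothetical.

def chip_power(threads, cores, smt_ways, p_uncore, p_core, p_smt_thread):
    """Idle cores are power-gated; threads first wake cores, then fill
    SMT contexts on cores that are already active."""
    assert 0 <= threads <= cores * smt_ways
    active_cores = min(threads, cores)
    extra_smt_threads = max(0, threads - cores)
    return p_uncore + active_cores * p_core + extra_smt_threads * p_smt_thread

if __name__ == "__main__":
    # Hypothetical: 4 big cores, 2-way SMT, 7 W uncore, 15 W per core,
    # 3 W per extra SMT thread.
    for t in range(1, 9):
        print(t, chip_power(t, cores=4, smt_ways=2, p_uncore=7.0,
                            p_core=15.0, p_smt_thread=3.0))
```

The slope changes at `threads == cores`: beyond that point each additional thread costs only the (smaller) SMT increment, which is why activating SMT contexts is cheaper than activating cores.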
The relative difference in power consumption for the three core types is reflected in the slopes of the big-, medium- and small-core configurations (in the parts of the curves that do not use SMT, i.e., with the thread count lower than or equal to the core count).

Figure 15. Throughput versus power (top) and energy (bottom) consumption for heterogeneous multi-program workloads (assuming a uniform thread count distribution).

Pareto-optimal designs. Figure 15 shows the power and energy consumption as a function of performance for the heterogeneous multi-program workloads (assuming a uniform thread count distribution). There are several interesting observations to be made. First, the all-small-core configuration consumes the least power, but results in high energy consumption due to its poor performance. In other words, a configuration with many small cores is not energy-optimal. Second, the all-big-core configuration is the best performing, but also has the highest power and energy consumption. Third, the Pareto-optimal frontier is populated with heterogeneous design points, along with the best-performance and lowest-power configurations, both for power versus performance (top graph in Figure 15) and for energy versus performance (bottom graph). In other words, heterogeneity trades off performance for power and energy consumption. The design point with the minimum energy-delay product (EDP) across all the designs considered is a heterogeneous configuration, yet this design point improves EDP by as little as .% and .8% over the all-big-core design point for the homogeneous and heterogeneous workloads, respectively. This leads to the following finding:

Finding #9: Heterogeneous multi-core designs, when power gating idle cores, yield (only) slightly better energy efficiency compared to homogeneous multi-cores with big SMT cores under variable active thread count conditions.
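The Pareto-frontier and EDP analysis of Figure 15 can be sketched as follows. The design points below are hypothetical; for a fixed amount of work, energy is proportional to power/throughput and delay to 1/throughput, so EDP is proportional to power/throughput².

```python
# Sketch: Pareto-optimal (power, throughput) design points and minimum-EDP
# selection, mirroring the Figure 15 analysis. Design points are hypothetical.

def pareto_frontier(points):
    """Keep designs not dominated by another design with lower-or-equal
    power and higher-or-equal throughput (strictly better in one)."""
    frontier = {}
    for name, (power, perf) in points.items():
        dominated = any(
            p2 <= power and t2 >= perf and (p2 < power or t2 > perf)
            for n2, (p2, t2) in points.items() if n2 != name
        )
        if not dominated:
            frontier[name] = (power, perf)
    return frontier

def edp(power, perf):
    """Energy-delay product per unit of work: (power/perf) * (1/perf)."""
    return power / perf ** 2

if __name__ == "__main__":
    designs = {  # name: (power in W, normalized throughput), hypothetical
        "all-big":   (80.0, 4.0),
        "mixed":     (70.0, 3.6),
        "all-small": (50.0, 2.0),
        "dominated": (75.0, 1.9),
    }
    print(sorted(pareto_frontier(designs)))
    print(min(designs, key=lambda n: edp(*designs[n])))
```

With these toy numbers the lowest-power and highest-throughput designs bound the frontier with a heterogeneous point in between, the qualitative shape the text describes.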

Figure 16. Average multi-threaded benchmark performance with alternative large-cache and high-frequency configurations.

8. Alternative Multi-Core Designs

8.1 Larger caches or higher frequency for the small cores

In our experimental setup, we made particular design decisions that may impact the final results. One decision was to keep total cache capacity constant across all designs. The motivation was to evaluate the impact of core type and organization, not cache capacity. Nevertheless, we noticed that sharing a cache between multiple programs co-executing on an SMT core can lead to better cache usage. We therefore now evaluate the effect of keeping private cache sizes constant across core types. We also evaluate the impact of increasing the frequency of the small cores to improve their performance.

Figure 16 shows the average speedup for the multi-threaded benchmarks (ROI only). The "lc" configurations (lc stands for larger cache) are configurations where the private L1 and L2 cache sizes for the medium and small cores are equal to those of the big core. Larger caches consume more power, leading to a different power equivalence among core types: a big core is now power-equivalent to fewer medium cores and fewer small cores, which explains the decreased core count for the configurations with a larger cache. Further, the "hf" configurations contain medium or small cores with the clock frequency increased from 2.66 GHz. This increase in frequency likewise changes the power equivalence between the big and medium cores, and between the big and small cores. The results in Figure 16 show that a larger cache and, more distinctly, a higher frequency lead to a higher speedup for the small-core configuration (compare the small-core lc and hf configurations against the baseline all-small-core design). This is because many benchmarks do not scale well to high thread counts, and reducing the core count in exchange for more cache capacity or a higher frequency results in higher speedup.
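The power-equivalence argument used here, translating a fixed chip power budget into a core count per core type, can be sketched as follows. The Watt figures are hypothetical, chosen only to illustrate how enlarging caches or raising frequency shrinks the affordable core count:

```python
# Sketch: core counts under a fixed power budget for different core types.
# All power values are hypothetical.

def equal_power_core_count(budget, core_power):
    """Largest number of cores of this type that fits the power budget."""
    return int(budget // core_power)

if __name__ == "__main__":
    budget = 60.0  # hypothetical chip power budget in W
    core_power = {"big": 15.0, "medium": 7.5, "small": 3.75}
    counts = {t: equal_power_core_count(budget, p) for t, p in core_power.items()}
    print(counts)  # a 1:2:4 big/medium/small equivalence under these numbers
    # Enlarging a small core's caches raises its per-core power and thus
    # reduces how many small cores the same budget can afford:
    print(equal_power_core_count(budget, core_power["small"] + 1.25))
```

This is the mechanism behind the "decreased core count" of the larger-cache and higher-frequency configurations: the per-core power rises, so fewer cores fit in the same budget.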
For the medium-core configuration, on the other hand, enlarging the cache or increasing the frequency has a negative impact on performance: the benefits of a larger cache or a higher frequency do not compensate for the reduction in core count. Overall, we observe that a homogeneous multi-core with big SMT cores achieves the best performance for the given power budget. Hence, we conclude that:

Finding #10: Enlarging the caches or increasing the frequency of the medium and small cores does not affect the general observation that a homogeneous multi-core with big SMT cores is close to optimal.

8.2 Higher memory bandwidth

Another decision made in our initial setup was to set the memory bandwidth to 8 GB/s. However, as mentioned before, memory bandwidth turns out to be a bottleneck for some benchmarks. We therefore now double the memory bandwidth to 16 GB/s, see Figure 17. Comparing this figure to Figures 8 and 11, we observe that performance increases for all configurations, albeit by a small margin. For the homogeneous multi-program workloads, the all-big-core design now achieves a .8% lower throughput than the optimum (which was .6% at 8 GB/s), and a .% lower throughput for the heterogeneous multi-program workloads (it used to be .% higher). For the multi-threaded programs, considering the ROI only, we observe a speedup for the all-big-core design that is .9% lower than the optimum (which was .8% before), and a .8% higher speedup when considering the whole program (.9% before). The programs that were bandwidth-bound in the 8 GB/s setup now achieve better performance across all configurations. These memory-bound benchmarks especially benefit from SMT, more so than compute-bound programs [21]. Hence, our conclusion:

Finding #11: Even at high available memory bandwidth, the performance of a homogeneous design with big SMT cores remains close to that of the heterogeneous configurations.

9. Related Work

Olukotun et al. [23] make the case for multi-core processing.
By comparing an aggressive single-core processor (6-wide out-of-order) and a dual-core processor consisting of narrower out-of-order cores, they found that parallelized applications with limited parallelism achieve comparable performance on both architectures, and that applications with a large amount of coarse-grained parallelism achieve significantly better performance on the dual-core. Kumar et al. [19] argue that a single-ISA heterogeneous multi-core processor covers a spectrum of workloads better than a conventional multi-core processor, providing good single-thread performance when thread-level parallelism is low, and high throughput when thread-level parallelism is high. Our results confirm this finding: the heterogeneous multi-core configurations achieve better overall performance compared to the homogeneous configurations across the broad range of active thread counts when SMT is not enabled. However, Kumar et al. did not consider and compare against a homogeneous multi-core with big SMT cores, which we find to achieve a level of performance that is competitive with a heterogeneous design

under varying degrees of thread-level parallelism, while being less costly to design and verify.

Figure 17. Performance numbers assuming 16 GB/s memory bandwidth ((a) multi-program workloads, homogeneous and heterogeneous; (b) multi-threaded workloads, ROI only and whole program).

Ipek et al. [11] and Kim et al. [17] propose to fuse small cores to form bigger cores when there are few active threads. By doing so, the multi-core processor becomes more dynamic and can more easily adapt to software diversity. Our results indicate that similar performance benefits can be achieved through the opposite mechanism: instead of fusing small cores to form a big core when there are few active threads, one could schedule threads across big SMT cores (and have few active SMT threads per core) in a homogeneous multi-core. Khubaib et al. [16] build on a similar insight when proposing MorphCore, an aggressive out-of-order core with 2-way SMT that can morph into an energy-efficient 8-way SMT in-order core. The idea is to switch between the two modes of operation depending on the amount of available thread-level parallelism: with few (one or two) active threads, the core runs in out-of-order mode, and it switches to in-order SMT with more active threads. Whereas Khubaib et al. focus on the proposal of an energy-efficient core design that can switch between out-of-order and wide-SMT in-order operation, the focus of our work is to study the impact of variable thread-level parallelism in the workload, and how this affects multi-core design decisions. More specifically, we consider distributions of active thread counts in multi-program workloads next to multi-threaded workloads to compare homogeneous multi-cores with SMT against heterogeneous and dynamic multi-cores. MorphCore is complementary to our work and can be leveraged to further improve the energy efficiency of the big SMT cores when running multiple SMT threads.
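The mode-switching policy just described amounts to a threshold on the active thread count. A minimal sketch of the idea (the threshold value and mode names are illustrative, not the actual MorphCore implementation):

```python
# Sketch: thread-count-driven mode selection in the spirit of MorphCore.
# The threshold is a hypothetical policy parameter.

def select_mode(active_threads, ooo_threshold=2):
    """Run in out-of-order mode with few active threads (favoring
    per-thread performance); otherwise switch to the wide in-order SMT
    mode (favoring throughput and energy efficiency)."""
    return "out-of-order" if active_threads <= ooo_threshold else "in-order-SMT"

if __name__ == "__main__":
    for t in (1, 2, 4, 8):
        print(t, select_mode(t))
```

Like SMT in a big core, the policy adapts to the instantaneous degree of thread-level parallelism rather than to a fixed hardware partition.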
Hill and Marty [10] evaluate the three major multi-core processor architecture paradigms (homogeneous, heterogeneous and dynamic multi-cores) and derive high-level insights from Amdahl's law. Their model did not consider SMT and assumed that software is either sequential or infinitely parallel. One of the results that they obtain is that heterogeneous multi-cores can achieve better performance than homogeneous multi-cores; they also find that dynamic processors achieve better performance than heterogeneous multi-cores with identical functions of performance per unit area. While this is true under the assumptions made, our results show that this is not necessarily the case when the number of available threads varies over time.

A number of papers have explored how to take advantage of heterogeneity to improve multi-threaded application performance. Annavaram et al. [1] propose running sequential portions of a multi-threaded application at a higher power budget, thereby significantly improving performance while remaining within a given power budget. Intel's TurboBoost [25] offers similar functionality by boosting the clock frequency. Suleman et al. [29] accelerate the execution of critical sections by exploiting the high-performance cores in a heterogeneous multi-core, i.e., a thread that executes a critical section is migrated to a big core in order to reduce serialization time, a technique called Accelerating Critical Sections (ACS). Joao et al. [12] generalize this principle to other types of synchronization bottlenecks, including critical sections, barriers and pipes. All of these approaches exploit the fact that the number of active threads varies over time and leverage heterogeneity to improve performance. This paper suggests that similar performance benefits might potentially be achieved through SMT on a homogeneous multi-core.
More specifically, when a thread is executing sequential code (e.g., initialization, a critical section, etc.), scheduling it on a single core with the other SMT threads throttled might achieve similar performance benefits, and it does not require migrating (or marshaling [30]) data, as happens when a thread is migrated from a small to a big core in ACS. Li et al. [21] compare the energy-efficiency and thermal characteristics of SMT versus multi-core. They report that, assuming an equal area budget, SMT is more energy-efficient than multi-core for memory-intensive workloads; the inverse is true for compute-intensive workloads. Kumar et al. [18] exploit dynamic time-varying application behavior to schedule applications on the most energy-efficient core

in a heterogeneous multi-core, and they report substantial energy savings compared to a homogeneous multi-core. In contrast to this prior work, we explore multi-core configurations under variable thread-level parallelism conditions.

10. Conclusion

The number of active threads varies over time in today's computer systems. This has been observed across many application domains, ranging from multi-program systems and desktop applications to datacenter servers and even multi-threaded applications. This paper studied how varying degrees of thread-level parallelism in the workload affect multi-core design decisions. We considered homogeneous, heterogeneous and dynamic multi-cores under an equal power budget, and conclude that a homogeneous multi-core consisting of big SMT cores achieves comparable or slightly better performance compared to heterogeneous multi-cores (both with and without SMT) and dynamic multi-cores. The reason is that a homogeneous multi-core with big SMT cores can better adapt to varying degrees of thread-level parallelism in the workload, achieving higher per-thread performance at low active thread counts and competitive throughput at high active thread counts. Finally, we also find that heterogeneous multi-cores are (only) slightly more energy-efficient compared to a homogeneous all-big-core configuration with SMT, when power gating idle cores. The overall conclusion is that, while multi-cores with many small cores, be it homogeneous or heterogeneous architectures, outperform homogeneous multi-cores with big SMT cores at full utilization, the inverse is typically true under variable active thread workload conditions, which makes homogeneous multi-cores with big SMT cores an appealing, cost-effective design point for the variable active thread counts commonly observed in modern-day systems.

Acknowledgments

We thank the anonymous reviewers for their valuable and constructive feedback. Stijn Eyerman is a postdoctoral fellow of the Research Foundation Flanders.
Additional support is provided by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement no. 99. Experiments were run at the VSC Flemish Supercomputer Center.

References

[1] M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl's law through EPI throttling. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2005.
[2] L. A. Barroso and U. Hölzle. The case for energy-proportional computing. IEEE Computer, Dec. 2007.
[3] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2008.
[4] G. Blake, R. G. Dreslinski, T. N. Mudge, and K. Flautner. Evolution of thread-level parallelism in desktop applications. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2010.
[5] T. E. Carlson, W. Heirman, and L. Eeckhout. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2011.
[6] K. Du Bois, S. Eyerman, J. Sartor, and L. Eeckhout. Criticality stacks: Identifying critical threads in parallel programs using synchronization behavior. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2013.
[7] S. Eyerman and L. Eeckhout. System-level performance metrics for multi-program workloads. IEEE Micro, 28(3), May/June 2008.
[8] P. Greenhalgh. big.LITTLE processing with ARM Cortex-A15 & Cortex-A7: Improving energy efficiency in high-performance mobile platforms. ARM white paper, Sept. 2011.
[9] L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor.
In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 1998.
[10] M. D. Hill and M. R. Marty. Amdahl's law in the multicore era. IEEE Computer, 41(7), July 2008.
[11] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: Accommodating software diversity in chip multiprocessors. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2007.
[12] J. A. Joao, M. A. Suleman, O. Mutlu, and Y. N. Patt. Bottleneck identification and scheduling in multithreaded applications. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2012.
[13] M. T. Jones. Inside the Linux scheduler: The latest version of this all-important kernel component improves scalability. June 2006.
[14] R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd. Power7: IBM's next-generation server processor. IEEE Micro, March/April 2010.
[15] C. N. Keltcher, K. J. McGrath, A. Ahmed, and P. Conway. The AMD Opteron processor for multiprocessor servers. IEEE Micro, 23(2), March/April 2003.
[16] Khubaib, M. A. Suleman, M. Hashemi, C. Wilkerson, and Y. N. Patt. MorphCore: An energy-efficient microarchitecture for high performance ILP and high throughput TLP. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), Dec. 2012.

[17] C. Kim, S. Sethumadhavan, M. S. Govindan, N. Ranganathan, D. Gulati, D. Burger, and S. Keckler. Composable lightweight processors. In Proceedings of the International Symposium on Microarchitecture (MICRO), Dec. 2007.
[18] R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction. In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2003.
[19] R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2004.
[20] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), Dec. 2009.
[21] Y. Li, D. Brooks, Z. Hu, and K. Skadron. Performance, energy, and thermal considerations for SMT and CMP architectures. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2005.
[22] NVidia. Variable SMP: A multi-core CPU architecture for low power and high performance. White paper, 2011.
[23] K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K.-Y. Chang. The case for a single-chip multiprocessor. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 1996.
[24] S. E. Raasch and S. K. Reinhardt. The impact of resource partitioning on SMT processors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Sept. 2003.
[25] E. Rotem, A. Naveh, D. Rajwan, A. Ananthakrishnan, and E. Weissmann.
Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro, March/April 2012.
[26] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 2002.
[27] A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Nov. 2000.
[28] G. S. Sohi, S. E. Breach, and T. N. Vijaykumar. Multiscalar processors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), June 1995.
[29] M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2009.
[30] M. A. Suleman, O. Mutlu, J. A. Joao, Khubaib, and Y. N. Patt. Data marshaling for multi-core architectures. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2010.
[31] D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA), May 1996.
[32] R. Velasquez, P. Michaud, and A. Seznec. Selecting benchmark combinations for the evaluation of multicore throughput. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr. 2013.


More information

Airline Yield Management with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN

Airline Yield Management with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN Airline Yield Manageent with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN Integral Developent Corporation, 301 University Avenue, Suite 200, Palo Alto, California 94301 SHALER STIDHAM

More information

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks Protecting Sall Keys in Authentication Protocols for Wireless Sensor Networks Kalvinder Singh Australia Developent Laboratory, IBM and School of Inforation and Counication Technology, Griffith University

More information

ON SELF-ROUTING IN CLOS CONNECTION NETWORKS. BARRY G. DOUGLASS Electrical Engineering Department Texas A&M University College Station, TX 77843-3128

ON SELF-ROUTING IN CLOS CONNECTION NETWORKS. BARRY G. DOUGLASS Electrical Engineering Department Texas A&M University College Station, TX 77843-3128 ON SELF-ROUTING IN CLOS CONNECTION NETWORKS BARRY G. DOUGLASS Electrical Engineering Departent Texas A&M University College Station, TX 778-8 A. YAVUZ ORUÇ Electrical Engineering Departent and Institute

More information

Pay-As-You-Drive (PAYD): A case study into the safety and accessibility effects of PAYD strategies

Pay-As-You-Drive (PAYD): A case study into the safety and accessibility effects of PAYD strategies J. Zantea, D.H. van Aelsfort, M.C.J. Blieer, P.H.L. Bovy 1 Pay-As-You-Drive (PAYD): A case study into the safety and accessibility effects of PAYD strategies J. Zantea MSc Goudappel Coffeng BV P.O. Box

More information

Calculating the Return on Investment (ROI) for DMSMS Management. The Problem with Cost Avoidance

Calculating the Return on Investment (ROI) for DMSMS Management. The Problem with Cost Avoidance Calculating the Return on nvestent () for DMSMS Manageent Peter Sandborn CALCE, Departent of Mechanical Engineering (31) 45-3167 sandborn@calce.ud.edu www.ene.ud.edu/escml/obsolescence.ht October 28, 21

More information

Standards and Protocols for the Collection and Dissemination of Graduating Student Initial Career Outcomes Information For Undergraduates

Standards and Protocols for the Collection and Dissemination of Graduating Student Initial Career Outcomes Information For Undergraduates National Association of Colleges and Eployers Standards and Protocols for the Collection and Disseination of Graduating Student Initial Career Outcoes Inforation For Undergraduates Developed by the NACE

More information

A Soft Real-time Scheduling Server on the Windows NT

A Soft Real-time Scheduling Server on the Windows NT A Soft Real-tie Scheduling Server on the Windows NT Chih-han Lin, Hao-hua Chu, Klara Nahrstedt Departent of Coputer Science University of Illinois at Urbana Chapaign clin2, h-chu3, klara@cs.uiuc.edu Abstract

More information

Network delay-aware load balancing in selfish and cooperative distributed systems

Network delay-aware load balancing in selfish and cooperative distributed systems Network delay-aware load balancing in selfish and cooperative distributed systes Piotr Skowron Faculty of Matheatics, Inforatics and Mechanics University of Warsaw Eail: p.skowron@iuw.edu.pl Krzysztof

More information