Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications

1 Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
Lawrence Spracklen, Yuan Chou & Santosh G. Abraham
International Symposium on High-Performance Computer Architecture, Feb 15th 2005
Lawrence Spracklen, Advanced Processor Architecture, Sun Microsystems

2 Outline
Motivation
Limit study
The discontinuity prefetcher
Results
Conclusions

3 Motivation
Cache-missing memory accesses frequently dictate application performance
Typically think in terms of data misses
  Data misses dominate for SPEC CPU2000 benchmarks
Commercial applications have large instruction working sets
  Exemplified by databases, application servers, and web servers
We investigate the performance implications for:
  A database workload
  TPC-W
  SPECjAppServer2002
  SPECweb99

4 Instruction Miss Rates
Commercial applications observe significant stalls due to both L1 cache and L2 cache instruction misses
Instruction misses can be more problematic than either load or store misses
[Figure: instruction miss rate (per 100 instructions) for the DB, TPC-W, japp, and Web workloads; panels: 32KB 4-way L1$ and 2MB 4-way L2$]

5 Chip Multiprocessors
Recent years have witnessed a significant paradigm shift and the emergence of Chip Multiprocessors (CMPs)
CMPs are exemplified by multiple cores on a single chip
Advanced CMP cores typically:
  Have private Level-1 caches (L1$)
  Share a modest Level-2 cache (L2$)
We investigate how CMPs impact commercial workloads
  Use a next-generation 4-core CMP with a 2MB shared L2$

6 CMP Instruction Miss Rates
L1$ miss rates are identical as cores have private I$s
L2$ is shared between 4 cores
  Fewer cache resources per core
  Applications experience more frequent cache misses
  HW prefetchers are even more important!
[Figure: miss rate (per 100 instructions), single core vs. 4-core CMP, 2MB 4-way L2$; workloads: DB, TPC-W, japp, Web, Mix]

7 Miss Classification
Few processors provide support for instruction prefetching
  Typically focus on sequential prefetchers
  Given an access/miss for line L, prefetch line L+1
Irregular control flow in commercial apps means many misses are non-sequential
[Figure: miss breakdown into sequential and non-sequential misses for DB, TPC-W, japp, Web, and Mixed; panels: 32KB 4-way L1$ and 2MB 4-way L2$]

8 Non-Sequential Misses
Sequential prefetchers fail to capture up to almost 60% of instruction misses in commercial applications
Non-sequential misses are not attributable to a single cause
  Caused by a variety of control transfer instructions (CTIs)
[Figure: non-sequential miss breakdown by CTI type (Trap, Return, Jump, Call, Uncond branch, Cond branch (nt), Cond branch (tb), Cond branch (tf)) for DB, TPC-W, japp, Web, and Mixed; panels: 32KB 4-way L1$ and 2MB 4-way L2$]

9 Potential Performance Improvements
Failure to address non-sequential instruction misses sacrifices significant performance
  Can double performance gain by also targeting non-sequential misses
Non-sequential prefetchers must target all main CTI groups
  Many prior instruction prefetchers only capture a subset of misses
[Figure: potential performance improvement (X) for Sequential only, Branch only, Function only, Sequential + Branch, Sequential + Function, and Sequential + Branch + Function; panels: single core (DB, TPC-W, japp, Web) and 4-way CMP (DB, TPC-W, japp, Web, Mixed)]

10 Introducing Discontinuities
When a CTI causes a transition to a non-sequential cache line, it causes a 'discontinuity' in the instruction fetch stream
  No prefetch: L L+1 L+2 L+4 L+20 L+21 L+23 (7 misses)
Next-line prefetchers can't capture these transitions
  Next-line (on miss): L L+1 L+2 L+4 L+20 L+21 L+23 (5 misses)
  Next-line (tagged): L L+1 L+2 L+4 L+20 L+21 L

11 Capturing Discontinuities
Next-N-line sequential prefetchers capture short forward discontinuities
  Where the target lies within the prefetch-ahead distance
  Next-4-line (tagged): L L+1 L+2 L+4 L+20 L+21 L+23 (2 misses)
Next-N-line sequential prefetchers represent a simple, low-cost mechanism to prefetch for small discontinuities
An elegant, CTI-independent method for capturing the remaining discontinuities is required
  We propose the discontinuity prefetcher
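
The miss counts quoted for the example fetch stream on slides 10 and 11 can be checked with a small model. The sketch below is illustrative only and is not the authors' simulator: it assumes an initially empty cache of unlimited size and prefetches that complete instantly, and the names `count_misses`, `degree`, and `tagged` are hypothetical.

```python
def count_misses(stream, degree=0, tagged=False):
    """Count demand misses for a sequence of cache-line numbers.

    degree: number of sequential lines to prefetch (0 disables prefetching).
    tagged: if True, the first demand hit on a prefetched line also
            triggers prefetching, not only a miss.
    Assumption: empty unbounded cache, prefetches arrive instantly.
    """
    cache = set()        # lines currently resident
    prefetched = set()   # resident lines brought in by a prefetch, not yet used
    misses = 0
    for line in stream:
        miss = line not in cache
        first_use = line in prefetched
        if miss:
            misses += 1
            cache.add(line)
        prefetched.discard(line)
        if degree and (miss or (tagged and first_use)):
            for nxt in range(line + 1, line + degree + 1):
                if nxt not in cache:
                    cache.add(nxt)
                    prefetched.add(nxt)
    return misses


L = 1000  # arbitrary base line number
stream = [L, L + 1, L + 2, L + 4, L + 20, L + 21, L + 23]
no_prefetch = count_misses(stream)                               # 7
next_line_on_miss = count_misses(stream, degree=1)               # 5
next_4_line_tagged = count_misses(stream, degree=4, tagged=True) # 2
print(no_prefetch, next_line_on_miss, next_4_line_tagged)
```

Under these assumptions the model gives 7, 5, and 2 misses, matching the slides. The two misses left with next-4-line (tagged) are the first line L and the large discontinuity target L+20.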

12 The Discontinuity Prefetcher
The discontinuity prefetcher utilizes a history-based predictor to track discontinuities that incur an L1$ miss
Only need to track large discontinuities that aren't covered by the next-N-line sequential prefetcher
  Significantly reduces the size of the required predictor
  Discontinuity + Next-4-line (tagged): L L+1 L+2 L+4 L+20 L+21 L+23 (0 misses)
[Figure annotations: small forward discontinuities are covered by the next-4-line sequential prefetcher; the predictor only needs to cover the large discontinuity => small predictor]

13 Prefetcher Implementation
Predictor is implemented as a direct-mapped table
  Indexed by a portion of the address of the trigger
  Entry is tagged with a portion of the address of the trigger
  Only requires one target per entry
[Figure: block diagram with the core, request/miss info, the next-4-line prefetcher, the predictor table (tag, target), and the prefetch queue to the L2$]
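
A direct-mapped, partially tagged table of this kind could be modeled roughly as follows. This is a sketch under assumptions, not the hardware design: the slides do not say which address bits form the index and the tag, so the split used here (low-order line-address bits as index, the next `tag_bits` bits as a partial tag) and the class and method names are hypothetical.

```python
class DiscontinuityTable:
    """Direct-mapped predictor with one (tag, target) pair per entry."""

    def __init__(self, entries=8192, tag_bits=8):
        self.entries = entries
        self.tag_mask = (1 << tag_bits) - 1
        self.tags = [None] * entries      # None marks an invalid entry
        self.targets = [0] * entries

    def _index_and_tag(self, trigger_line):
        # Assumed bit split: low-order bits select the entry,
        # the next tag_bits bits form a partial tag.
        index = trigger_line % self.entries
        tag = (trigger_line // self.entries) & self.tag_mask
        return index, tag

    def allocate(self, trigger_line, target_line):
        index, tag = self._index_and_tag(trigger_line)
        self.tags[index] = tag            # overwrites any previous occupant
        self.targets[index] = target_line

    def lookup(self, trigger_line):
        index, tag = self._index_and_tag(trigger_line)
        if self.tags[index] == tag:
            return self.targets[index]
        return None


table = DiscontinuityTable(entries=8, tag_bits=4)  # tiny table to show conflicts
table.allocate(trigger_line=4, target_line=20)
hit = table.lookup(4)          # 20
cold = table.lookup(5)         # None: that entry was never allocated
table.allocate(trigger_line=12, target_line=99)    # 12 % 8 == 4: same entry
evicted = table.lookup(4)      # None: tag no longer matches
```

Because each entry holds a single target, a conflicting allocation simply replaces the previous discontinuity, which keeps the structure small at the cost of occasional lost predictions.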

14 Prefetcher Operation
Allocation: When a discontinuity causes a miss, it is inserted into the table
[Figure: fetch stream L-128 L L+1 L+2 L+4 L+20 L+21 L+23 with the predictor table (tag, target) and the prefetch queue]

16 Prefetcher Operation
Prediction: Predictor is probed by the sequential prefetcher moving ahead of the demand fetch stream
  If a valid entry is located, a prefetch is issued for the potential target
  Prefetches are also issued for sequential lines following the target (up to N)
[Figure: fetch stream L-128 L L+1 L+2 L+4 L+20 L+21 L+23 with the predictor table (tag, target) and the prefetch queue]
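
The allocation and prediction steps can be combined into a toy replay of the example stream. The sketch below is a simplified model, not the authors' implementation: it assumes an empty, unbounded cache at the start of each pass, instantaneous prefetches, a plain dictionary in place of the direct-mapped table, and that the table is probed with the demand line as well as every line the sequential prefetcher runs ahead to. The function name `run` is hypothetical.

```python
def run(stream, table, degree=4):
    """Replay one pass of a line-number stream; return the demand misses.

    table maps a trigger line to the target of a recorded discontinuity.
    """
    cache, prefetched, misses = set(), set(), []
    prev = None

    def prefetch(line):
        if line not in cache:
            cache.add(line)
            prefetched.add(line)

    for line in stream:
        miss = line not in cache
        first_use = line in prefetched
        prefetched.discard(line)
        if miss:
            misses.append(line)
            cache.add(line)
            # Allocation: a miss on a line that does not directly follow the
            # previous line is a discontinuity the sequential prefetcher missed.
            if prev is not None and line != prev + 1:
                table[prev] = line
        if miss or first_use:
            # Prediction: run ahead sequentially and probe the table on the way.
            for ahead in range(line, line + degree + 1):
                if ahead != line:
                    prefetch(ahead)
                target = table.get(ahead)
                if target is not None:
                    # Prefetch the target and the sequential lines after it.
                    for t in range(target, target + degree + 1):
                        prefetch(t)
        prev = line
    return misses


L = 1000  # arbitrary base line number
stream = [L - 128, L, L + 1, L + 2, L + 4, L + 20, L + 21, L + 23]
table = {}
first_pass = run(stream, table)    # trains the table
second_pass = run(stream, table)   # same stream, cold cache, trained table
print(first_pass, second_pass, table)
```

In this model the first pass misses on L-128, L, and L+20 and records the discontinuities L-128 -> L and L+4 -> L+20. On the second pass only the initial line L-128 misses, so the portion of the stream from L to L+23 sees no misses, consistent with the zero-miss case shown on slide 12.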

18 Methodology
Processor overview
  Processor: 4-core CMP; 64-entry issue window; 3-wide issue; 64K gshare predictor
  Memory hierarchy: 32KB 4-way 64B I$ and D$ (per core); 2MB 4-way 64B L2$ (shared between cores); 400-cycle memory latency; 20GB/s off-chip BW
  Discontinuity prefetcher: 8192-entry direct-mapped table (per core); next-4-line sequential prefetcher
Compare discontinuity prefetcher to:
  Next-line (on miss): if line L is a miss, prefetch line L+1
  Next-line (tagged): if line L is a miss or a previously prefetched line, prefetch line L+1
  Next-4-line (tagged): if line L is a miss or a previously prefetched line, prefetch lines L+1, L+2, L+3 and L+4
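
For orientation, the cache parameters above imply the following geometry. This is plain arithmetic on the listed sizes, reading "64B" as the line size; the helper name `num_sets` is hypothetical.

```python
def num_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


KB, MB = 1024, 1024 * 1024
l1_sets = num_sets(32 * KB, ways=4, line_bytes=64)   # 128 sets per L1 cache
l2_sets = num_sets(2 * MB, ways=4, line_bytes=64)    # 8192 sets in the shared L2$
l2_share_per_core_kb = (2 * MB // 4) // KB           # 512 KB of L2$ per core on average
print(l1_sets, l2_sets, l2_share_per_core_kb)
```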

19 Miss Coverage: single core
Achieve a significant reduction in both the I$ miss rate and the L2$ instruction miss rate
  90% of L1$ misses and 85% of L2$ misses eliminated for database workload
The discontinuity prefetcher outperforms the sequential prefetchers
[Figure: miss rate (normalized to no prefetch) for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), and Discontinuity; workloads: DB, TPC-W, japp, Web; panels: 32KB 4-way L1$ and 2MB 4-way L2$]

20 Miss Coverage: CMP
CMP L1$ miss reduction is identical to the reductions achieved for a single core
  Cores have private L1$s
L2$ miss rate reductions similar to single-core reductions
  Also manage to eliminate 82% of L2$ instruction misses for the mixed workload
[Figure: miss rate (normalized to no prefetch) for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), and Discontinuity; 2MB 4-way L2$; workloads: DB, TPC-W, japp, Web, Mixed]

21 Performance Improvements
The instruction prefetchers provide significant performance benefits
  Higher performance benefits observed for the CMP
Given the significant reduction in miss rates, greater performance improvements seemed likely
[Figure: performance improvement (X) for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), and Discontinuity; panels: single core (DB, TPC-W, japp, Web) and 4-way CMP (DB, TPC-W, japp, Web, Mixed)]

22 Data Miss Rates
L2$ data miss rates increased significantly when aggressive instruction prefetching was enabled
Increase in data misses offsets the benefits from the reduction in instruction misses
[Figure: miss rate (normalized to no prefetch) for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), and Discontinuity; panels: single core (DB, TPC-W, japp, Web) and 4-way CMP (DB, TPC-W, japp, Web, Mixed)]

23 L2-Bypass Prefetching
Introduce L2-Bypass prefetching
  Prefetches are initially only installed in the L1$
  If the line is utilized during its residence in the L1$, on eviction, the line is installed in the L2$
  Eliminates L2$ pollution by instruction prefetchers
Observe full performance benefits of the instruction prefetchers (up to 1.38X)
[Figure: performance improvement (X) for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), and Discontinuity; panels: single core (DB, TPC-W, japp, Web) and 4-way CMP (DB, TPC-W, japp, Web, Mixed)]
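
The bypass policy can be illustrated with a toy L1 model. The sketch below rests on several assumptions that the slides do not state: a small fully associative LRU L1, an L2 modeled as an unbounded set, and demand misses that fill both levels as usual. The class name `BypassL1` and its methods are hypothetical.

```python
from collections import OrderedDict


class BypassL1:
    """Tiny fully associative LRU L1 that filters prefetched lines before the L2."""

    def __init__(self, capacity, l2):
        self.capacity = capacity
        self.l2 = l2                  # set of lines resident in the L2 (unbounded here)
        self.lines = OrderedDict()    # line -> True once it has been demand-referenced

    def _install(self, line, used):
        if len(self.lines) >= self.capacity:
            victim, was_used = self.lines.popitem(last=False)  # evict the LRU line
            if was_used:
                self.l2.add(victim)   # only lines that proved useful reach the L2
        self.lines[line] = used

    def prefetch(self, line):
        if line not in self.lines:
            self._install(line, used=False)   # bypass: the L2 is not filled

    def fetch(self, line):
        if line in self.lines:
            self.lines[line] = True           # mark as utilized
            self.lines.move_to_end(line)      # refresh LRU position
        else:
            self.l2.add(line)                 # assumed: demand misses fill both levels
            self._install(line, used=True)


l2 = set()
l1 = BypassL1(capacity=2, l2=l2)
l1.prefetch(10)   # prefetched and later used
l1.prefetch(11)   # prefetched but never used
l1.fetch(10)
l1.prefetch(12)   # evicts 11, which is dropped without touching the L2
l1.prefetch(13)   # evicts 10, which is installed in the L2
print(sorted(l2))
```

Only line 10, which was referenced while resident in the L1, ends up in the L2. The unused prefetch of line 11 never occupies L2 capacity, which is the pollution the slide describes as eliminated.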

24 Low Cost?
A small discontinuity prefetcher still provides appreciable performance increases
Very little additional HW cost for the smaller predictors, yet they achieve significant performance gains over sequential prefetchers
[Figure: miss coverage for 8192-, 4096-, 2048-, 1024-, 512-, and 256-entry predictors and Next-4-lines (tagged); panels: 32KB 4-way L1$ and 2MB 4-way L2$ (CMP); workloads: DB, TPC-W, japp, Web, Mixed]

25 Related Art
Significant prior work on instruction prefetching in addition to the next-line and next-N-line sequential prefetchers:
  Target prefetching [Hsu, Smith]
  Markov prefetching [Joseph, Grunwald]
  Branch-history guided prefetching [Tyson, Charney, Srinivasan, Davidson]
  Call-graph prefetching [Davidson, Annavaram, Patel]
  Fetch directed prefetching [Calder, Reinman, Austin]
  Wrong-path prefetching [Pierce, Mudge]
Benefits and drawbacks of these alternative schemes are discussed in more detail in the paper

26 Concluding Remarks
Modern commercial applications have high instruction miss rates at both the L1 and L2 levels
Effective instruction prefetching is imperative to mitigate the performance losses due to these misses
Necessary to target all types of instruction misses
  Sequential misses AND non-sequential misses caused by control transfer instructions
Proposed the discontinuity prefetcher, which reduces the miss rate by ~90%
Need to consider the pollution effects of aggressive prefetchers (especially in CMPs)
Accelerate commercial apps by up to 38% using the discontinuity prefetcher and selective L2$ installation

27 Questions?

28 Prefetch Accuracy
Accuracy is lower for the more aggressive instruction prefetchers
Accuracy of the discontinuity prefetcher is comparable with the next-4-lines sequential prefetcher
  Yet the discontinuity prefetcher achieves superior performance
2-line discontinuity prefetcher outperforms next-4-lines and has 50% higher accuracy (BW-constrained systems)
[Figure: prefetch accuracy for Next-line (on miss), Next-line (tagged), Next-4-line (tagged), Discontinuity, and Discontinuity (2NL); two panels labelled 4-way CMP; workloads: DB, TPC-W, japp, Web, Mixed]

29 Prefetching for CMPs
Implications for prefetching?
  Resources per core decreased
  Potential for inter-strand pollution increased
  Chip real-estate available to support HW prefetchers decreased
  May require multiple L1$ prefetchers per chip
HW prefetchers need to be effective, accurate and low-cost


More information

Accelerating Microsoft Exchange Servers with I/O Caching

Accelerating Microsoft Exchange Servers with I/O Caching Accelerating Microsoft Exchange Servers with I/O Caching QLogic FabricCache Caching Technology Designed for High-Performance Microsoft Exchange Servers Key Findings The QLogic FabricCache 10000 Series

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Seeing the Unseen: Revealing Mobile Malware Hidden Communications via Energy Consumption and Artificial Intelligence

Seeing the Unseen: Revealing Mobile Malware Hidden Communications via Energy Consumption and Artificial Intelligence Seeing the Unseen: Revealing Mobile Malware Hien Communications via Energy Consumption an Artificial Intelligence Luca Caviglione, Mauro Gaggero, Jean-François Lalane, Wojciech Mazurczyk, Marcin Urbanski

More information

DECISION SUPPORT SYSTEM FOR MANAGING EDUCATIONAL CAPACITY UTILIZATION IN UNIVERSITIES

DECISION SUPPORT SYSTEM FOR MANAGING EDUCATIONAL CAPACITY UTILIZATION IN UNIVERSITIES DECISION SUPPORT SYSTEM OR MANAGING EDUCATIONAL CAPACITY UTILIZATION IN UNIVERSITIES Svetlana Vinnik 1, Marc H. Scholl 2 Abstract Decision-making in the fiel of acaemic planning involves extensive analysis

More information

Architecture of Hitachi SR-8000

Architecture of Hitachi SR-8000 Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data

More information

Precise and Accurate Processor Simulation

Precise and Accurate Processor Simulation Precise and Accurate Processor Simulation Harold Cain, Kevin Lepak, Brandon Schwartz, and Mikko H. Lipasti University of Wisconsin Madison http://www.ece.wisc.edu/~pharm Performance Modeling Analytical

More information

Thread level parallelism

Thread level parallelism Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process

More information

Firewall Design: Consistency, Completeness, and Compactness

Firewall Design: Consistency, Completeness, and Compactness C IS COS YS TE MS Firewall Design: Consistency, Completeness, an Compactness Mohame G. Goua an Xiang-Yang Alex Liu Department of Computer Sciences The University of Texas at Austin Austin, Texas 78712-1188,

More information

On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies On Aaboost an Optimal Betting Strategies Pasquale Malacaria 1 an Fabrizio Smerali 1 1 School of Electronic Engineering an Computer Science, Queen Mary University of Lonon, Lonon, UK Abstract We explore

More information

Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems

Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems Mariko Sakamoto, Akira Katsuno, Aiichiro Inoue, Takeo Asakawa, Haruhiko Ueno, Kuniki Morita, and Yasunori

More information

11 CHAPTER 11: FOOTINGS

11 CHAPTER 11: FOOTINGS CHAPTER ELEVEN FOOTINGS 1 11 CHAPTER 11: FOOTINGS 11.1 Introuction Footings are structural elements that transmit column or wall loas to the unerlying soil below the structure. Footings are esigne to transmit

More information

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2 Using Synology SSD Technology to Enhance System Performance Based on DSM 5.2 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD Cache as Solution...

More information

Safety Management System. Initial Revision Date: Version Revision No. 02 MANUAL LIFTING

Safety Management System. Initial Revision Date: Version Revision No. 02 MANUAL LIFTING Revision Preparation: Safety Mgr Authority: Presient Issuing Dept: Safety Page: Page 1 of 11 Purpose is committe to proviing a safe an healthy working environment for all employees. Musculoskeletal isorers

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Achieving quality audio testing for mobile phones

Achieving quality audio testing for mobile phones Test & Measurement Achieving quality auio testing for mobile phones The auio capabilities of a cellular hanset provie the funamental interface between the user an the raio transceiver. Just as RF testing

More information

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers X: Fujitsu s New Generation 16 Processor for the next generation UNIX servers August 29, 2012 Takumi Maruyama Processor Development Division Enterprise Server Business Unit Fujitsu Limited All Rights Reserved,Copyright

More information

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign

More information

An Alternative Approach of Operating a Passive RFID Device Embedded on Metallic Implants

An Alternative Approach of Operating a Passive RFID Device Embedded on Metallic Implants An Alternative Approach of Operating a Passive RFID Device Embee on Metallic Implants Xiaoyu Liu, Ravi Yalamanchili, Ajay Ogirala an Marlin Mickle RFID Center of Excellence, Department of Electrical an

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

An intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations

An intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations This page may be remove to conceal the ientities of the authors An intertemporal moel of the real exchange rate, stock market, an international ebt ynamics: policy simulations Saziye Gazioglu an W. Davi

More information

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

More information

Categories and Subject Descriptors C.1.1 [Processor Architecture]: Single Data Stream Architectures. General Terms Performance, Design.

Categories and Subject Descriptors C.1.1 [Processor Architecture]: Single Data Stream Architectures. General Terms Performance, Design. Enhancing Memory Level Parallelism via Recovery-Free Value Prediction Huiyang Zhou Thomas M. Conte Department of Electrical and Computer Engineering North Carolina State University 1-919-513-2014 {hzhou,

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

TaP: Table-based Prefetching for Storage Caches

TaP: Table-based Prefetching for Storage Caches : Table-based Prefetching for Storage Caches Mingju Li University of New Hampshire mingjul@cs.unh.edu Swapnil Bhatia University of New Hampshire sbhatia@cs.unh.edu Elizabeth Varki University of New Hampshire

More information

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program

More information

InfoScale Storage & Media Server Workloads

InfoScale Storage & Media Server Workloads InfoScale Storage & Media Server Workloads Maximise Performance when Storing and Retrieving Large Amounts of Unstructured Data Carlos Carrero Colin Eldridge Shrinivas Chandukar 1 Table of Contents 01 Introduction

More information

Optimizing Multiple Stock Trading Rules using Genetic Algorithms

Optimizing Multiple Stock Trading Rules using Genetic Algorithms Optimizing Multiple Stock Traing Rules using Genetic Algorithms Ariano Simões, Rui Neves, Nuno Horta Instituto as Telecomunicações, Instituto Superior Técnico Av. Rovisco Pais, 040-00 Lisboa, Portugal.

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

A New Evaluation Measure for Information Retrieval Systems

A New Evaluation Measure for Information Retrieval Systems A New Evaluation Measure for Information Retrieval Systems Martin Mehlitz martin.mehlitz@ai-labor.e Christian Bauckhage Deutsche Telekom Laboratories christian.bauckhage@telekom.e Jérôme Kunegis jerome.kunegis@ai-labor.e

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Computer Architecture-I

Computer Architecture-I Computer Architecture-I 1. Die Yield is given by the formula, Assignment 1 Solution Die Yield = Wafer Yield x (1 + (Defects per unit area x Die Area)/a) -a Let us assume a wafer yield of 100% and a 4 for

More information

Exploratory Optimal Latin Hypercube Designs for Computer Simulated Experiments

Exploratory Optimal Latin Hypercube Designs for Computer Simulated Experiments Thailan Statistician July 0; 9() : 7-93 http://statassoc.or.th Contribute paper Exploratory Optimal Latin Hypercube Designs for Computer Simulate Experiments Rachaaporn Timun [a,b] Anamai Na-uom* [a,b]

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

Software Diversity for Information Security

Software Diversity for Information Security for Information Security Pei-yu Chen, Gaurav Kataria an Ramayya Krishnan,3 Heinz School, Tepper School an 3 Cylab Carnegie Mellon University Abstract: In this paper we analyze a software iversification-base

More information

Cross-Over Analysis Using T-Tests

Cross-Over Analysis Using T-Tests Chapter 35 Cross-Over Analysis Using -ests Introuction his proceure analyzes ata from a two-treatment, two-perio (x) cross-over esign. he response is assume to be a continuous ranom variable that follows

More information

A Theory of Exchange Rates and the Term Structure of Interest Rates

A Theory of Exchange Rates and the Term Structure of Interest Rates Review of Development Economics, 17(1), 74 87, 013 DOI:10.1111/roe.1016 A Theory of Exchange Rates an the Term Structure of Interest Rates Hyoung-Seok Lim an Masao Ogaki* Abstract This paper efines the

More information