Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
1 Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications. Lawrence Spracklen, Yuan Chou & Santosh G. Abraham. International Symposium on High-Performance Computer Architecture, Feb 15th 2005. Lawrence Spracklen, Advanced Processor Architecture, Sun Microsystems
2 Outline: Motivation; Limit study; The discontinuity prefetcher; Results; Conclusions
3 Motivation: Cache-missing memory accesses frequently dictate application performance. We typically think in terms of data misses, which dominate for the SPEC CPU2000 benchmarks. Commercial applications have large instruction working sets, as exemplified by databases, application servers, and web servers. We investigate the performance implications for: a database workload, TPC-W, SPECjAppServer2002, and SPECweb99.
4 Instruction Miss Rates: Commercial applications observe significant stalls due to both L1 cache and L2 cache instruction misses. Instruction misses can be more problematic than either load or store misses. [Figure: miss rate (per 100 instructions) for DB, TPC-W, jApp, and Web; panels: 32KB 4-way L1$ and 2MB 4-way L2$]
5 Chip Multiprocessors: Recent years have witnessed a significant paradigm shift and the emergence of Chip Multiprocessors (CMPs). CMPs are exemplified by multiple cores on a single chip. Advanced CMP cores typically have private Level-1 caches (L1$) and share a modest Level-2 cache (L2$). We investigate how CMPs impact commercial workloads, using a next-generation 4-core CMP with a 2MB shared L2$.
6 CMP Instruction Miss Rates: L1$ miss rates are identical, as cores have private I$s. The L2$ is shared between 4 cores, so there are fewer cache resources per core and applications experience more frequent cache misses. HW prefetchers are even more important! [Figure: miss rate (per 100 instructions), single core vs. 4-core CMP, 2MB 4-way L2$, for DB, TPC-W, jApp, Web, and Mix]
7 Miss Classification: Few processors provide support for instruction prefetching, and the focus is typically on sequential prefetchers: given an access/miss for line L, prefetch line L+1. Irregular control flow in commercial apps means many misses are non-sequential. [Figure: miss breakdown, sequential vs. non-sequential, for DB, TPC-W, jApp, Web, and Mixed; panels: 32KB 4-way L1$ and 2MB 4-way L2$]
8 Non-Sequential Misses: Sequential prefetchers fail to capture up to almost 60% of instruction misses in commercial applications. Non-sequential misses are not attributable to a single cause; they are caused by a variety of control transfer instructions (CTIs). [Figure: non-sequential miss breakdown by CTI type (trap, return, jump, call, uncond. branch, cond. branch (nt), cond. branch (tb), cond. branch (tf)) for DB, TPC-W, jApp, Web, and Mixed; panels: 32KB 4-way L1$ and 2MB 4-way L2$]
9 Potential Performance Improvements: Failure to address non-sequential instruction misses sacrifices significant performance; also targeting non-sequential misses can double the performance gain. Non-sequential prefetchers must target all main CTI groups, yet many prior instruction prefetchers capture only a subset of misses. [Figure: potential performance improvement (X) for sequential only, branch only, function only, sequential + branch, sequential + function, and sequential + branch + function; panels: single core (DB, TPC-W, jApp, Web) and 4-way CMP (DB, TPC-W, jApp, Web, Mixed)]
10 Introducing Discontinuities: When a CTI causes a transition to a non-sequential cache line, it causes a 'discontinuity' in the instruction fetch stream. Example fetch stream: L, L+1, L+2, L+4, L+20, L+21, L+23. With no prefetching, this stream incurs 7 misses. Next-line prefetchers can't capture these transitions: next-line (on miss) still incurs 5 misses on the same stream. Next-line (tagged): L, L+1, L+2, L+4, L+20, L+21, L…
11 Capturing Discontinuities: Next-N-line sequential prefetchers capture short forward discontinuities, where the target lies within the prefetch-ahead distance. For the stream L, L+1, L+2, L+4, L+20, L+21, L+23, next-4-line (tagged) incurs 2 misses. Next-N-line sequential prefetchers represent a simple, low-cost mechanism to prefetch for small discontinuities. An elegant, CTI-independent method for capturing the remaining discontinuities is required. We propose the discontinuity prefetcher.
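The miss counts quoted for the example fetch stream can be reproduced with a small simulation. The following is a minimal sketch, not the authors' simulator: it assumes an unbounded cache, prefetches that arrive instantly, and an arbitrary base line number `L`; the function name `count_misses` is hypothetical. Under these assumptions the tagged next-line prefetcher, whose count is cut off in the transcript, comes out at 4 misses.

```python
def count_misses(stream, degree, tagged):
    """Count demand misses for a next-N-line prefetcher (idealized model)."""
    cache = set()        # resident lines (demand-fetched or prefetched)
    prefetched = set()   # resident lines brought in by a prefetch, not yet used
    misses = 0
    for line in stream:
        hit = line in cache
        first_use_of_prefetch = line in prefetched
        if not hit:
            misses += 1
            cache.add(line)
        prefetched.discard(line)
        # "On miss" triggers only on a miss; "tagged" also triggers on the
        # first demand access to a previously prefetched line.
        if not hit or (tagged and first_use_of_prefetch):
            for k in range(1, degree + 1):
                if line + k not in cache:
                    cache.add(line + k)
                    prefetched.add(line + k)
    return misses


L = 1000  # arbitrary base line number (assumption)
STREAM = [L, L + 1, L + 2, L + 4, L + 20, L + 21, L + 23]

no_prefetch = count_misses(STREAM, degree=0, tagged=False)
next_line_on_miss = count_misses(STREAM, degree=1, tagged=False)
next_line_tagged = count_misses(STREAM, degree=1, tagged=True)
next_4_line_tagged = count_misses(STREAM, degree=4, tagged=True)

print(no_prefetch, next_line_on_miss, next_line_tagged, next_4_line_tagged)
```

The two misses left by next-4-line (tagged) are the cold miss on L and the large discontinuity to L+20, which is the case the discontinuity prefetcher targets.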
12 The Discontinuity Prefetcher: The discontinuity prefetcher utilizes a history-based predictor to track discontinuities that incur an L1$ miss. It only needs to track large discontinuities that aren't covered by the next-N-line sequential prefetcher, which significantly reduces the size of the required predictor. For the stream L, L+1, L+2, L+4, L+20, L+21, L+23, discontinuity + next-4-line (tagged) incurs 0 misses: the small forward discontinuities are covered by the next-4-line sequential prefetcher, and the predictor only needs to cover the large discontinuity, so a small predictor suffices.
13 Prefetcher Implementation: The predictor is implemented as a direct-mapped table, indexed by a portion of the address of the trigger. Each entry is tagged with a portion of the address of the trigger and requires only one target. [Figure: block diagram with core, request/miss info, next-4-line prefetcher, tag/target table, and prefetch queue to the L2$]
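The table organization described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the hardware design: the 8192-entry size comes from the methodology slide, while the 8-bit partial tag, the use of line numbers rather than byte addresses, and the names `DiscontinuityTable`, `insert`, and `lookup` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

INDEX_BITS = 13  # 2**13 = 8192 entries, matching the evaluated table size
TAG_BITS = 8     # hypothetical partial-tag width (not given on the slides)


@dataclass
class Entry:
    tag: int     # partial tag taken from the trigger line address
    target: int  # the single predicted target line


class DiscontinuityTable:
    """Direct-mapped, partially tagged table: trigger line -> target line."""

    def __init__(self, index_bits: int = INDEX_BITS, tag_bits: int = TAG_BITS):
        self.index_bits = index_bits
        self.tag_mask = (1 << tag_bits) - 1
        self.entries: List[Optional[Entry]] = [None] * (1 << index_bits)

    def _split(self, trigger_line: int):
        # Low bits select the entry; the next bits form the partial tag.
        index = trigger_line & ((1 << self.index_bits) - 1)
        tag = (trigger_line >> self.index_bits) & self.tag_mask
        return index, tag

    def insert(self, trigger_line: int, target_line: int) -> None:
        index, tag = self._split(trigger_line)
        self.entries[index] = Entry(tag, target_line)  # overwrites any old entry

    def lookup(self, trigger_line: int) -> Optional[int]:
        index, tag = self._split(trigger_line)
        entry = self.entries[index]
        if entry is not None and entry.tag == tag:
            return entry.target
        return None


table = DiscontinuityTable()
table.insert(0x1234, 0x1234 + 20)
hit = table.lookup(0x1234)             # same index, same tag
conflict = table.lookup(0x1234 + 8192)  # same index, different tag
```

Because the tag is only partial, two triggers that agree in both index and tag bits would alias to the same entry; that is the usual trade-off for keeping each entry small.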
14-15 Prefetcher Operation (Allocation): When a discontinuity causes a miss, it is inserted into the table. [Figure: fetch stream L-128, L, L+1, L+2, L+4, L+20, L+21, L+23 with the tag/target table and the prefetch queue]
16-17 Prefetcher Operation (Prediction): The predictor is probed by the sequential prefetcher moving ahead of the demand fetch stream. If a valid entry is located, a prefetch is issued for the potential target. Prefetches are also issued for the sequential lines following the target (up to N). [Figure: fetch stream L-128, L, L+1, L+2, L+4, L+20, L+21, L+23 with the tag/target table and the prefetch queue]
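The allocation and prediction steps can be combined into a small behavioral model. This is a sketch under several assumptions that the slides do not spell out: a plain dictionary stands in for the tagged table, the cache is unbounded and prefetches are instantaneous, a miss is treated as a discontinuity whenever the line is not the successor of the previous demand line, and the predictor is probed for the trigger line and each line in the next-N window. The class name `DiscontinuityPrefetcher` is hypothetical.

```python
N = 4  # sequential prefetch degree (next-4-line)


class DiscontinuityPrefetcher:
    def __init__(self, degree=N):
        self.degree = degree
        self.table = {}          # trigger line -> target line (table stand-in)
        self.cache = set()       # resident lines
        self.prefetched = set()  # prefetched lines not yet demanded
        self.prev = None         # previous demand-fetched line

    def flush_cache(self):
        """Empty the cache but keep the learned discontinuities."""
        self.cache.clear()
        self.prefetched.clear()
        self.prev = None

    def _prefetch(self, line):
        if line not in self.cache:
            self.cache.add(line)
            self.prefetched.add(line)

    def fetch(self, line):
        """Demand-fetch one line; return True if it missed."""
        miss = line not in self.cache
        tagged_hit = line in self.prefetched
        self.prefetched.discard(line)
        if miss:
            self.cache.add(line)
            # Allocation: a miss caused by a non-sequential transition
            # is recorded against the line that preceded it.
            if self.prev is not None and line != self.prev + 1:
                self.table[self.prev] = line
        if miss or tagged_hit:
            # Prediction: the sequential prefetcher runs ahead and probes
            # the predictor for every line in its window.
            for ahead in range(line, line + self.degree + 1):
                if ahead != line:
                    self._prefetch(ahead)
                target = self.table.get(ahead)
                if target is not None:
                    # Prefetch the target and up to N lines following it.
                    for t in range(target, target + self.degree + 1):
                        self._prefetch(t)
        self.prev = line
        return miss


L = 1000  # arbitrary base line number (assumption)
STREAM = [L - 128, L, L + 1, L + 2, L + 4, L + 20, L + 21, L + 23]

pf = DiscontinuityPrefetcher()
first_pass = [line for line in STREAM if pf.fetch(line)]   # trains the table
pf.flush_cache()
second_pass = [line for line in STREAM if pf.fetch(line)]  # uses the table
print(first_pass, second_pass)
```

In this model the first pass misses on L-128, L, and L+20 and records the two large discontinuities. On the second pass only the very first line misses, and every line from L onward hits, which is consistent with the 0 misses shown for the stream L through L+23.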
18 Methodology: Processor overview. Processor: 4-core CMP; 64-entry issue window; 3-wide issue; 64K gshare predictor. Memory hierarchy: 32KB 4-way 64B I$ and D$ (per core); 2MB 4-way 64B L2$ (shared between cores); 400-cycle memory latency; 20GB/s off-chip BW. Discontinuity prefetcher: 8192-entry direct-mapped table (per core); next-4-line sequential prefetcher. The discontinuity prefetcher is compared to: next-line (on miss): if line L is a miss, prefetch line L+1; next-line (tagged): if line L is a miss or a previously prefetched line, prefetch line L+1; next-4-line (tagged): if line L is a miss or a previously prefetched line, prefetch lines L+1, L+2, L+3, and L+4.
19 Miss Coverage (single core): A significant reduction is achieved in both the I$ miss rate and the L2$ instruction miss rate: 90% of L1$ misses and 85% of L2$ misses are eliminated for the database workload. The discontinuity prefetcher outperforms the sequential prefetchers. [Figure: miss rate (normalized to no prefetch) for next-line (on miss), next-line (tagged), next-4-line (tagged), and discontinuity; panels: 32KB 4-way L1$ and 2MB 4-way L2$; workloads DB, TPC-W, jApp, and Web]
20 Miss Coverage (CMP): The CMP L1$ miss reduction is identical to the reductions achieved for a single core, since cores have private L1$s. L2$ miss rate reductions are similar to the single-core reductions. The prefetcher also manages to eliminate 82% of L2$ instruction misses for the mixed workload. [Figure: miss rate (normalized to no prefetch) for next-line (on miss), next-line (tagged), next-4-line (tagged), and discontinuity; 2MB 4-way L2$; workloads DB, TPC-W, jApp, Web, and Mixed]
21 Performance Improvements: The instruction prefetchers provide significant performance benefits, with higher performance benefits observed for the CMP. Given the significant reduction in miss rates, greater performance improvements seemed likely. [Figure: performance improvement (X) for next-line (on miss), next-line (tagged), next-4-line (tagged), and discontinuity; panels: single core (DB, TPC-W, jApp, Web) and 4-way CMP (DB, TPC-W, jApp, Web, Mixed)]
22 Data Miss Rates: L2$ data miss rates increased significantly when aggressive instruction prefetching was enabled. The increase in data misses offsets the benefits from the reduction in instruction misses. [Figure: miss rate (normalized to no prefetch) for next-line (on miss), next-line (tagged), next-4-line (tagged), and discontinuity; panels: single core (DB, TPC-W, jApp, Web) and 4-way CMP (DB, TPC-W, jApp, Web, Mixed)]
23 L2-Bypass Prefetching: We introduce L2-bypass prefetching: prefetches are initially installed only in the L1$. If the line is utilized during its residence in the L1$, then on eviction the line is installed in the L2$. This eliminates L2$ pollution by the instruction prefetchers, and the full performance benefits of the instruction prefetchers are observed (up to 1.38X). [Figure: performance improvement (X) for next-line (on miss), next-line (tagged), next-4-line (tagged), and discontinuity; panels: single core (DB, TPC-W, jApp, Web) and 4-way CMP (DB, TPC-W, jApp, Web, Mixed)]
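The L2-bypass policy can be illustrated with a toy two-level model. This is a sketch, not the evaluated design: it assumes a fully associative LRU L1$ with a per-line "used" bit, an unbounded L2$ represented as a set, and demand misses that fill both levels in the conventional way. The names `L2BypassHierarchy`, `prefetch`, and `demand_fetch` are hypothetical.

```python
from collections import OrderedDict


class L2BypassHierarchy:
    def __init__(self, l1_lines):
        self.l1_lines = l1_lines
        self.l1 = OrderedDict()  # line -> used flag, ordered from LRU to MRU
        self.l2 = set()          # unbounded L2 stand-in (assumption)

    def _install_l1(self, line, used):
        self.l1[line] = used
        if len(self.l1) > self.l1_lines:
            victim, was_used = self.l1.popitem(last=False)  # evict the LRU line
            # Only lines that proved useful in the L1$ are written to the L2$;
            # unused prefetches are dropped and never pollute it.
            if was_used:
                self.l2.add(victim)

    def prefetch(self, line):
        """Install a prefetched line in the L1$ only, bypassing the L2$."""
        if line not in self.l1:
            self._install_l1(line, used=False)

    def demand_fetch(self, line):
        """Return where the demand access was satisfied."""
        if line in self.l1:
            self.l1[line] = True       # mark the line as utilized
            self.l1.move_to_end(line)  # update LRU order
            return "L1 hit"
        source = "L2 hit" if line in self.l2 else "memory"
        self.l2.add(line)              # demand misses fill the L2$ as usual
        self._install_l1(line, used=True)
        return source


h = L2BypassHierarchy(l1_lines=2)
h.prefetch(10)
h.prefetch(11)
first = h.demand_fetch(10)   # the prefetch of line 10 turns out to be useful
h.prefetch(12)               # evicts line 11, which was never used
h.prefetch(13)               # evicts line 10, which was used
second = h.demand_fetch(11)  # the useless prefetch left no trace in the L2$
third = h.demand_fetch(10)   # the useful line was installed in the L2$
print(first, second, third)
```

The key design point is that the decision to install a line in the L2$ is deferred until eviction from the L1$, when it is known whether the prefetch was actually used.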
24 Low Cost? A small discontinuity prefetcher still provides appreciable performance increases. The smaller predictors incur very little additional HW cost, yet they achieve significant performance gains over sequential prefetchers. [Figure: miss coverage for 8192-, 4096-, 2048-, 1024-, 512-, and 256-entry predictors and next-4-lines (tagged); panels: 32KB 4-way L1$ and 2MB 4-way L2$ (CMP); workloads DB, TPC-W, jApp, Web, and Mixed]
25 Related Art: There is significant prior work on instruction prefetching in addition to the next-line and next-N-line sequential prefetchers: Target prefetching [Hsu, Smith]; Markov prefetching [Joseph, Grunwald]; Branch-history guided prefetching [Tyson, Charney, Srinivasan, Davidson]; Call-graph prefetching [Davidson, Annavaram, Patel]; Fetch directed prefetching [Calder, Reinman, Austin]; Wrong-path prefetching [Pierce, Mudge]. Benefits and drawbacks of these alternative schemes are discussed in more detail in the paper.
26 Concluding Remarks: Modern commercial applications have high instruction miss rates at both the L1 and L2 levels. Effective instruction prefetching is imperative to mitigate the performance losses due to these misses. It is necessary to target all types of instruction misses: sequential misses AND non-sequential misses caused by control transfer instructions. We propose the discontinuity prefetcher, which reduces the miss rate by ~90%. The pollution effects of aggressive prefetchers need to be considered (especially in CMPs). Commercial apps are accelerated by up to 38% using the discontinuity prefetcher and selective L2$ installation.
27 Questions?
28 Prefetch Accuracy: Accuracy is lower for the more aggressive instruction prefetchers. The accuracy of the discontinuity prefetcher is comparable with that of the next-4-lines sequential prefetcher, yet the discontinuity prefetcher achieves superior performance. The 2-line discontinuity prefetcher outperforms next-4-lines and has 50% higher accuracy (relevant for BW-constrained systems). [Figure: prefetch accuracy for next-line (on miss), next-line (tagged), next-4-line (tagged), discontinuity, and discontinuity (2NL); 4-way CMP; workloads DB, TPC-W, jApp, Web, and Mixed]
29 Prefetching for CMPs: Implications for prefetching? Resources per core decrease. Potential for inter-strand pollution increases. Chip real estate available to support HW prefetchers decreases. May require multiple L1$ prefetchers per chip. HW prefetchers need to be effective, accurate, and low-cost.
More informationMemory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality
Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign
More informationAn Alternative Approach of Operating a Passive RFID Device Embedded on Metallic Implants
An Alternative Approach of Operating a Passive RFID Device Embee on Metallic Implants Xiaoyu Liu, Ravi Yalamanchili, Ajay Ogirala an Marlin Mickle RFID Center of Excellence, Department of Electrical an
More informationIn-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
More informationAn intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations
This page may be remove to conceal the ientities of the authors An intertemporal moel of the real exchange rate, stock market, an international ebt ynamics: policy simulations Saziye Gazioglu an W. Davi
More informationA Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
More informationCategories and Subject Descriptors C.1.1 [Processor Architecture]: Single Data Stream Architectures. General Terms Performance, Design.
Enhancing Memory Level Parallelism via Recovery-Free Value Prediction Huiyang Zhou Thomas M. Conte Department of Electrical and Computer Engineering North Carolina State University 1-919-513-2014 {hzhou,
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract
More informationTaP: Table-based Prefetching for Storage Caches
: Table-based Prefetching for Storage Caches Mingju Li University of New Hampshire mingjul@cs.unh.edu Swapnil Bhatia University of New Hampshire sbhatia@cs.unh.edu Elizabeth Varki University of New Hampshire
More informationOperating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015
Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program
More informationInfoScale Storage & Media Server Workloads
InfoScale Storage & Media Server Workloads Maximise Performance when Storing and Retrieving Large Amounts of Unstructured Data Carlos Carrero Colin Eldridge Shrinivas Chandukar 1 Table of Contents 01 Introduction
More informationOptimizing Multiple Stock Trading Rules using Genetic Algorithms
Optimizing Multiple Stock Traing Rules using Genetic Algorithms Ariano Simões, Rui Neves, Nuno Horta Instituto as Telecomunicações, Instituto Superior Técnico Av. Rovisco Pais, 040-00 Lisboa, Portugal.
More informationUsing Synology SSD Technology to Enhance System Performance Synology Inc.
Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD
More informationA New Evaluation Measure for Information Retrieval Systems
A New Evaluation Measure for Information Retrieval Systems Martin Mehlitz martin.mehlitz@ai-labor.e Christian Bauckhage Deutsche Telekom Laboratories christian.bauckhage@telekom.e Jérôme Kunegis jerome.kunegis@ai-labor.e
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationComputer Architecture-I
Computer Architecture-I 1. Die Yield is given by the formula, Assignment 1 Solution Die Yield = Wafer Yield x (1 + (Defects per unit area x Die Area)/a) -a Let us assume a wafer yield of 100% and a 4 for
More informationExploratory Optimal Latin Hypercube Designs for Computer Simulated Experiments
Thailan Statistician July 0; 9() : 7-93 http://statassoc.or.th Contribute paper Exploratory Optimal Latin Hypercube Designs for Computer Simulate Experiments Rachaaporn Timun [a,b] Anamai Na-uom* [a,b]
More informationOracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011
Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB
More informationSoftware Diversity for Information Security
for Information Security Pei-yu Chen, Gaurav Kataria an Ramayya Krishnan,3 Heinz School, Tepper School an 3 Cylab Carnegie Mellon University Abstract: In this paper we analyze a software iversification-base
More informationCross-Over Analysis Using T-Tests
Chapter 35 Cross-Over Analysis Using -ests Introuction his proceure analyzes ata from a two-treatment, two-perio (x) cross-over esign. he response is assume to be a continuous ranom variable that follows
More informationA Theory of Exchange Rates and the Term Structure of Interest Rates
Review of Development Economics, 17(1), 74 87, 013 DOI:10.1111/roe.1016 A Theory of Exchange Rates an the Term Structure of Interest Rates Hyoung-Seok Lim an Masao Ogaki* Abstract This paper efines the
More information