Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture




Dong Ye, David Kaeli (Northeastern University)
Joydeep Ray, Christophe Harle (AMD Inc.)
IISWC 2006

Outline
- Motivation and background
- Performance characteristics of CPU2006 integer benchmarks on x86-64 (64-bit mode vs. 32-bit mode)
- Program characteristics of selected benchmarks
- CPU2006 vs. CPU2000

Motivation and background
- x86-64 architecture brings 64-bit computing to the PC market
- Need to evaluate whether PC desktop applications can benefit from the 64-bit ISA
- SPEC released its latest CPU suite (CPU2006) last month
- Want to evaluate how applications have changed when moving from CPU2000 to CPU2006

x86-64: extends x86 to 64 bits
- x86-64 == x64 == (amd64 + EM64T)
- Fully compatible with existing x86 modes
- Architectural support for a 64-bit virtual address space and a 52-bit physical address space
- 64-bit mode supports flat addressing, 64-bit integer operations, 16 64-bit GPRs, and 16 SSE registers

Evaluation environment
- Athlon 64 X2 4400+ Rev E (dual-core, 2.2GHz)
- 2 DIMMs of 1GB DDR400 DRAM
- SUSE Linux 9.3 Pro x86-64 edition (kernel 2.6.11.4), run level 3, selected daemons disabled
- Benchmark bound to run on a single core
- 64-bit binary run in 64-bit mode, 32-bit binary run in compatibility mode (referred to as 32-bit mode), both on the same 64-bit OS
- GCC 4.1.1 (-O2 for perlbench, -O3 for all others)
- H/W counters used to collect performance data
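
The slide says each benchmark was bound to a single core but does not say how. As a minimal sketch of one way to do this on Linux, assuming Python's os.sched_setaffinity is available (the helper name pin_to_single_core is hypothetical, and this is not necessarily the mechanism the authors used):

```python
import os


def pin_to_single_core():
    """Bind the current process to one allowed core (Linux only).

    Returns the chosen core id, or None where the affinity API is missing.
    Hypothetical helper for illustration; the slides do not name the tool
    that was actually used for binding.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not available on this platform
    allowed = os.sched_getaffinity(0)  # cores this process may currently use
    core = min(allowed)                # pick the lowest-numbered allowed core
    os.sched_setaffinity(0, {core})    # restrict the process to that core
    return core


if __name__ == "__main__":
    print("pinned to core:", pin_to_single_core())
```

Child processes started afterwards inherit the affinity mask, so a benchmark launched from such a wrapper would stay on the same core.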

How much faster is 64-bit mode?

Benchmark     Language   64-bit vs. 32-bit speedup
perlbench     C            3.42%
bzip2         C           15.77%
gcc           C          -18.09%
mcf           C          -26.35%
gobmk         C            4.97%
hmmer         C           34.34%
sjeng         C           14.21%
libquantum    C           35.38%
h264ref       C           35.35%
omnetpp       C++         -7.83%
astar         C++          8.46%
xalancbmk     C++        -13.65%
Average                    7.16%
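
As a quick check on the table, the sketch below takes the twelve per-benchmark speedups listed on the slide and computes their arithmetic mean. The slide does not state which kind of mean was used; the arithmetic mean comes out at about 7.165%, which agrees with the reported 7.16% to within rounding.

```python
# Per-benchmark 64-bit vs. 32-bit speedups (percent), as listed on the slide.
speedups_pct = [
    3.42, 15.77, -18.09, -26.35, 4.97, 34.34,
    14.21, 35.38, 35.35, -7.83, 8.46, -13.65,
]


def arithmetic_mean(values):
    """Plain arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


mean_pct = arithmetic_mean(speedups_pct)

if __name__ == "__main__":
    print(f"benchmarks: {len(speedups_pct)}, mean speedup: {mean_pct:.3f}%")
```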

Code size increases in 64-bit mode
[Figure: code size increase per benchmark; y-axis: percentage, -10% to 90%]

Runtime memory footprint increases in 64-bit mode
[Figure: runtime memory footprint increase per benchmark; y-axis: percentage, -5% to 95%]

Dynamic instruction count decreases in 64-bit mode
[Figure: dynamic instruction count decrease per benchmark; y-axis: percentage, 0% to 50%]

IPC comparison
[Figure: IPC per benchmark, 64-bit vs. 32-bit; y-axis: IPC, 0.0 to 1.2]
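
The instruction-count and IPC slides connect to run time through the textbook identity time = instructions / (IPC x clock rate). The sketch below applies it with made-up numbers (not values from the figures) and assumes speedup means t32/t64 - 1, which the slides do not state. The clock rate cancels because both binaries ran on the same machine.

```python
def speedup_pct(instr_32, ipc_32, instr_64, ipc_64):
    """Speedup of the 64-bit run over the 32-bit run, in percent.

    Uses time = instructions / (IPC * clock); the clock term cancels.
    Assumed definition: speedup = t32 / t64 - 1.
    """
    t32 = instr_32 / ipc_32
    t64 = instr_64 / ipc_64
    return (t32 / t64 - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical case 1: 10% fewer instructions, unchanged IPC.
    print(f"{speedup_pct(100, 1.0, 90, 1.0):+.1f}%")
    # Hypothetical case 2: 10% fewer instructions, but IPC drops by 20%.
    print(f"{speedup_pct(100, 1.0, 90, 0.8):+.1f}%")
```

The second case illustrates how a benchmark can execute fewer instructions in 64-bit mode and still run slower if its IPC falls enough, for example because of extra data cache misses.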

Instruction cache (L1) request rate increases in 64-bit mode
[Figure: L1 instruction cache request rate increase per benchmark; y-axis: percentage, 0% to 40%]

Instruction cache miss rate comparison
[Figure: instruction cache misses per 1K instructions per benchmark, 64-bit vs. 32-bit; y-axis: 0.0 to 5.0]

Data cache (L1) request rate decreases in 64-bit mode
[Figure: L1 data cache request rate decrease per benchmark; y-axis: percentage, 0% to 60%]

Data cache miss rate comparison
[Figure: data cache misses per 1K instructions per benchmark, 64-bit vs. 32-bit; y-axis: 0.0 to 100.0]

Observations
- Instruction cache miss rate is very low in both 64-bit and 32-bit modes
- Data cache request rate decreases significantly in 64-bit mode
  - Extra registers help
- Data cache miss rate increases in 64-bit mode
  - The increased size of long and pointer data types has an adverse impact on data cache performance
  - A lower instruction count magnifies this
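
The last observation follows from the metric itself: miss rates in these slides are reported as misses per 1K instructions, so the same number of misses divided by fewer instructions gives a higher rate. A minimal sketch with hypothetical counts (not measurements from the talk):

```python
def misses_per_kilo_instr(misses, instructions):
    """Cache misses per 1,000 retired instructions."""
    return 1000.0 * misses / instructions


# Hypothetical counts: identical miss totals in both modes,
# with the 64-bit binary retiring 10% fewer instructions.
MISSES = 2_000_000
INSTR_32 = 1_000_000_000
INSTR_64 = 900_000_000

mpki_32 = misses_per_kilo_instr(MISSES, INSTR_32)
mpki_64 = misses_per_kilo_instr(MISSES, INSTR_64)

if __name__ == "__main__":
    print(f"32-bit: {mpki_32:.3f} misses per 1K instr")
    print(f"64-bit: {mpki_64:.3f} misses per 1K instr")
```

Even with no extra misses at all, the 64-bit rate comes out higher here, so part of any measured increase can be an effect of the smaller denominator.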

Memory controller utilization ratio comparison
[Figure: memory controller utilization ratio per benchmark, 64-bit vs. 32-bit; y-axis: percentage, 0% to 50%]

Program characteristics of five selected benchmarks

mcf
  Performance: 64-bit is 26.4% slower
  Observation: Memory footprint doubles
  Root cause: Larger memory footprint due to extensive use of longs and pointers

xalancbmk
  Performance: 64-bit is 13.7% slower
  Observation: Memory footprint increases by 33.9%
  Root cause: Larger memory footprint due to extensive use of pointers

hmmer
  Performance: 64-bit is 34.3% faster
  Observation: Dynamic instruction count decreases by 8.7%
  Root cause: More registers available in 64-bit mode

libquantum
  Performance: 64-bit is 35.4% faster
  Observation: Dynamic instruction count decreases by 54%
  Root cause: Native 64-bit integer arithmetic in 64-bit mode

h264ref
  Performance: 64-bit is 35.4% faster
  Observation: Dynamic instruction count decreases by 10%
  Root cause: Faster calling convention (mainly because of more registers) in 64-bit mode
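
To see why a structure built from longs and pointers can double in size, the sketch below compares packed field sizes under the 32-bit (ILP32) and 64-bit Linux (LP64) data models using Python's struct module. The field lists are hypothetical and are not taken from any benchmark source; alignment padding is ignored for simplicity.

```python
import struct

# Field widths per data model, as struct format codes with standard sizes:
# 'i'/'I' are 4 bytes, 'q'/'Q' are 8 bytes.
ILP32 = {"int": "i", "long": "i", "ptr": "I"}  # int, long, pointer: 4 bytes
LP64 = {"int": "i", "long": "q", "ptr": "Q"}   # long and pointer grow to 8


def packed_size(fields, model):
    """Total size in bytes of the fields, with no alignment padding."""
    fmt = "<" + "".join(model[f] for f in fields)
    return struct.calcsize(fmt)


# Hypothetical records for illustration only.
POINTER_HEAVY = ["long", "long", "ptr", "ptr", "ptr", "ptr"]
MOSTLY_INT = ["int", "int", "ptr"]

if __name__ == "__main__":
    for name, fields in (("pointer-heavy", POINTER_HEAVY),
                         ("mostly-int", MOSTLY_INT)):
        s32 = packed_size(fields, ILP32)
        s64 = packed_size(fields, LP64)
        print(f"{name}: {s32} -> {s64} bytes ({s64 / s32:.2f}x)")
```

A record made only of longs and pointers doubles, while one dominated by ints grows far less, which is consistent with the different footprint increases reported for the two slower benchmarks.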

CPU2000int: Speedup in 64-bit mode
[Figure: 64-bit vs. 32-bit speedup per CPU2000 integer benchmark; y-axis: percentage, -60% to 40%]

Conclusions
- Programs that use features of 64-bit mode (64-bit integer arithmetic and more registers) stand to benefit more from x86-64
- Programs that are memory intensive or make heavy use of long and pointer data types need to be carefully evaluated when porting to x86-64

Disclaimer
All performance numbers referred to in this presentation are estimates because they are from a peak-only SPEC CPU2006 run, and hence are not fully compliant with SPEC run rules. It is expected, though not proven, that results from a fully compliant run would be very close.

Thank you. Questions?