Computer Architecture-I




Assignment 1 Solution

1. Die yield is given by the formula

$$\text{Die Yield} = \text{Wafer Yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die Area}}{\alpha}\right)^{-\alpha}$$

Let us assume a wafer yield of 100% and $\alpha = 4$ for current technology.

a. Die yield for the AMD Opteron:

$$\text{Die Yield} = \left(1 + \frac{0.75 \times 1.99}{4}\right)^{-4} = 0.281$$

b. Die yield for the 8-core Sun Niagara:

$$\text{Die Yield} = \left(1 + \frac{0.75 \times 3.80}{4}\right)^{-4} = 0.116$$

c. The defect rate is the same for the AMD Opteron and the Sun Niagara, but the Niagara die is almost twice the size of the Opteron die. Fewer Niagara dies therefore fit on a wafer, and each larger die is more likely to contain a defect, so the yield of the Niagara suffers in comparison to the Opteron.

4. Question 1.4

a. To compute the wattage for the server's power supply, we first calculate the power consumed by the entire system at maximum load.

i. Sun Niagara 8-core chip: power consumed at max load = 79W
ii. 2 x 1GB 184-pin Kingston DRAM: power consumed at max load = 2 x 3.7W = 7.4W
iii. 2 x 7200rpm hard drive: since we are interested in the max load condition, we assume 0% idle time for the hard drive. Power per drive = 7.9W, so total power for 2 drives = 15.8W

Thus, the total power consumed by the system = 79 + 7.4 + 15.8 = 102.2W.

Power supply efficiency = power output / power input. With an efficiency of 0.7, power input = 102.2/0.7 = 146W. This is the required power supply wattage for the system.

b. The hard drive is idle for 40% of the time. Power = (0.4 x 4) + (0.6 x 7.9) = 6.34W

c. Assume rpm is the only factor affecting how long the disk is busy, so that read/seek time is inversely proportional to rpm. The 7200 rpm disk spends 60% of the time on read/seek. For the same set of transactions, the 5400 rpm disk takes 7200/5400 = 4/3 as long, i.e. 0.6 x 4/3 = 80% read/seek. Thus, the 5400 rpm disk idles for 20% of the time.

6. Question 1.6

a. The performance/power ratio for each benchmark is tabulated below.

Benchmark    Sun Fire T2000    IBM x346
SPECjbb      212.677           91.289
SPECWeb      42.427            9.926

b. If power is the main concern, the Sun Fire T2000 is the better choice, since it has the higher performance/power ratio on both benchmarks.

c. It is true that for database benchmarks, a cheaper system gives a lower cost per database operation. Even so, some server farms may choose expensive servers. These servers are built not only for better performance but also for lower power consumption. Power consumption is an ever-growing concern for large server farms, which may consist of over 10000 processors and disks. Cheaper systems might yield a lower cost per operation, which is desirable, but they may not be power efficient. The cost of excess power consumption and cooling is significant. Thus, it is necessary to weigh both factors when making the choice.

9. Question 1.9

a. FIT = 100. Since FIT counts failures per billion hours, $\text{MTTF} = 10^9/\text{FIT} = 10^9/100 = 10^7$ hours.

b. MTTR = 1 day = 24 hours. Availability = MTTF/(MTTF + MTTR) = $10^7/(10^7 + 24) \approx 0.999998$
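The closed-form calculations in Question 1 (die yield), Question 1.4 (power supply sizing and average disk power), and Question 1.9 (availability) can be checked with a short script. This is a minimal sketch: the function and parameter names are hypothetical and chosen only for illustration, and the inputs are the figures quoted in the solution text.

```python
# Hypothetical helper names; the formulas follow the solution text.

def die_yield(defects_per_area, die_area, alpha=4.0, wafer_yield=1.0):
    """Die yield = wafer yield * (1 + defects * area / alpha) ** -alpha."""
    return wafer_yield * (1.0 + defects_per_area * die_area / alpha) ** (-alpha)


def supply_wattage(component_watts, efficiency):
    """Input power needed to deliver the summed component load."""
    return sum(component_watts) / efficiency


def average_disk_power(idle_fraction, idle_watts, active_watts):
    """Time-weighted average of idle power and read/seek power."""
    return idle_fraction * idle_watts + (1.0 - idle_fraction) * active_watts


def availability(fit, mttr_hours):
    """Availability from a FIT rate (failures per 1e9 hours) and a repair time."""
    mttf_hours = 1e9 / fit
    return mttf_hours / (mttf_hours + mttr_hours)


print(f"Opteron die yield:  {die_yield(0.75, 1.99):.3f}")
print(f"Niagara die yield:  {die_yield(0.75, 3.80):.3f}")
print(f"Supply wattage:     {supply_wattage([79, 2 * 3.7, 2 * 7.9], 0.7):.1f} W")
print(f"Average disk power: {average_disk_power(0.4, 4.0, 7.9):.2f} W")
print(f"Availability:       {availability(100, 24):.7f}")
```

Keeping each formula in its own function makes it easy to rerun the exercise with a different defect density, efficiency, or repair time.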

12. Question 1.12

a. Tabulated results for performance normalized to the Pentium D820:

Chip                 Memory Performance    Dhrystone Performance
Athlon64 X2 4800+    1.141                 1.361235217
Pentium EE840        1.076                 1.241327201
Pentium D820         1                     1
Athlon64 X2 3800+    0.980333333           1.12542707
Pentium 4            0.910333333           0.500722733
Athlon64 3000+       0.984333333           0.501182654
Pentium 4 570        1.167                 0.73653088
Processor X          2.333333333           0.328515112

b. Arithmetic mean of performance for each processor, tabulated for both the original and the normalized performance values:

Chip                 Arithmetic Mean (Original)    Arithmetic Mean (Normalized)
Athlon64 X2 4800+    12070.5                       1.251117608
Pentium EE840        11060.5                       1.158663601
Pentium D820         9110                          1
Athlon64 X2 3800+    10035                         1.052880201
Pentium 4            5176                          0.705528033
Athlon64 3000+       5290.5                        0.742757994
Pentium 4 570        7355.5                        0.95176544
Processor X          6000                          1.330924223

c. From the table above, one can draw conflicting conclusions about the performance of Processor X. Going by the first column (original results), the Athlon64 X2 4800+, Pentium EE840, Pentium D820, Athlon64 X2 3800+ and Pentium 4 570 are all faster than Processor X. This is contrary to the second column (normalized results), where Processor X is faster than all of these processors.

d. Geometric mean of the normalized Dhrystone results for the single and dual core processors:

Geometric mean (Single Core) = 0.4964
Geometric mean (Dual Core) = 1.1743

e. Scatter graph of Dhrystone performance versus memory performance. [Figure not reproduced.]

f. The scatter graph clearly indicates that the dual core processors outperform their single core counterparts in Dhrystone performance. Dhrystone is an integer benchmark that primarily exercises the logical/arithmetic functionality of the CPU. The dramatic improvement in Dhrystone performance is explained simply by the fact that 2 cores are available for computation instead of 1. It can also be seen that there is no major improvement in memory performance. This is because memory latency is not related to the number of CPU cores available. Thus, even on a dual core processor, the latency of memory load/store operations is similar to that of a single core. The only exception is the memory performance of Processor X, which is fictitiously high.
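The normalization and the arithmetic and geometric means used in parts a, b and d can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the raw (memory, Dhrystone) scores are assumed to be the reciprocals of the execution times quoted in Question 1.13, only four chips are included, and all function names are hypothetical.

```python
from math import prod

# Assumed raw (memory, Dhrystone) scores: reciprocals of the Question 1.13
# execution times. Only a subset of the chips is listed.
SCORES = {
    "Athlon64 X2 4800+": (3423, 20718),
    "Pentium D820": (3000, 15220),
    "Pentium 4 570": (3501, 11210),
    "Processor X": (7000, 5000),
}


def normalize(scores, reference):
    """Divide every chip's scores by the reference chip's scores."""
    ref_mem, ref_cpu = scores[reference]
    return {chip: (mem / ref_mem, cpu / ref_cpu)
            for chip, (mem, cpu) in scores.items()}


def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    values = list(values)
    return prod(values) ** (1.0 / len(values))


normalized = normalize(SCORES, "Pentium D820")
for chip, raw in SCORES.items():
    print(f"{chip:18s} original mean = {arithmetic_mean(raw):8.1f}  "
          f"normalized mean = {arithmetic_mean(normalized[chip]):.4f}")

# Normalized Dhrystone results copied from the table in part a.
single_core = [0.500722733, 0.501182654, 0.73653088, 0.328515112]
dual_core = [1.361235217, 1.241327201, 1.0, 1.12542707]
print(f"Geometric mean (single core) = {geometric_mean(single_core):.4f}")
print(f"Geometric mean (dual core)   = {geometric_mean(dual_core):.4f}")
```

Printing both means side by side makes the conflict in part c visible: Processor X ranks low on the original mean and highest on the normalized mean.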

13. Question 1.13

a. It is given that 40% of operations are memory-centric and 60% are CPU-centric. The following table gives the weighted execution times for the benchmarks, where each execution time is the reciprocal of the corresponding benchmark score and the weighted arithmetic mean is 0.4 x (memory time) + 0.6 x (Dhrystone time).

Chip                 Memory Execution Time    Dhrystone Execution Time    Weighted Arithmetic Mean
Athlon64 X2 4800+    0.000292141              4.82672E-05                 0.00015
Pentium EE840        0.000309789              5.29297E-05                 0.00016
Pentium D820         0.000333333              6.5703E-05                  0.00017
Athlon64 X2 3800+    0.00034002               5.83805E-05                 0.00017
Pentium 4            0.000366166              0.000131216                 0.00023
Athlon64 3000+       0.000338639              0.000131096                 0.00021
Pentium 4 570        0.000285633              8.92061E-05                 0.00017
Processor X          0.000142857              0.0002                      0.00018

b. Since the application suite is CPU-intensive, we compare the Dhrystone performance of the two CPUs. The speed-up from the Pentium 4 570 to the Athlon64 X2 4800+ is the ratio of their Dhrystone scores. Hence, Speed-up = 20718/11210 = 1.848

c. Let a be the fraction of memory operations, so that 1 - a is the fraction of processor operations. For equal performance:

3501a + 11210(1 - a) = 3000a + 15220(1 - a)

Collecting terms gives 501a = 4010(1 - a), so 4511a = 4010, i.e. a = 0.89. Thus, the performance of the Pentium 4 570 equals that of the Pentium D820 when there are 89% memory operations and 11% processor operations.
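The weighted execution time in part a and the break-even mix in part c can be sketched as follows. The function names are hypothetical, and the scores are the ones that appear in parts b and c (with 3423 assumed as the Athlon64 X2 4800+ memory score, the reciprocal of its tabulated execution time).

```python
def weighted_time(mem_score, cpu_score, mem_weight):
    """Weighted arithmetic mean of execution times, with time = 1 / score."""
    return mem_weight / mem_score + (1.0 - mem_weight) / cpu_score


def break_even_memory_fraction(mem_a, cpu_a, mem_b, cpu_b):
    """Solve a*mem_a + (1-a)*cpu_a = a*mem_b + (1-a)*cpu_b for a."""
    return (cpu_b - cpu_a) / ((mem_a - mem_b) + (cpu_b - cpu_a))


# Part a: 40% memory-centric, 60% CPU-centric operations.
print(f"Athlon64 X2 4800+: {weighted_time(3423, 20718, 0.4):.5f}")
print(f"Pentium 4 570:     {weighted_time(3501, 11210, 0.4):.5f}")

# Part b: Dhrystone speed-up from Pentium 4 570 to Athlon64 X2 4800+.
print(f"Speed-up: {20718 / 11210:.3f}")

# Part c: memory fraction at which Pentium 4 570 matches Pentium D820.
a = break_even_memory_fraction(3501, 11210, 3000, 15220)
print(f"Break-even memory fraction: {a:.2f}")
```

Note that part c balances weighted scores rather than weighted execution times, and the helper mirrors that choice.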

14. Question 1.14

According to Amdahl's Law, speed-up is given by

$$\text{Speed-up}_{\text{system}} = \frac{(\text{Execution Time})_{\text{old}}}{(\text{Execution Time})_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speed-up}_{\text{enhanced}}}}$$

a. The first application is run in isolation and 40% of it is parallelizable. Thus, $\text{Fraction}_{\text{enhanced}} = 0.4$. Since the new processor is a dual core, $\text{Speed-up}_{\text{enhanced}} = 2$. Then, $\text{Speed-up}_{\text{system}} = 1/(0.6 + 0.2) = 1.25$

b. The second application is run in isolation and 99% of it is parallelizable. Hence, $\text{Fraction}_{\text{enhanced}} = 0.99$ and $\text{Speed-up}_{\text{enhanced}} = 2$. Thus, $\text{Speed-up}_{\text{system}} = 1/(0.01 + 0.495) = 1.98$

c. Now both the first and second applications are running on the system. Since the first application uses 80% of system resources, only 40% of 80% (= 32%) will be enhanced by a factor of 2. Thus, $\text{Speed-up}_{\text{system}} = 1/(0.68 + 0.16) = 1.19$

d. Similarly, 99% of 20% (= 19.8%) will be enhanced by a factor of 2. Thus, $\text{Speed-up}_{\text{system}} = 1/(0.802 + 0.099) = 1.11$
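The four cases above all apply the same formula with a different enhanced fraction, which a short function makes explicit. This is a minimal sketch; the function name `amdahl_speedup` and its parameter names are hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speed-up when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# (label, enhanced fraction) for parts a to d; the enhanced part runs 2x faster.
cases = [
    ("a: app 1 alone, 40% parallel", 0.40),
    ("b: app 2 alone, 99% parallel", 0.99),
    ("c: app 1 at 80% of resources", 0.80 * 0.40),
    ("d: app 2 at 20% of resources", 0.20 * 0.99),
]
for label, fraction in cases:
    print(f"{label}: speed-up = {amdahl_speedup(fraction, 2):.3f}")
```

Passing a very large `speedup_enhanced` shows the limiting case: the speed-up approaches 1/(1 - fraction), which bounds what additional cores can deliver.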