Remote-scope promotion


Remote-scope promotion: clarified, rectified, verified. John Wickerson (Imperial), Mark Batty (Kent), Ally Donaldson (Imperial), Brad Beckmann (AMD). REMS workshop, Cambridge, 22 April 2015

In brief: RSP is a GPGPU language extension from AMD that enables efficient work-stealing. We worked with AMD to formalise their design (at language and HW level). This led to a corrected and improved implementation. Formalise early in the design process!

This talk
1. What is RSP?
2. Adding RSP to the OpenCL memory model
3. A formalised implementation of OpenCL+RSP


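C11: flat thread structure. [Figure: threads (e.g. T4, T5) shown side by side with no grouping]

OpenCL: thread groupings. [Figure: threads (e.g. T4, T5) grouped into workgroups, and workgroups grouped into devices]

GPUs: hierarchical memory. [Figure: each workgroup has its own L1 cache, each device has an L2 cache, and the devices share global memory]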

Memory scopes

Memory scopes: store(x,42) in one workgroup, load(x) in another workgroup.

Memory scopes: store(x,42,wg) and load(x,wg) in different workgroups: faulty!

Memory scopes: store(x,42,dv) and load(x,dv) in different workgroups: ok!

Memory scopes: store(x,42,dv) and load(x,wg) in different workgroups: faulty!*
*...but the OpenCL standard could easily be extended to allow this

Work-stealing: store(headA,_,wg) //push; store(headA,_,wg) //pop. [Figure: workgroup A owns a queue delimited by tailA and headA; workgroup B owns a queue delimited by tailB and headB]

Work-stealing: store(headA,_,wg) //push; store(headA,_,wg) //pop; store(headA,_,???) //steal. No way to plug this hole in OpenCL! [Figure: workgroup A (tailA, headA) and workgroup B (headB, tailB)]

Remote-scope promotion: store(x,42,dv)   load(x,dv)

Remote-scope promotion: store(x,42,wg)   load(x,dv)

Remote-scope promotion: store(x,42,wg)   load(x,dv,remote)

Remote-scope promotion: store(x,42,dv,remote)   load(x,wg)

Work-stealing: store(headA,_,wg) //push; store(headA,_,wg) //pop; store(headA,_,dv,remote) //steal. [Figure: workgroup A (tailA, headA) and workgroup B (headB, tailB)]

This talk
1. What is RSP?
2. Adding RSP to the OpenCL memory model
3. A formalised implementation of OpenCL+RSP

Scope inclusion

We present in Fig. 2 the full OpenCL model (including barriers) in the .cat language. In the following, we discuss changes that must be made to the model, which are highlighted in the figure.

In C11ra:
$sw = rf \cap (\neq_{thd}) \cap (\neg na)^2$

In OpenCLra:
$sw = rf \cap (\neq_{thd}) \cap (=_{incl})$
$e =_{incl} e' \iff (e \; incl \; e') \wedge (e' \; incl \; e)$
$e \; incl \; e' \iff (e \in na \wedge e =_{thd} e') \vee (e \in WG \wedge e =_{wg} e') \vee e \in DV$

In OpenCL+RSP:
$e =_{incl} e' \iff ((e \; incl \; e') \wedge (e' \; incl \; e)) \vee ((e \; incl \; e') \wedge e \in rem) \vee (e' \in rem \wedge (e' \; incl \; e))$
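One way to read the scope-inclusion rule is as a predicate on pairs of memory accesses. The sketch below is a minimal illustration of that reading, not the authors' .cat model: the `Access` record, the function names, and the `rsp` flag are hypothetical, and the rule encoded is "two accesses can synchronise if each one's scope reaches the other, or, with RSP, if a remote access's scope reaches the other".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record for one memory access (illustration only)."""
    thread: int        # globally unique thread id
    workgroup: int     # id of the workgroup the thread belongs to
    scope: str         # "na" (non-atomic), "wg" or "dv"
    remote: bool = False


def incl(e, other):
    """True if the scope of `e` is wide enough to reach `other`."""
    if e.scope == "na":
        return e.thread == other.thread
    if e.scope == "wg":
        return e.workgroup == other.workgroup
    return e.scope == "dv"


def scope_inclusive(e, other, rsp=False):
    """Assumed reading of scope inclusion, optionally extended with RSP."""
    if incl(e, other) and incl(other, e):
        return True
    if rsp:
        # With RSP, one remote access whose scope reaches the other suffices.
        return (e.remote and incl(e, other)) or (other.remote and incl(other, e))
    return False


# Store in workgroup 0, load in workgroup 1, as on the slides.
print(scope_inclusive(Access(0, 0, "wg"), Access(1, 1, "wg")))   # wg/wg
print(scope_inclusive(Access(0, 0, "dv"), Access(1, 1, "dv")))   # dv/dv
print(scope_inclusive(Access(0, 0, "wg"),
                      Access(1, 1, "dv", remote=True), rsp=True))  # wg/dv,remote
```

Under these assumptions the wg/wg pair across workgroups is rejected, the dv/dv pair is accepted, and a remote dv access paired with a wg access in another workgroup is accepted only when `rsp=True`.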

Testing OpenCL+RSP programs. We extended Herd to support the new memory model. We simulated the 12 litmus tests designed by the AMD developers to define their expectations of RSP. We found 8 were good, but: 2 had unintentional races, 1 enforced broken behaviour, and 1 forbade reasonable behaviour. We also found (and fixed) bugs in their work-stealing queue implementation.

This talk
1. What is RSP?
2. Adding RSP to the OpenCL memory model
3. A formalised implementation of OpenCL+RSP

Implementing RSP
Model of GPU hardware.
Assembly-like language, with instructions modelled as state transformers.
Mapping from OpenCL+RSP operations to sequences of assembly instructions.
Can then prove that all behaviours of the compiled program are allowed by the OpenCL+RSP MM.

New compilation scheme

| | na or WG | DV (not remote) | DV (remote) |
|---|---|---|---|
| r=load(x) | LD r x | LD r x; INVL1 WG | LD r x; FLUL1 DV; INVL1 WG |
| store(x,r) | ST r x | FLUL1 WG; ST r x | FLUL1 WG; INVL1 DV; ST r x |
| r=fetch_inc(x) | INCL1 r x | FLUL1 WG; INCL2 r x; INVL1 WG | FLUL1 WG; INVL1 DV; INCL2 r x; FLUL1 DV; INVL1 WG |

In the DV (remote) column, the store and fetch_inc sequences are bracketed by LK rmw.
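The mapping from operations to instruction sequences can be written down as a lookup table. The sketch below is only an illustration: the dictionary layout, the scope keys (`na_or_wg`, `dv`, `dv_remote`) and the `compile_op` helper are hypothetical names. The instruction sequences are copied in the order they appear on the new-scheme slide, and the LK rmw brackets are left out, so both the ordering within the remote cells and the missing locks should be treated as assumptions.

```python
# New compilation scheme as a lookup table (order as transcribed from the slide;
# LK rmw brackets on the remote store and fetch_inc are NOT modelled here).
NEW_SCHEME = {
    "load": {
        "na_or_wg": ["LD {r} {x}"],
        "dv": ["LD {r} {x}", "INVL1 WG"],
        "dv_remote": ["LD {r} {x}", "FLUL1 DV", "INVL1 WG"],
    },
    "store": {
        "na_or_wg": ["ST {r} {x}"],
        "dv": ["FLUL1 WG", "ST {r} {x}"],
        "dv_remote": ["FLUL1 WG", "INVL1 DV", "ST {r} {x}"],
    },
    "fetch_inc": {
        "na_or_wg": ["INCL1 {r} {x}"],
        "dv": ["FLUL1 WG", "INCL2 {r} {x}", "INVL1 WG"],
        "dv_remote": ["FLUL1 WG", "INVL1 DV", "INCL2 {r} {x}",
                      "FLUL1 DV", "INVL1 WG"],
    },
}


def compile_op(op, scope, r="r", x="x"):
    """Return the instruction sequence for one operation (hypothetical helper)."""
    return [template.format(r=r, x=x) for template in NEW_SCHEME[op][scope]]


def compile_program(ops):
    """Compile a list of (op, scope, register, location) tuples in order."""
    out = []
    for op, scope, r, x in ops:
        out.extend(compile_op(op, scope, r, x))
    return out


# Reader side of a message-passing test: device-scope load of y.
print(compile_op("load", "dv", r="r0", x="y"))
```

Keeping the scheme as data makes it easy to compare against an alternative table, such as the old scheme, entry by entry.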

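Old compilation scheme

| | na or WG | DV (not remote) | DV (remote) |
|---|---|---|---|
| r=load(x) | LD r x | INVL1 WG; LD r x | FLUL1 DV; INVL1 WG; LD r x |
| store(x,r) | ST r x | FLUL1 WG; ST r x | FLUL1 WG; ST r x; INVL1 DV |
| r=fetch_inc(x) | INCL1 r x | FLUL1 WG; INVL1 WG; INCL2 r x | FLUL1 DV; INVL1 WG; INCL2 r x; INVL1 DV |

In the DV (remote) column, each sequence is bracketed by LK x; the fetch_inc row also carries the label LK rmw.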


Contributions
Extended OpenCL MM to include RSP.
Extended Herd with new MM, and used it to find and fix bugs in RSP litmus tests and programs.
Formalised implementation of RSP (model of GPU hardware + assembly language).
Found and fixed bugs in original compilation scheme.
Proved new improved scheme correct.


Spare slides

A corrupt MP (source program)
Writer: *x=42; store(y,1,dv);
Reader: if(load(y,dv)==1) print(*x);

A corrupt MP (compiled)
Writer: ST x 42; FLUL1 WG; ST y 1;
Reader: INVL1 WG; LD y; //1
        LD x;
[Figure, final build: writer's L1 cache holds x=42, y=1; reader's L1 cache holds x=0, y=1; L2 cache holds x=42, y=1]
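The stale read can be replayed on a very small cache model. The sketch below is a toy, not the formal hardware model from the talk: the `ToyGPU` class, its write-back L1 behaviour, and the single hand-picked interleaving are all assumptions. In particular it assumes that x=0 enters the reader's L1 after the reader's invalidate (for example via another work-item in the same workgroup), and that the writer's y eventually reaches L2. It compares invalidating before the load of y with invalidating after it.

```python
class ToyGPU:
    """Toy model: one shared L2 plus one write-back L1 per workgroup."""

    def __init__(self, workgroups, init):
        self.l2 = dict(init)
        self.l1 = {wg: {} for wg in workgroups}         # cached values
        self.dirty = {wg: set() for wg in workgroups}   # lines not yet in L2

    def st(self, wg, loc, val):
        """ST: write into the workgroup's L1 and mark the line dirty."""
        self.l1[wg][loc] = val
        self.dirty[wg].add(loc)

    def ld(self, wg, loc):
        """LD: hit in L1 if present, otherwise fetch from L2 into L1."""
        if loc not in self.l1[wg]:
            self.l1[wg][loc] = self.l2[loc]
        return self.l1[wg][loc]

    def flu_l1(self, wg):
        """FLUL1 WG: write every dirty line of this L1 back to L2."""
        for loc in self.dirty[wg]:
            self.l2[loc] = self.l1[wg][loc]
        self.dirty[wg].clear()

    def inv_l1(self, wg):
        """INVL1 WG: drop clean lines (dirty lines are kept, so no write is lost)."""
        self.l1[wg] = {loc: v for loc, v in self.l1[wg].items()
                       if loc in self.dirty[wg]}


def run_mp(load_then_invalidate):
    """Replay one interleaving of the message-passing test; returns (y, x)."""
    gpu = ToyGPU(["A", "B"], {"x": 0, "y": 0})
    if not load_then_invalidate:
        gpu.inv_l1("B")            # reader: INVL1 WG before LD y
    gpu.ld("B", "x")               # ASSUMPTION: x=0 gets cached in reader's L1
    gpu.st("A", "x", 42)           # writer: ST x 42
    gpu.flu_l1("A")                # writer: FLUL1 WG
    gpu.st("A", "y", 1)            # writer: ST y 1
    gpu.flu_l1("A")                # ASSUMPTION: y is eventually written back
    y = gpu.ld("B", "y")           # reader: LD y
    if load_then_invalidate:
        gpu.inv_l1("B")            # reader: INVL1 WG after LD y
    x = gpu.ld("B", "x")           # reader: LD x
    return y, x


print(run_mp(load_then_invalidate=False))
print(run_mp(load_then_invalidate=True))
```

In this toy, invalidating first lets the reader observe y=1 together with the stale x=0, while loading y first and then invalidating forces x to be re-fetched from L2.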