Compilation Techniques for Embedded Data Parallel Languages

Bryan Catanzaro
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2011-45
http://www.eecs.berkeley.edu/pubs/techrpts/2011/eecs-2011-45.html
May 11, 2011

Copyright 2011, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

This research has been supported by a National Science Foundation Graduate Research Fellowship, an NVIDIA Graduate Fellowship, a Qualcomm Innovation Fellowship, the Gigascale Research Center, and the Parallel Computing Laboratory, which itself is supported by Microsoft (Award #024263), Intel (Award #024894), and by matching funding from U.C. Discovery (Award #DIG07-10227), with additional support from affiliates National Instruments, NEC, Nokia, NVIDIA, and Samsung.

Ax A x

ψ = 1 ψ = 10 ψ = 100 ψ = 1000 ψ = 10000 ψ = 100000

P

3 5 9

1/5 1/3

E x E 1,...,E n A E E 1 + E 2 E 1 < E 2 E... E 1 E 2 E 1 E 2 E 1 E p E 2 F E 1,...,E n x 1,...,x n E F, A 1,...,A n E x A E x 1,...,x n A 1,...,A n E S F A

S E x 1,...,x n E E f x 1,...,x n S S

T S P S M V M S 1 S n P V 1 V n S V T S M P V M

S 1, S 2 S 1 S 2 S 1 S 1 S 1,..., S n n S 1,..., S n

( x i x x i ) z

\[
z[i] =
\begin{cases}
x[t] & i \in \mathrm{indices},\ t = \arg_i \mathrm{indices} \\
y[i] & i \notin \mathrm{indices}
\end{cases}
\]

Ax A x arg i indices y = Ax A x

\[
A = \begin{bmatrix} 1 & 7 & 0 & 0 \\ 0 & 2 & 8 & 0 \\ 5 & 0 & 3 & 9 \\ 0 & 6 & 0 & 4 \end{bmatrix},
\qquad
y_i = \sum_j A_{ij} x_j
\]
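The product $y_i = \sum_j A_{ij} x_j$ only needs the nonzero entries of $A$. The following is a minimal sketch in plain Python, assuming a compressed sparse row (CSR) layout for the example matrix; the names `values`, `col_idx`, `row_ptr`, and `spmv_csr` are illustrative and are not taken from the report.

```python
# CSR storage of the 4x4 example matrix A (illustrative names, assumed layout).
values = [1, 7, 2, 8, 5, 3, 9, 6, 4]    # nonzero entries, row by row
col_idx = [0, 1, 1, 2, 0, 2, 3, 1, 3]   # column of each nonzero entry
row_ptr = [0, 2, 4, 7, 9]               # row i occupies values[row_ptr[i]:row_ptr[i + 1]]


def spmv_csr(values, col_idx, row_ptr, x):
    """Return y with y[i] = sum_j A[i][j] * x[j], visiting stored entries only."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Multiplying by a vector of ones yields the row sums of A.
y = spmv_csr(values, col_idx, row_ptr, [1, 1, 1, 1])
```

Each row's inner loop is independent of the others, which is why this operation is a common data parallel benchmark.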

ˆ

Φ Φ Φ

0 0 0 0 1 n i i

m n [m, n], Unit 0 0 0 0 0

y = Ax x

1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 4 2 3 4 5 14 7 8 6 5 3 9 5 21 15 8 6 9 21

\[
f(x_i, y_i) = (x_i - y_i)^2, \qquad X = \{x_0, x_1, \ldots, x_n\}, \qquad Y = \{y_0, y_1, \ldots, y_n\}
\]

\[
P(k; s) = \frac{k^{-s}}{\zeta(s)}, \qquad k > 0,\ s > 1
\]
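To make the distribution $P(k; s)$ concrete, the sketch below evaluates it for the first few values of $k$. It is an illustration under stated assumptions: $\zeta(s)$ is approximated by a partial sum plus an integral estimate of the tail, and the function names `zeta` and `zipf_pmf` are hypothetical helpers, not part of the report.

```python
def zeta(s, terms=10000):
    """Approximate the Riemann zeta function for s > 1.

    Uses a partial sum of k**-s plus a midpoint-rule integral estimate of
    the remaining tail. This is an approximation, not an exact evaluation.
    """
    if s <= 1:
        raise ValueError("zeta(s) diverges for s <= 1")
    head = sum(k ** -s for k in range(1, terms + 1))
    tail = (terms + 0.5) ** (1 - s) / (s - 1)
    return head + tail


def zipf_pmf(s, k_max):
    """Return [P(1; s), ..., P(k_max; s)] with P(k; s) = k**-s / zeta(s)."""
    norm = zeta(s)
    return [k ** -s / norm for k in range(1, k_max + 1)]


pmf = zipf_pmf(2.0, 1000)
```

Larger values of `s` concentrate the probability mass on small `k`, so `s` controls how skewed the sampled values are.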

\[
\psi = \frac{W_{\max}}{W_{\min}}
\]

m m ψm ψ ψ = 1

ψ = 1 ψ = 1 m > 50

ψ = 10 ψ = 100

ψ = 1000 ψ = 10000

ψ = 100000 m < 10 ψ = 1 ψ = 10 10 m < 50 ψ = 100 ψ = 1000 ψ = 10000 ψ = 100000 ψ 1000

O P

P P { } { } G = N, E N n N n o, n h O P n o n n h n h E n i n j n i n j e E e b, e c, e d {False, True} C D e b e b e c n i e d n i C c C C = {None, Local, Global}

None Local local local local local Global global global D = {None, Backward, Forward} None None Backward n 1 0 Backward Forward 0 n 1 Forward Global None n n

(N, L,...,L) L (N,...,N) N (N, L, G) N (N, N, N) N (N, L) N (N, N, N) N (N, L) N (N, N, N) N (G, L) L (N, N) N (L, L, G) N (N, N, N) N (N, L,...,L) L (N,...,N) N (N, L, G) N (N, N, N) N (N, L) N (N, F, N) F (N, L) N (N, B, N) B (G, L) L (N, N) N (L, L, G) N (N, N, N) N

n n o n h n n e n n ec n ed n e c < n ec (n ed = None) (e d = n ed ) e b = True n n

k 2 k

[[[1.0, 2.0], [3.0]], [[], [4.0, 5.0]]]
desc 2 = [0, 2, 4]
desc 1 = [0, 2, 3, 3, 5]
data   = [1.0, 2.0, 3.0, 4.0, 5.0]

[1, 2, 3]    [1, 2, 3.0]    [[1, 2], [3]]    [[1, 2], 3]
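One way to store a nested sequence such as `[[1, 2], [3]]` is as a flat data array plus one array of segment offsets per nesting level. The sketch below assumes that representation and a uniform nesting depth; the function name `flatten_nested` and the shape of its return value are illustrative, not taken from the report.

```python
def flatten_nested(seq, depth):
    """Flatten a nested list of uniform depth into (descriptors, data).

    descriptors[0] describes the outermost level. Segment j of a level
    spans positions offsets[j] to offsets[j + 1] of the level below it.
    """
    descriptors = []
    level = seq
    for _ in range(depth - 1):
        offsets = [0]
        next_level = []
        for sub in level:
            next_level.extend(sub)
            offsets.append(len(next_level))
        descriptors.append(offsets)
        level = next_level
    return descriptors, level


descriptors, data = flatten_nested([[[1.0, 2.0], [3.0]], [[], [4.0, 5.0]]], depth=3)
```

Empty subsequences show up as repeated offsets, so no padding is needed in the data array.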

n m ψm (ψ + n 1)m ψmn ψ

3, 3, Unit

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]    shape (3, 3)

strides (3, 1):  [1, 2, 3, 4, 5, 6, 7, 8, 9]
strides (1, 3):  [1, 4, 7, 2, 5, 8, 3, 6, 9]
strides (1, 4):  [1, 4, 7,  , 2, 5, 8,  , 3, 6, 9,  ]
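The three flat arrays above are consistent with placing element $(i, j)$ of the 3 by 3 sequence at linear offset $i \cdot s_0 + j \cdot s_1$ for a stride pair $(s_0, s_1)$. The sketch below assumes that reading of the pairs (3, 1), (1, 3), and (1, 4); the helper names `linear_index` and `lay_out` are hypothetical, and `None` stands in for the padding slots.

```python
def linear_index(index, strides):
    """Offset of a multi-dimensional index under the given strides."""
    return sum(i * s for i, s in zip(index, strides))


def lay_out(nested, strides, size, pad=None):
    """Store a 2D nested list in a flat buffer of `size` slots."""
    buf = [pad] * size
    for i, row in enumerate(nested):
        for j, value in enumerate(row):
            buf[linear_index((i, j), strides)] = value
    return buf


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_major = lay_out(a, (3, 1), 9)
col_major = lay_out(a, (1, 3), 9)
padded = lay_out(a, (1, 4), 12)
```

Changing only the strides switches between row-major, column-major, and padded column-major storage without touching the indexing code.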

6.39 s    0.52 s    195 µs
6.04 s    0.49 s    202 µs
5.82 s    0.49 s    171 µs
7.46 s    0.63 s    360 µs

{ } { }

\[
A = \begin{bmatrix} 1 & 7 & 0 & 0 \\ 0 & 2 & 8 & 0 \\ 5 & 0 & 3 & 9 \\ 0 & 6 & 0 & 4 \end{bmatrix},
\qquad M = 4, \quad K = 3
\]

Ax = b A

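Input: $A$, $M^{-1}$, $x_0$, $b$, $n$

\[
\begin{aligned}
r_0 &= b - A x_0 \\
z_0 &= M^{-1} r_0 \\
p_0 &= z_0 \\
k &= 0
\end{aligned}
\]

repeat

\[
\begin{aligned}
\alpha_k &= \frac{r_k^T z_k}{p_k^T A p_k} \\
x_{k+1} &= x_k + \alpha_k p_k \\
r_{k+1} &= r_k - \alpha_k A p_k \\
z_{k+1} &= M^{-1} r_{k+1} \\
\beta_k &= \frac{z_{k+1}^T r_{k+1}}{z_k^T r_k} \\
p_{k+1} &= z_{k+1} + \beta_k p_k \\
k &= k + 1
\end{aligned}
\]

until $k > n$

$M^{-1}$; $a$, $b$, $c$:

\[
m_i = \begin{bmatrix} a_i & b_i \\ b_i & c_i \end{bmatrix},
\qquad
p_i = m_i^{-1} = \begin{bmatrix} a_i & b_i \\ b_i & c_i \end{bmatrix}^{-1}
= \frac{1}{a_i c_i - b_i b_i} \begin{bmatrix} c_i & -b_i \\ -b_i & a_i \end{bmatrix}
\]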
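The preconditioned conjugate gradient iteration and the inverse of symmetric 2 by 2 blocks can be sketched in plain Python as follows. This is an illustration under assumptions, not the report's implementation: matrices are dense lists of lists, the preconditioner is built from the 2 by 2 diagonal blocks of $A$, and the early exit on a small residual is an addition. The names `pcg`, `block_jacobi_2x2`, `dot`, and `matvec` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def block_jacobi_2x2(A):
    """Return a function applying the inverse of the 2x2 diagonal blocks of A."""
    inverses = []
    for i in range(0, len(A), 2):
        a, b, c = A[i][i], A[i][i + 1], A[i + 1][i + 1]
        det = a * c - b * b
        inverses.append((c / det, -b / det, a / det))  # entries of [[c, -b], [-b, a]] / det

    def apply(r):
        z = []
        for k, (i00, i01, i11) in enumerate(inverses):
            r0, r1 = r[2 * k], r[2 * k + 1]
            z += [i00 * r0 + i01 * r1, i01 * r0 + i11 * r1]
        return z

    return apply


def pcg(A, b, apply_Minv, x0, n, tol=1e-12):
    """Preconditioned conjugate gradient for symmetric positive definite A."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(n + 1):              # k = 0, ..., n, matching "until k > n"
        if dot(r, r) ** 0.5 < tol:      # early exit (an addition to the iteration)
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_Minv(r)
        rz_next = dot(z, r)
        beta = rz_next / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_next
    return x


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
b = [6.0, 10.0, 18.0, 15.0]             # equals A times [1, 2, 3, 4]
x = pcg(A, b, block_jacobi_2x2(A), [0.0] * 4, n=10)
```

Applying the block inverse costs a handful of multiplications per block and has no dependences between blocks, so it parallelizes in the same way as the vector updates.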

v i V v i V i v 1 n i V n i α i β i ff fi

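Input: $A$, $v_1$

$V = [v_1]$, $i = 1$

repeat

\[
\begin{aligned}
z_i &= A v_i \\
\alpha_i &= z_i^T v_i \\
z_i &= z_i - V V^T z_i \\
\beta_i &= \| z_i \| \\
v_{i+1} &= z_i / \beta_i \\
i &= i + 1
\end{aligned}
\]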
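A minimal sketch of a Lanczos iteration with full reorthogonalization against all previous vectors, written in plain Python. It assumes a dense symmetric matrix stored as a list of lists; the function name `lanczos`, the helper names, and the breakdown threshold are illustrative choices rather than details from the report.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def lanczos(A, v1, steps, breakdown=1e-12):
    """Return (alphas, betas, V) for up to `steps` Lanczos iterations."""
    norm = dot(v1, v1) ** 0.5
    V = [[x / norm for x in v1]]          # basis vectors, stored as rows
    alphas, betas = [], []
    for _ in range(steps):
        v = V[-1]
        z = matvec(A, v)
        alphas.append(dot(z, v))
        # z = z - V V^T z: remove the components along every stored vector.
        coeffs = [dot(u, z) for u in V]
        for c, u in zip(coeffs, V):
            z = [zk - c * uk for zk, uk in zip(z, u)]
        beta = dot(z, z) ** 0.5
        betas.append(beta)
        if beta < breakdown:              # invariant subspace reached (assumed threshold)
            break
        V.append([zk / beta for zk in z])
    return alphas, betas, V


A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
alphas, betas, V = lanczos(A, [1.0, 0.0, 0.0], steps=3)
```

The `alphas` and `betas` form the diagonal and off-diagonal of a tridiagonal matrix whose eigenvalues approximate those of `A`.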

T j j j α 1, α 2,...,α j β 1, β 2,...,β j 1 SΘS = T j

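$x \in \mathbb{R}^n$, $y \in \{-1, 1\}$; $x_i$, $i \in \{1, \ldots, l\}$; $y_i$, $i \in \{1, \ldots, l\}$

\[
\max_{\alpha} F(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \alpha^T Q \alpha
\]
\[
\text{subject to} \quad 0 \le \alpha_i \le C, \ \forall i \in 1 \ldots l, \qquad y^T \alpha = 0
\]

$x_i \in \mathbb{R}^n$, $y_i \in \{-1, 1\}$, $\alpha_i$, $C$

\[
Q_{ij} = y_i y_j \Phi(x_i, x_j)
\]
\[
\Phi(x_i, x_j; \gamma) = \exp\left\{ -\gamma \| x_i - x_j \|^2 \right\}
\]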

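Input: training data $x_i$, labels $y_i$, $\forall i \in \{1..l\}$; tolerance $\tau$
Initialize: $\alpha_i = 0$, $f_i = -y_i$, $\forall i \in \{1..l\}$
Initialize: $b_{high}$, $b_{low}$, $i_{high}$, $i_{low}$
Update $\alpha_{i_{high}}$ and $\alpha_{i_{low}}$
repeat
    Update $f_i$, $\forall i \in \{1..l\}$
    Compute: $b_{high}$, $i_{high}$, $b_{low}$, $i_{low}$
    Update $\alpha_{i_{high}}$ and $\alpha_{i_{low}}$
until $b_{low} \le b_{high} + 2\tau$

\[
b_{high} = -1, \quad i_{high} = \min\{i : y_i = 1\}, \quad b_{low} = 1, \quad i_{low} = \min\{i : y_i = -1\}
\]

\[
\alpha'_{i_{low}} = \alpha_{i_{low}} + y_{i_{low}} (b_{high} - b_{low}) / \eta
\]
\[
\alpha'_{i_{high}} = \alpha_{i_{high}} + y_{i_{low}} y_{i_{high}} (\alpha_{i_{low}} - \alpha'_{i_{low}})
\]
\[
\eta = \Phi(x_{i_{high}}, x_{i_{high}}) + \Phi(x_{i_{low}}, x_{i_{low}}) - 2 \Phi(x_{i_{high}}, x_{i_{low}})
\]

$\alpha'_{i_{low}}$, $\alpha'_{i_{high}}$: $0 \le \alpha_i \le C$

\[
f_i = \sum_{j=1}^{l} \alpha_j y_j \Phi(x_i, x_j) - y_i
\]
\[
f'_i = f_i + (\alpha'_{i_{high}} - \alpha_{i_{high}}) y_{i_{high}} \Phi(x_{i_{high}}, x_i) + (\alpha'_{i_{low}} - \alpha_{i_{low}}) y_{i_{low}} \Phi(x_{i_{low}}, x_i)
\]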

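\[
I_{high} = \{i : 0 < \alpha_i < C\} \cup \{i : y_i > 0, \alpha_i = 0\} \cup \{i : y_i < 0, \alpha_i = C\}
\]
\[
I_{low} = \{i : 0 < \alpha_i < C\} \cup \{i : y_i > 0, \alpha_i = C\} \cup \{i : y_i < 0, \alpha_i = 0\}
\]

$\epsilon$: $\{i : \epsilon < \alpha_i < (C - \epsilon)\}$

\[
b_{high} = \min\{f_i : i \in I_{high}\}, \qquad b_{low} = \max\{f_i : i \in I_{low}\}
\]

$b_{low} \le b_{high} + 2\tau$

\[
i_{high} = \arg\min\{f_i : i \in I_{high}\}, \qquad i_{low} = \arg\max\{f_i : i \in I_{low}\}
\]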
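The working set selection and update rules can be sketched as a small sequential SMO trainer. This is a sketch under assumptions, not the report's implementation: it uses a Gaussian kernel, clips the updated pair so both multipliers stay in $[0, C]$ while preserving $y^T \alpha = 0$, guards against $\eta = 0$, and takes the threshold as the midpoint of $b_{high}$ and $b_{low}$. The names `smo_train`, `predict`, and `rbf`, the default parameter values, and the toy data are all illustrative.

```python
import math


def rbf(u, v, gamma):
    """Gaussian kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def smo_train(X, y, C=10.0, gamma=0.5, tau=1e-3, eps=1e-9, max_iter=10000):
    """First-order SMO sketch. Assumes both classes occur in y."""
    l = len(X)
    alpha = [0.0] * l
    f = [-float(yi) for yi in y]          # f_i = -y_i while every alpha_i is 0
    b_high, b_low = -1.0, 1.0
    for _ in range(max_iter):
        inner = [eps < a < C - eps for a in alpha]
        I_high = [i for i in range(l) if inner[i]
                  or (y[i] > 0 and alpha[i] <= eps)
                  or (y[i] < 0 and alpha[i] >= C - eps)]
        I_low = [i for i in range(l) if inner[i]
                 or (y[i] > 0 and alpha[i] >= C - eps)
                 or (y[i] < 0 and alpha[i] <= eps)]
        hi = min(I_high, key=lambda i: f[i])
        lo = max(I_low, key=lambda i: f[i])
        b_high, b_low = f[hi], f[lo]
        if b_low <= b_high + 2 * tau:     # optimality condition reached
            break
        eta = (rbf(X[hi], X[hi], gamma) + rbf(X[lo], X[lo], gamma)
               - 2 * rbf(X[hi], X[lo], gamma))
        eta = max(eta, 1e-12)             # guard for duplicate points (an addition)
        s = y[lo] * y[hi]
        a_lo, a_hi = alpha[lo], alpha[hi]
        new_lo = a_lo + y[lo] * (b_high - b_low) / eta
        # Clip so that both updated multipliers remain inside [0, C].
        if s > 0:
            lower, upper = max(0.0, a_lo + a_hi - C), min(C, a_lo + a_hi)
        else:
            lower, upper = max(0.0, a_lo - a_hi), min(C, C + a_lo - a_hi)
        new_lo = min(max(new_lo, lower), upper)
        new_hi = a_hi + s * (a_lo - new_lo)
        for i in range(l):
            f[i] += ((new_hi - a_hi) * y[hi] * rbf(X[hi], X[i], gamma)
                     + (new_lo - a_lo) * y[lo] * rbf(X[lo], X[i], gamma))
        alpha[lo], alpha[hi] = new_lo, new_hi
    return alpha, (b_high + b_low) / 2


def predict(X, y, alpha, b, gamma, x):
    score = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y, X)) - b
    return 1 if score >= 0 else -1


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
y = [-1, -1, -1, 1, 1, 1]
alpha, b = smo_train(X, y, C=10.0, gamma=0.5)
labels = [predict(X, y, alpha, b, 0.5, x) for x in X]
```

The per-iteration work is dominated by the update of every $f_i$ and the two reductions that find `hi` and `lo`, which are the parts that map naturally onto data parallel primitives.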

1 ψ 1000 5 ψ > 1000 5 50 45% 100% 10% 33% 9

45 100%

{Distributed, Sequential}

45 100%