Compilation Techniques for Embedded Data Parallel Languages

Bryan Catanzaro
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2011-45
http://www.eecs.berkeley.edu/pubs/techrpts/2011/eecs-2011-45.html

May 11, 2011

Copyright 2011, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

This research has been supported by a National Science Foundation Graduate Research Fellowship, an NVIDIA Graduate Fellowship, a Qualcomm Innovation Fellowship, the Gigascale Research Center, and the Parallel Computing Laboratory, which itself is supported by Microsoft (Award #024263), Intel (Award #024894), and by matching funding from U.C. Discovery (Award #DIG07-10227), with additional support from affiliates National Instruments, NEC, Nokia, NVIDIA, and Samsung.


E x E 1,...,E n A E E 1 + E 2 E 1 < E 2 E... E 1 E 2 E 1 E 2 E 1 E p E 2 F E 1,...,E n x 1,...,x n E F, A 1,...,A n E x A E x 1,...,x n A 1,...,A n E S F A

S E x 1,...,x n E E f x 1,...,x n S S

T S P S M V M S 1 S n P V 1 V n S V T S M P V M

S 1, S 2 S 1 S 2 S 1 S 1 S 1,..., S n n S 1,..., S n

$$
z[i] =
\begin{cases}
x[t], & i \in \mathrm{indices},\ t = \arg_i \mathrm{indices} \\
y[i], & i \notin \mathrm{indices}
\end{cases}
$$
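A minimal sketch of the update defined above, written in plain Python rather than Copperhead. The function name `scatter_update` is hypothetical, and the sketch assumes that `indices` holds no duplicate positions.

```python
def scatter_update(x, indices, y):
    """Return z with z[indices[t]] = x[t], and z[i] = y[i] for every other i."""
    z = list(y)  # positions not named in indices keep the value from y
    for t, i in enumerate(indices):
        z[i] = x[t]  # position i is named at slot t of indices
    return z


# Positions 3 and 0 are overwritten; positions 1 and 2 come from y.
print(scatter_update([10, 20], [3, 0], [1, 2, 3, 4]))  # [20, 2, 3, 10]
```

On a parallel target each `z[i]` can be computed independently, which is why duplicates in `indices` are excluded here: they would make the result depend on write order.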

$$
y = Ax, \qquad
A = \begin{bmatrix}
1 & 7 & 0 & 0 \\
0 & 2 & 8 & 0 \\
5 & 0 & 3 & 9 \\
0 & 6 & 0 & 4
\end{bmatrix}, \qquad
y_i = \sum_j A_{ij} x_j
$$
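A minimal sketch of the product $y = Ax$ for the 4 x 4 example matrix above, in plain Python rather than Copperhead. The compressed sparse row arrays (`row_ptr`, `cols`, `vals`), the function name `spmv_csr`, and the input vector are assumptions chosen for illustration; they are not taken from the report.

```python
def spmv_csr(row_ptr, cols, vals, x):
    """y[i] = sum over the stored entries of row i of A[i][j] * x[j]."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]  # extent of row i in cols/vals
        y.append(sum(vals[k] * x[cols[k]] for k in range(start, end)))
    return y


# Nonzeros of the example matrix, stored row by row.
row_ptr = [0, 2, 4, 7, 9]
cols = [0, 1, 1, 2, 0, 2, 3, 1, 3]
vals = [1, 7, 2, 8, 5, 3, 9, 6, 4]

print(spmv_csr(row_ptr, cols, vals, [1, 2, 3, 4]))  # [15, 28, 50, 28]
```

Each row's sum is independent of the others, so the outer loop is the natural unit of data parallelism; the inner sum is a reduction whose length varies from row to row.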


m n [m, n], Unit 0 0 0 0 0

y = Ax x

1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 4 2 3 4 5 14 7 8 6 5 3 9 5 21 15 8 6 9 21

f (x i, y i )= x i y i 2 X = {x 0, x 1,...,x n } Y = {y 0, y 1,...,y n }

$$
P(k; s) = \frac{k^{-s}}{\zeta(s)}, \qquad k > 0,\ s > 1
$$

$$
\psi = \frac{W_{\max}}{W_{\min}}
$$


$$
G = \langle N, E \rangle
$$

$$
e \in E: \qquad e_b \in \{\mathrm{False}, \mathrm{True}\}, \qquad e_c \in C, \qquad e_d \in D
$$

$$
C = \{\mathrm{None}, \mathrm{Local}, \mathrm{Global}\}
$$

$$
D = \{\mathrm{None}, \mathrm{Backward}, \mathrm{Forward}\}
$$

(N, L,...,L) L (N,...,N) N (N, L, G) N (N, N, N) N (N, L) N (N, N, N) N (N, L) N (N, N, N) N (G, L) L (N, N) N (L, L, G) N (N, N, N) N (N, L,...,L) L (N,...,N) N (N, L, G) N (N, N, N) N (N, L) N (N, F, N) F (N, L) N (N, B, N) B (G, L) L (N, N) N (L, L, G) N (N, N, N) N

n n o n h n n e n n ec n ed n e c < n ec (n ed = None) (e d = n ed ) e b = True n n

[1, 2, 3]    [1, 2, 3.0]    [[1, 2], [3]]    [[1, 2], 3]

n m ψm (ψ + n 1)m ψmn ψ

[[1, 2, 3], [4, 5, 6], [7, 8, 9]] with extents (3, 3):

Strides    Data
(3, 1)     [1, 2, 3, 4, 5, 6, 7, 8, 9]
(1, 3)     [1, 4, 7, 2, 5, 8, 3, 6, 9]
(1, 4)     [1, 4, 7, , 2, 5, 8, , 3, 6, 9, ]
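The three layouts above can be reproduced with a short sketch, assuming that each pair is a (row stride, column stride) and that element (i, j) is stored at offset `i * strides[0] + j * strides[1]`. The function name `linearize` and the use of `None` for padding slots are hypothetical choices made for this illustration.

```python
def linearize(nested, strides, size):
    """Place nested[i][j] at offset i * strides[0] + j * strides[1]."""
    flat = [None] * size  # None marks slots that no element maps to (padding)
    for i, row in enumerate(nested):
        for j, value in enumerate(row):
            flat[i * strides[0] + j * strides[1]] = value
    return flat


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(linearize(data, (3, 1), 9))   # row-major
print(linearize(data, (1, 3), 9))   # column-major
print(linearize(data, (1, 4), 12))  # column-major, each column padded to 4 slots
```

Changing only the stride pair changes the memory layout without touching the indexing code, which is the property that lets a compiler pick a layout per target.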

6.39s 0.52s 195µs 6.04s 0.49s 202µs 5.82s 0.49s 171µs 7.46s 0.63s 360µs


$$
A = \begin{bmatrix}
1 & 7 & 0 & 0 \\
0 & 2 & 8 & 0 \\
5 & 0 & 3 & 9 \\
0 & 6 & 0 & 4
\end{bmatrix}, \qquad M = 4, \quad K = 3
$$
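A minimal sketch of a padded, column-major sparse layout for the example matrix above, assuming that M is the number of rows and K is the maximum number of nonzeros in any row (an ELLPACK-style reading of M = 4, K = 3). The function names `to_ell` and `spmv_ell` and the choice of zero padding are assumptions for illustration.

```python
def to_ell(dense):
    """Pack nonzeros into M-by-K arrays stored column-major (slot k of row r at k*M + r)."""
    M = len(dense)
    rows = [[(v, j) for j, v in enumerate(row) if v != 0] for row in dense]
    K = max(len(entries) for entries in rows)
    data = [0] * (M * K)  # padded slots hold value 0 ...
    idx = [0] * (M * K)   # ... and column 0, so they add nothing to the product
    for r, entries in enumerate(rows):
        for k, (v, j) in enumerate(entries):
            data[k * M + r] = v
            idx[k * M + r] = j
    return M, K, data, idx


def spmv_ell(M, K, data, idx, x):
    y = [0] * M
    for k in range(K):
        for r in range(M):  # consecutive r touch consecutive memory
            y[r] += data[k * M + r] * x[idx[k * M + r]]
    return y


A = [[1, 7, 0, 0], [0, 2, 8, 0], [5, 0, 3, 9], [0, 6, 0, 4]]
M, K, data, idx = to_ell(A)
print(M, K)                                      # 4 3
print(data)                                      # [1, 2, 5, 6, 7, 8, 3, 4, 0, 0, 9, 0]
print(spmv_ell(M, K, data, idx, [1, 2, 3, 4]))   # [15, 28, 50, 28]
```

The padding costs storage (12 slots for 9 nonzeros here) in exchange for a regular access pattern in which neighboring rows read neighboring memory locations.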

Ax = b A

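Inputs: $A$, $M^{-1}$, $x_0$, $b$, $n$

$$
\begin{aligned}
r_0 &= b - A x_0 \\
z_0 &= M^{-1} r_0 \\
p_0 &= z_0 \\
k &= 0
\end{aligned}
$$

Repeat until $k > n$:

$$
\begin{aligned}
\alpha_k &= \frac{r_k^T z_k}{p_k^T A p_k} \\
x_{k+1} &= x_k + \alpha_k p_k \\
r_{k+1} &= r_k - \alpha_k A p_k \\
z_{k+1} &= M^{-1} r_{k+1} \\
\beta_k &= \frac{z_{k+1}^T r_{k+1}}{z_k^T r_k} \\
p_{k+1} &= z_{k+1} + \beta_k p_k \\
k &= k + 1
\end{aligned}
$$

Preconditioner blocks built from $a$, $b$, $c$:

$$
m_i = \begin{bmatrix} a_i & b_i \\ b_i & c_i \end{bmatrix}, \qquad
p_i = m_i^{-1} = \frac{1}{a_i c_i - b_i^2} \begin{bmatrix} c_i & -b_i \\ -b_i & a_i \end{bmatrix}
$$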
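The recurrence above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the report's implementation: the matrix is dense, the preconditioner is passed as a function `apply_m_inv` (a hypothetical name), the example uses a simple diagonal (Jacobi) preconditioner rather than the 2 x 2 blocks, and an early exit on a small $r^T z$ is added to avoid dividing by zero after convergence.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, apply_m_inv, b, x0, n, tol=1e-20):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # r_0 = b - A x_0
    z = apply_m_inv(r)                                   # z_0 = M^-1 r_0
    p = list(z)                                          # p_0 = z_0
    rz = dot(r, z)
    for _ in range(n):
        if rz <= tol:  # assumption: stop once the residual is negligible
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_m_inv(r)
        rz_next = dot(z, r)
        beta = rz_next / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_next
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def jacobi(r):
    # Diagonal preconditioner: divide each residual entry by A's diagonal.
    return [ri / A[i][i] for i, ri in enumerate(r)]


x = pcg(A, jacobi, b, [0.0, 0.0], n=10)
print(x)  # close to [1/11, 7/11]
```

Every step is a matrix-vector product, a dot product, or an elementwise vector update, so the whole solver is expressible with map and reduce style primitives.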

Inputs: $A$, $v_1$

$$
\begin{aligned}
i &= 1 \\
z_i &= A v_i \\
\alpha_i &= z_i^T V_i \\
z_i &= z_i - V V^T z_i \\
\beta_i &= \lVert z_i \rVert \\
V_{i+1} &= z_i / \beta_i \\
i &= i + 1
\end{aligned}
$$
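A minimal sketch of the iteration above in plain Python, assuming that $V_i$ denotes the $i$-th basis vector $v_i$, that $v_1$ is normalized before use, and that the step $z_i - V V^T z_i$ projects out every basis vector found so far. The function name `lanczos` and the breakdown tolerance `tol` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def lanczos(A, v1, steps, tol=1e-12):
    """Return (alphas, betas, V) for up to `steps` iterations on symmetric A."""
    norm = math.sqrt(dot(v1, v1))
    V = [[vi / norm for vi in v1]]  # assumption: start from a unit vector
    alphas, betas = [], []
    for i in range(steps):
        z = matvec(A, V[i])
        alphas.append(dot(z, V[i]))
        # z = z - V V^T z: all coefficients come from the same z, then are subtracted.
        coeffs = [dot(v, z) for v in V]
        for c, v in zip(coeffs, V):
            z = [zi - c * vi for zi, vi in zip(z, v)]
        beta = math.sqrt(dot(z, z))
        if beta < tol or i == steps - 1:
            break  # invariant subspace found, or requested size reached
        betas.append(beta)
        V.append([zi / beta for zi in z])
    return alphas, betas, V


alphas, betas, V = lanczos([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0], steps=2)
print(alphas, betas)  # [2.0, 2.0] [1.0]
```

The returned `alphas` and `betas` are the diagonal and off-diagonal of a tridiagonal matrix with one fewer off-diagonal entry than diagonal entries; in this 2 x 2 example it equals the input matrix itself.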

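$$
T_j = \begin{bmatrix}
\alpha_1 & \beta_1 & & \\
\beta_1 & \alpha_2 & \ddots & \\
 & \ddots & \ddots & \beta_{j-1} \\
 & & \beta_{j-1} & \alpha_j
\end{bmatrix} \in \mathbb{R}^{j \times j}, \qquad
S \Theta S^T = T_j
$$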

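Training points $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1, 1\}$, $i \in \{1, \ldots, l\}$:

$$
\max_{\alpha} F(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \alpha^T Q \alpha
$$

$$
\text{subject to} \quad 0 \le \alpha_i \le C,\ \forall i \in 1 \ldots l, \qquad y^T \alpha = 0
$$

$$
Q_{ij} = y_i y_j \Phi(x_i, x_j)
$$

$$
\Phi(x_i, x_j; \gamma) = \exp\left\{ -\gamma \lVert x_i - x_j \rVert^2 \right\}
$$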

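Inputs: training data $x_i$, labels $y_i$, $\forall i \in \{1..l\}$, and tolerance $\tau$. Output: $\alpha_i$.

Initialization:

$$
f_i = -y_i, \quad \forall i \in \{1..l\}
$$

$$
b_{high} = -1, \quad i_{high} = \min\{i : y_i = 1\}, \qquad
b_{low} = 1, \quad i_{low} = \min\{i : y_i = -1\}
$$

Each iteration updates $f_i$, $\forall i \in \{1..l\}$, recomputes $b_{high}$, $i_{high}$, $b_{low}$, $i_{low}$, and updates $\alpha_{i_{high}}$ and $\alpha_{i_{low}}$, until

$$
b_{low} \le b_{high} + 2\tau
$$

Update of the two selected weights, with $\alpha'_{i_{low}}$ and $\alpha'_{i_{high}}$ clipped to $0 \le \alpha_i \le C$:

$$
\alpha'_{i_{low}} = \alpha_{i_{low}} + y_{i_{low}} (b_{high} - b_{low}) / \eta
$$

$$
\alpha'_{i_{high}} = \alpha_{i_{high}} + y_{i_{low}} y_{i_{high}} (\alpha_{i_{low}} - \alpha'_{i_{low}})
$$

$$
\eta = \Phi(x_{i_{high}}, x_{i_{high}}) + \Phi(x_{i_{low}}, x_{i_{low}}) - 2 \Phi(x_{i_{high}}, x_{i_{low}})
$$

Optimality conditions and their incremental update after $\alpha$ changes:

$$
f_i = \sum_{j=1}^{l} \alpha_j y_j \Phi(x_i, x_j) - y_i
$$

$$
f'_i = f_i + (\alpha'_{i_{high}} - \alpha_{i_{high}}) y_{i_{high}} \Phi(x_{i_{high}}, x_i) + (\alpha'_{i_{low}} - \alpha_{i_{low}}) y_{i_{low}} \Phi(x_{i_{low}}, x_i)
$$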

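$$
I_{high} = \{i : 0 < \alpha_i < C\} \cup \{i : y_i > 0, \alpha_i = 0\} \cup \{i : y_i < 0, \alpha_i = C\}
$$

$$
I_{low} = \{i : 0 < \alpha_i < C\} \cup \{i : y_i > 0, \alpha_i = C\} \cup \{i : y_i < 0, \alpha_i = 0\}
$$

With a numerical tolerance $\epsilon$, the first set in each union becomes $\{i : \epsilon < \alpha_i < (C - \epsilon)\}$.

$$
b_{high} = \min\{f_i : i \in I_{high}\}, \qquad b_{low} = \max\{f_i : i \in I_{low}\}
$$

$$
i_{high} = \arg\min\{f_i : i \in I_{high}\}, \qquad i_{low} = \arg\max\{f_i : i \in I_{low}\}
$$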
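The working-set selection and update rules above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the report's implementation: the names `rbf` and `smo_train`, the default parameter values, and the iteration cap `max_iter` are hypothetical; both classes are assumed to be present in `ys`; and the clipping to $0 \le \alpha_i \le C$ is done by bounding the new $\alpha_{i_{low}}$ so that $\alpha_{i_{high}}$ also stays in range.

```python
import math


def rbf(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def smo_train(xs, ys, C=10.0, gamma=1.0, tau=1e-3, eps=1e-12, max_iter=10000):
    l = len(xs)
    alpha = [0.0] * l
    f = [-float(y) for y in ys]  # f_i = -y_i while every alpha_i is 0
    b_high = b_low = 0.0
    for _ in range(max_iter):
        free = [eps < a < C - eps for a in alpha]
        at_0 = [a <= eps for a in alpha]
        at_c = [a >= C - eps for a in alpha]
        I_high = [i for i in range(l)
                  if free[i] or (ys[i] > 0 and at_0[i]) or (ys[i] < 0 and at_c[i])]
        I_low = [i for i in range(l)
                 if free[i] or (ys[i] > 0 and at_c[i]) or (ys[i] < 0 and at_0[i])]
        hi_i = min(I_high, key=lambda i: f[i])
        lo_i = max(I_low, key=lambda i: f[i])
        b_high, b_low = f[hi_i], f[lo_i]
        if b_low <= b_high + 2 * tau:
            break  # converged
        eta = (rbf(xs[hi_i], xs[hi_i], gamma) + rbf(xs[lo_i], xs[lo_i], gamma)
               - 2 * rbf(xs[hi_i], xs[lo_i], gamma))
        eta = max(eta, eps)  # guard against coincident points
        s = ys[lo_i] * ys[hi_i]
        a_lo, a_hi = alpha[lo_i], alpha[hi_i]
        # Range of the new alpha_low that keeps both weights inside [0, C].
        if s > 0:
            lower, upper = max(0.0, a_lo + a_hi - C), min(C, a_lo + a_hi)
        else:
            lower, upper = max(0.0, a_lo - a_hi), min(C, C + a_lo - a_hi)
        new_lo = min(upper, max(lower, a_lo + ys[lo_i] * (b_high - b_low) / eta))
        new_hi = a_hi + s * (a_lo - new_lo)
        for i in range(l):  # incremental update of every f_i
            f[i] += ((new_hi - a_hi) * ys[hi_i] * rbf(xs[hi_i], xs[i], gamma)
                     + (new_lo - a_lo) * ys[lo_i] * rbf(xs[lo_i], xs[i], gamma))
        alpha[lo_i], alpha[hi_i] = new_lo, new_hi
    return alpha, (b_high + b_low) / 2


alpha, b = smo_train([[0.0], [1.0]], [1, -1])
print(alpha, b)
```

The selection step is a pair of reductions over `f` and the update of `f` is an elementwise map, so each iteration exposes parallelism proportional to the number of training points.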


45 100%

{Distributed, Sequential}
