Towards real-time image processing with Hierarchical Hybrid Grids

1 Towards real-time image processing with Hierarchical Hybrid Grids
International Doctorate Program - Summer School
Björn Gmeiner
Joint work with: Harald Köstler, Ulrich Rüde
August 2011

2 Contents
- The HHG Framework
- Image processing for MRI
- Real-time processing

3 The HHG Framework

4 Combining finite element and multigrid methods
- The FE mesh may be unstructured. Which nodes should be removed for coarsening? Not straightforward!
- Why not start from the coarse grid?
- The Hierarchical Hybrid Grids (HHG) concept
  - Benjamin Bergen*: prototype
  - Tobias Gradl: tuning, extensions and adaptivity
* Dissertation in Erlangen, ISC award in [...]. Currently at Los Alamos Labs.

5 Properties of the HHG approach
Advantages:
- Multigrid is straightforward
- Very memory efficient
- Massive performance benefits on current computer architectures
- Supports parallelization
- [...] unknowns are possible
Limitations:
- Coarse input grid needed
- Adaptivity (ongoing work by Tobias Gradl)

6 Two-grid cycle (correction scheme)
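
The slide names the two-grid correction scheme without listing its steps. The following is a minimal sketch for the 1D model problem -u'' = f with homogeneous Dirichlet boundaries. Gauss-Seidel smoothing, full-weighting restriction, linear interpolation and the approximate coarse solve are choices of this sketch, not taken from the HHG code, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Gauss-Seidel sweeps for -u'' = f on n interior points with spacing h.
   Arrays have n + 2 entries; entries 0 and n + 1 are boundary values. */
static void smooth(double *u, const double *f, int n, double h, int sweeps)
{
    for (int s = 0; s < sweeps; ++s)
        for (int i = 1; i <= n; ++i)
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
}

/* Residual r = f - A u for the 3-point discretisation of -u''. */
static void residual(double *r, const double *u, const double *f, int n, double h)
{
    r[0] = r[n + 1] = 0.0;
    for (int i = 1; i <= n; ++i)
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h);
}

/* One two-grid cycle. n must be odd; coarse point i sits on fine point 2i. */
void two_grid_cycle(double *u, const double *f, int n, double h)
{
    assert(n >= 3 && n % 2 == 1);
    int nc = (n - 1) / 2;
    double *r  = calloc((size_t)n + 2, sizeof *r);
    double *rc = calloc((size_t)nc + 2, sizeof *rc);
    double *ec = calloc((size_t)nc + 2, sizeof *ec);
    assert(r && rc && ec);

    smooth(u, f, n, h, 2);                  /* 1. pre-smoothing */
    residual(r, u, f, n, h);                /* 2. fine-grid residual */
    for (int i = 1; i <= nc; ++i)           /* 3. restriction (full weighting) */
        rc[i] = 0.25 * (r[2 * i - 1] + 2.0 * r[2 * i] + r[2 * i + 1]);
    smooth(ec, rc, nc, 2.0 * h, 200);       /* 4. coarse problem A e = r, solved
                                                  approximately by many sweeps */
    for (int i = 1; i <= nc; ++i)           /* 5. prolongation and correction */
        u[2 * i] += ec[i];
    for (int i = 0; i <= nc; ++i)
        u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);
    smooth(u, f, n, h, 2);                  /* 6. post-smoothing */

    free(r);
    free(rc);
    free(ec);
}
```

Replacing step 4 by a recursive call on the coarse grid would turn this two-grid cycle into a multigrid V-cycle.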

7 HHG Primitives (2D example)
[Figure legend: inner points, (macro) vertex points, (macro) edge points, ghost points, communication]

8 Weak scalability of HHG on Blue Gene/P (Jugene)
[Table columns: Cores, Struct. Regions, Unknowns, CG, Time]

9 Image processing for MRI
1. Denoising by homogeneous diffusion
2. High dynamic range compression

10 Domain generation (typical size: e.g. [...])
1. Static domain partitioning, parallel file reading
2. Find relevant (information containing) regions
3. Distribute only relevant regions equally
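
Step 3 can be illustrated with a small sketch that hands out only the relevant regions, round-robin, to a given number of ranks. The function name, the flag array `relevant` and the round-robin policy are assumptions made for illustration; the slides do not say how HHG balances the regions.

```c
#include <assert.h>
#include <stddef.h>

/* Assign each relevant region to a rank in round-robin order.
   relevant[r] != 0 marks a region that contains information.
   owner[r] receives the rank, or -1 for regions that are skipped.
   Returns the number of relevant regions. */
size_t distribute_regions(const int *relevant, int *owner,
                          size_t nregions, int nranks)
{
    assert(nranks > 0);
    size_t count = 0;
    for (size_t r = 0; r < nregions; ++r) {
        if (relevant[r]) {
            owner[r] = (int)(count % (size_t)nranks);
            ++count;
        } else {
            owner[r] = -1;   /* empty region: not distributed at all */
        }
    }
    return count;
}
```

With this policy the per-rank region counts differ by at most one, regardless of where the empty regions lie in the image.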

11 1) Denoising by homogeneous diffusion
Image with noise: $u_0 = Ru + \eta$
- $R$ ... linear operator incorporating blur (we assume $R = \mathrm{Id}$)
- $\eta$ ... additive noise (e.g. white Gaussian noise)
Simplest approach to reduce noise (better: anisotropic diffusion):
$u - u_0 = \alpha \Delta u$
- $\alpha$ ... regularization parameter ($\alpha > 0$)
Variational formulation:
$a(u, v) = \int_\Omega \alpha \nabla u \cdot \nabla v + uv \, dx, \qquad f(v) = \int_\Omega u_0 v \, dx$

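12 Denoising by homogeneous diffusion (cont.)
$\min_u J(u) := \frac{1}{2} a(u, u) - f(u)$
$\min_u J(u) := \frac{1}{2} \int_\Omega \alpha \nabla u \cdot \nabla u + u^2 \, dx - \int_\Omega u_0 u \, dx$
$\min_u 2J(u) = \int_\Omega \alpha \nabla u \cdot \nabla u + u^2 - 2 u_0 u \, dx$
$\min_u 2J(u) = \int_\Omega \alpha \nabla u \cdot \nabla u + u^2 - 2 u_0 u + (u_0)^2 - (u_0)^2 \, dx$
$\min_u \int_\Omega |u_0 - u|^2 + \alpha |\nabla u|^2 \, dx$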
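
As a rough illustration of what solving $u - \alpha \Delta u = u_0$ means on a pixel grid, the sketch below applies plain Gauss-Seidel sweeps to the 5-point discretisation $(1 + 4\alpha) u_{ij} - \alpha (u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}) = (u_0)_{ij}$. Unit pixel spacing, fixed boundary pixels, a 2D image and the function name are assumptions of this sketch; HHG itself uses multigrid rather than a plain smoother.

```c
#include <assert.h>
#include <stddef.h>

/* Gauss-Seidel sweeps for u - alpha * Laplace(u) = u0 on an n x n image
   stored row-major. Assumes unit pixel spacing; boundary pixels of u are
   left unchanged. u should be initialised, e.g. with a copy of u0. */
void denoise_homogeneous(double *u, const double *u0, double alpha,
                         size_t n, int sweeps)
{
    assert(n >= 3 && alpha > 0.0);
    const double diag_inv = 1.0 / (1.0 + 4.0 * alpha);
    for (int s = 0; s < sweeps; ++s) {
        for (size_t j = 1; j < n - 1; ++j) {
            for (size_t i = 1; i < n - 1; ++i) {
                /* sum of the four direct neighbours */
                double nb = u[(j + 1) * n + i] + u[j * n + i + 1]
                          + u[j * n + i - 1] + u[(j - 1) * n + i];
                u[j * n + i] = diag_inv * (u0[j * n + i] + alpha * nb);
            }
        }
    }
}
```

In terms of a stencil update of the form `c[0] * (f + c[1..4] * neighbours)`, this corresponds to `c[0] = 1 / (1 + 4 * alpha)`, `c[1..4] = alpha` and `f = u0`. A larger `alpha` pulls each pixel more strongly towards the mean of its neighbours.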

13 2) High dynamic range compression
Steps
1. Compute gradient field
2. Manipulate picture in the gradient domain (i.e. damp large gradients)
3. Back transformation: $\Delta u = \nabla \cdot k(\nabla u_0)$
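
The three steps are easiest to see in one dimension, where the back transformation reduces to a running sum. The sketch below is only an illustration: the soft-clipping function `damp`, its parameters `limit` and `factor`, and the function names are hypothetical and not taken from the slides. In two or three dimensions the damped field is generally no longer a gradient field, so the back transformation becomes a Poisson-type solve.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical damping function k: gradients with magnitude above `limit`
   keep only the fraction `factor` (0 < factor < 1) of the excess. */
static double damp(double g, double limit, double factor)
{
    if (g >  limit) return  limit + factor * (g - limit);
    if (g < -limit) return -limit + factor * (g + limit);
    return g;
}

/* 1D gradient-domain compression of u0 (n samples) into u. */
void compress_1d(double *u, const double *u0, size_t n,
                 double limit, double factor)
{
    assert(n >= 1);
    u[0] = u0[0];                                  /* keep the first sample */
    for (size_t i = 1; i < n; ++i) {
        double g = u0[i] - u0[i - 1];              /* 1. gradient */
        u[i] = u[i - 1] + damp(g, limit, factor);  /* 2. damp, 3. integrate */
    }
}
```

For the input `{0, 1, 101, 102}` with `limit = 2` and `factor = 0.25`, the large jump of 100 shrinks to 26.5 while the small steps of 1 pass unchanged, so fine detail survives and the overall range is compressed.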

14 Real-time processing

15 Target platforms
Jugene (FZ Jülich):
- 4-way SMP processor
- 32-bit PowerPC 450 core, 850 MHz
- Bandwidth: 13.6 GB/s
- 2 GB main memory
lima (RRZE Erlangen):
- 2 hexa-core processors
- Xeon 5650 "Westmere", [...] MHz
- Bandwidth: 32 GB/s
- 24 GB main memory

16 5-point stencil example: Blue-Gene/P

    for (int j = 1; j < tsize - 1; ++j) {
        // lex. update (all points)
        for (int i = 1; i < tsize - 1; ++i) {
            u[k*tsize*tsize + j*tsize + i] =
                c[0] * (f[j*tsize + i] +
                        c[1] * u[(j+1)*tsize + (i)] +
                        c[2] * u[(j)*tsize + (i+1)] +
                        c[3] * u[(j)*tsize + (i-1)] +
                        c[4] * u[(j-1)*tsize + (i)]);
        }
    }
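
The slide excerpt relies on variables declared elsewhere (`k`, `tsize`, `u`, `f`, `c`). A self-contained variant for a single 2D plane might look as follows; dropping the plane offset `k`, the function name and the parameter `n` (in place of `tsize`) are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One lexicographic Gauss-Seidel sweep over the interior of an n x n grid
   stored row-major. Coefficient layout as on the slide: c[0] scales the
   whole sum, c[1..4] weight the four neighbours. */
void gs_sweep_lex(double *u, const double *f, const double c[5], size_t n)
{
    assert(n >= 3);
    for (size_t j = 1; j < n - 1; ++j) {
        for (size_t i = 1; i < n - 1; ++i) {
            /* neighbours at (j-1) and (i-1) already hold new values */
            u[j * n + i] = c[0] * (f[j * n + i]
                         + c[1] * u[(j + 1) * n + i]
                         + c[2] * u[j * n + (i + 1)]
                         + c[3] * u[j * n + (i - 1)]
                         + c[4] * u[(j - 1) * n + i]);
        }
    }
}
```

Because each update reads values written earlier in the same sweep, the inner loop carries a dependency that limits vectorisation, which is the motivation for reordering the updates.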

17 Disjoint optimization: Blue-Gene/P

    double* u2 = u;
    for (int j = 1; j < tsize - 1; ++j) {
        // first update (red points only)
        for (int i = 1; i < tsize - 1; i += 2) {
    #pragma disjoint(*u, *f)
    #pragma disjoint(*u, *u2)
    #pragma disjoint(*u2, *f)
            u2[k*tsize*tsize + j*tsize + i] =
                c[0] * (f[j*tsize + i] +
                        c[1] * u[(j+1)*tsize + (i)] +
                        c[2] * u[(j)*tsize + (i+1)] +
                        c[3] * u[(j)*tsize + (i-1)] +
                        c[4] * u[(j-1)*tsize + (i)]);
        }
        // second update (black points only)
    }
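
`#pragma disjoint` is specific to the IBM XL compilers. A roughly comparable effect in portable C99 can be sketched with `restrict`-qualified pointers, as below. This is an illustration under stated assumptions, not the HHG implementation: the function names are hypothetical, the grid is a single 2D plane, and the sketch uses a true checkerboard colouring (the start index alternates from row to row) so that no element is both written and read within one half sweep.

```c
#include <assert.h>
#include <stddef.h>

/* Update all interior points with (i + j) % 2 == parity.
   `out` and `in` may name the same array: points written through `out`
   have one colour, points read through `in` have the other colour. */
static void rb_half_sweep(double *restrict out, const double *restrict in,
                          const double *restrict f, const double c[5],
                          size_t n, size_t parity)
{
    for (size_t j = 1; j < n - 1; ++j) {
        size_t first = 1 + (j + 1 + parity) % 2;   /* first column of this colour */
        for (size_t i = first; i < n - 1; i += 2) {
            out[j * n + i] = c[0] * (f[j * n + i]
                           + c[1] * in[(j + 1) * n + i]
                           + c[2] * in[j * n + (i + 1)]
                           + c[3] * in[j * n + (i - 1)]
                           + c[4] * in[(j - 1) * n + i]);
        }
    }
}

/* One red-black Gauss-Seidel sweep: red points first, then black points. */
void gs_sweep_rb(double *u, const double *f, const double c[5], size_t n)
{
    assert(n >= 3);
    double *u2 = u;                     /* second name for the same array */
    rb_half_sweep(u2, u, f, c, n, 0);   /* red:   (i + j) even */
    rb_half_sweep(u2, u, f, c, n, 1);   /* black: (i + j) odd  */
}
```

Within one colour the updates are independent of each other, so the compiler is free to reorder or vectorise the inner loop once it knows that stores and loads do not overlap.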

18 7-point stencil (Blue-Gene/P)
[Plot: MStencil/s versus size for lex. Gauss-Seidel and RRB Gauss-Seidel, including the disjoint and the disjoint, index-optimized variants]

19 27-point stencil (Blue-Gene/P)
[Plot: MStencil/s versus size for lex. Gauss-Seidel and RRB Gauss-Seidel, including the disjoint, index-optimized variant]

20 Different stencils (Blue-Gene/P)
[Plot: MStencil/s versus size for the [...]-point, 15-point and 27-point stencils]

21 Strong scaling (Blue-Gene/P)
[Plot: time per V-cycle [s] versus number of cores]
Figure: Strong scaling behavior of HHG on PowerPC 450 cores. This test case was performed starting from 512 cores, solving [...] DoF.

22 7-point stencil (1 core per node, Westmere)
[Plot: MStencil/s versus size for lex. Gauss-Seidel and RRB Gauss-Seidel, including the disjoint and the disjoint, index-optimized variants]

23 7-point stencil (12 cores per node, Westmere)
[Plot: MStencil/s versus size for lex. Gauss-Seidel and RRB Gauss-Seidel, including the disjoint and the disjoint, index-optimized variants]

24 Next steps / Outlook
- Parallel file reading
- Implementation of varying coefficients
- Nonlinear isotropic and anisotropic diffusion regularizers

25 Thank you for your attention! Any questions?
The development of HHG was funded by the Elite Network of Bavaria within the International Doctorate Program "Identification, Optimization and Control with Applications in Modern Technologies".
KONWIHR
