GPU Accelerated Monte Carlo Simulations and Time Series Analysis



1 GPU Accelerated Monte Carlo Simulations and Time Series Analysis
Institute of Physics, Johannes Gutenberg-University of Mainz
Center for Polymer Studies, Department of Physics, Boston University
Artemis Capital Asset Management GmbH
With thanks to: Peter Virnau, Wolfgang Paul, and Johannes J. Schneider


2 GPGPU computing
Driving force: computer game industry (realistic illustrations)
[Figure: single-precision performance of NVIDIA GPUs (G80, GT200) compared with Intel CPUs (Harpertown); source: NVIDIA CUDA Programming Guide]

3 GPU device architecture
GeForce GTX 280:
Global memory: 1024 MB
Number of multiprocessors: 30
Number of cores: 240
Constant memory: 64 kB
Shared memory: 16 kB (per multiprocessor)
Clock rate: 1.30 GHz

4 GPU device / Reference system
GeForce GTX 280: global memory 1024 MB, 30 multiprocessors, 240 cores, constant memory 64 kB, shared memory 16 kB, clock rate 1.30 GHz
Reference CPU: Intel Core 2 Quad, 2.66 GHz, cache size 4096 KB


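5 C code with extensions
[Figure: grid of thread blocks (Block 0, Block 1, Block 2, ...); each block, e.g. Block 1, contains threads Thread 0, Thread 1, Thread 2, ...]

__global__ void gpu_function(int n, float* a, float* b)
{
    // Determine the array element handled by this thread
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) b[i] += a[i] * a[i];
}

__host__ void cpu_function()
{
    int n = 128 * 128;
    int n_blocks = 128;
    int n_threads = 128;

    // Device arrays a and b with n elements each (zero-initialized here)
    float *a, *b;
    cudaMalloc((void**)&a, n * sizeof(float));
    cudaMalloc((void**)&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    gpu_function<<<n_blocks, n_threads>>>(n, a, b);
    // Global barrier between GPU functions
    gpu_function<<<n_blocks / 2, n_threads * 2>>>(n, a, b);

    cudaFree(a);
    cudaFree(b);
}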


6 Linear congruential RNGs
$x_{i+1,j} = (a\,x_{i,j} + c) \bmod m$ (step $i$ of stream $j$)
$x_{0,j+1} = (16807\,x_{0,j}) \bmod m$ (seed of the next stream)
$a = 1664525$, $c = 1013904223$, $m = 2^{32}$ (32-bit architecture provided by the GPU)
$x_{i,j} \in [-2^{31}; 2^{31} - 1]$
$y_{i,j} = |x_{i,j} / 2^{31}|$
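A minimal CPU-side sketch in C of the generator above, assuming the Numerical Recipes constants a = 1664525 and c = 1013904223 with m = 2^32. Unsigned 32-bit arithmetic wraps around, so the "mod m" costs nothing; on the GPU each thread j would keep its own state. The names lcg_state, lcg_seed_streams and lcg_uniform are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One independent LCG stream (hypothetical type; one per GPU thread). */
typedef struct { uint32_t x; } lcg_state;

/* Assumed constants (Numerical Recipes pair); m = 2^32 via unsigned overflow. */
static const uint32_t LCG_A = 1664525u;
static const uint32_t LCG_C = 1013904223u;

/* Seed the streams: x_{0,j+1} = (16807 * x_{0,j}) mod 2^32. */
void lcg_seed_streams(lcg_state *s, int n_streams, uint32_t seed0)
{
    assert(n_streams > 0);
    s[0].x = seed0;
    for (int j = 1; j < n_streams; ++j)
        s[j].x = 16807u * s[j - 1].x;          /* wraps modulo 2^32 */
}

/* Advance one stream, x_{i+1} = (a x_i + c) mod 2^32, and return y in [0, 1]. */
float lcg_uniform(lcg_state *s)
{
    s->x = LCG_A * s->x + LCG_C;
    int64_t xs = (int64_t)s->x;                /* interpret as signed 32-bit */
    if (xs >= 2147483648LL)
        xs -= 4294967296LL;                    /* now in [-2^31, 2^31 - 1] */
    double y = (double)xs / 2147483648.0;      /* x / 2^31 in [-1, 1) */
    return (float)(y < 0.0 ? -y : y);          /* y = |x / 2^31| */
}
```

Seeding every stream from its predecessor with the multiplier 16807 keeps the streams of different threads apart without storing a separate seed per thread.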

7 Computation times: random numbers
[Figure: time [ms] versus block number s for GPU allocation, GPU memory transfer, GPU main function, total processing time on GPU and total processing time on CPU; acceleration (speedup factor β) versus s]
Speedup factor: $\beta = \dfrac{\text{total processing time on CPU}}{\text{total processing time on GPU}}$


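8 Ising model
$\mathcal{H} = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$ (sum over nearest neighbors $\langle i,j \rangle$)
Spin update: Metropolis criterion
$W_{a \to b} = \exp(-\Delta\mathcal{H} / k_B T)$ if $\Delta\mathcal{H} > 0$
$W_{a \to b} = 1$ if $\Delta\mathcal{H} \le 0$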
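The energy change entering the Metropolis criterion follows directly from the Hamiltonian $\mathcal{H} = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$, with $H$ the external field and $\mathrm{nn}(k)$ denoting the nearest neighbors of site $k$. Flipping a single spin $S_k \to -S_k$ only changes the terms that contain $S_k$:

```latex
\begin{align}
\mathcal{H}_k^{\text{before}} &= -J\, S_k \sum_{j \in \mathrm{nn}(k)} S_j - H\, S_k, \qquad
\mathcal{H}_k^{\text{after}} = +J\, S_k \sum_{j \in \mathrm{nn}(k)} S_j + H\, S_k, \\
\Delta\mathcal{H} &= \mathcal{H}_k^{\text{after}} - \mathcal{H}_k^{\text{before}}
  = 2\, S_k \Big( J \sum_{j \in \mathrm{nn}(k)} S_j + H \Big).
\end{align}
```

On the 2D square lattice with $H = 0$ the neighbor sum takes only the values $0, \pm 2, \pm 4$, so $\Delta\mathcal{H} \in \{0, \pm 4J, \pm 8J\}$ and only $\exp(-4J/k_B T)$ and $\exp(-8J/k_B T)$ ever need to be evaluated; they can be tabulated once per temperature.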

9 2D Ising: GPU implementation
Checkerboard algorithm: non-interacting domains in which Monte Carlo moves are performed in parallel

10 Computation times 2D
[Figure: time [ms] versus block size n/2 for GPU allocation, GPU memory transfer, GPU main function, total processing time on GPU and total processing time on CPU; acceleration (speedup factor β) versus n/2]


11 Binder cumulant 2D
$U_4(T) = 1 - \dfrac{\langle M(T)^4 \rangle}{3 \langle M(T)^2 \rangle^2}$
Critical temperature: $k_B T_c = 2J / \ln(1 + \sqrt{2}) \approx 2.269\,J$
[Figure: $U_4$ versus $k_B T$ [J] for n/2 = 16, 32, 64, 128, ...]
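The Binder cumulant can be estimated from a series of magnetization measurements. A minimal C sketch, assuming the samples m[k] are already equilibrated: in the ordered phase $U_4$ approaches 2/3, for Gaussian fluctuations in the disordered phase it approaches 0, and curves for different system sizes cross near $T_c$. The function name binder_cumulant is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* U_4 = 1 - <M^4> / (3 <M^2>^2) estimated from n magnetization samples. */
double binder_cumulant(const double *m, size_t n)
{
    assert(n > 0);
    double m2 = 0.0, m4 = 0.0;
    for (size_t k = 0; k < n; ++k) {
        double sq = m[k] * m[k];
        m2 += sq;          /* accumulates M^2 */
        m4 += sq * sq;     /* accumulates M^4 */
    }
    m2 /= (double)n;
    m4 /= (double)n;
    assert(m2 > 0.0);      /* undefined for vanishing magnetization */
    return 1.0 - m4 / (3.0 * m2 * m2);
}
```

Samples that are all ±1 give exactly 2/3, the ordered-phase limit.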

12 Ising model 3D
$\mathcal{H} = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$ (sum over nearest neighbors $\langle i,j \rangle$)

13 3D Ising: GPU implementation
Non-interacting domains in which Monte Carlo moves are performed in parallel

14 Computation times 3D
[Figure: time [ms] versus block size n/2 for GPU allocation, GPU memory transfer, GPU main function, total processing time on GPU and total processing time on CPU; acceleration (speedup factor β) versus n/2]


15 Binder cumulant 3D
$U_4(T) = 1 - \dfrac{\langle M(T)^4 \rangle}{3 \langle M(T)^2 \rangle^2}$
Critical temperature: $k_B T_c = \{4.53\,J,\ 4.51\,J\}$
[Figure: $U_4$ versus $k_B T$ [J] for n/2 = 16, 32, 64, ...]

16 GPGPU / Time Series Analysis

17 GPGPU / Time Series Analysis

18 Random Walk

19 German Stock Index (DAX)
[Figure panels: PDF of returns with $\varphi(\delta p(t)) = u \exp(-v\,\delta p(t)^2)$; autocorrelation; Hurst exponent; pattern conformity]


20 GPU computing / Hurst exponent
$\langle |p(t + \Delta t) - p(t)|^q \rangle^{1/q} \propto \Delta t^{H_q(\Delta t)}$

21 GPU computing / Hurst exponent
[Figure: time [ms] versus length parameter for GPU allocation, memory transfer, main function, post-processing and final processing, together with total processing time on GPU and on CPU; acceleration versus length parameter]

22 GPU computing / Hurst exponent
[Figure: price p(t) [percent of par value] of FGBL JUN versus time t; Hurst exponent H(Δt) versus time lag Δt [units of time tick] for a random walk, FGBL (CPU) and FGBL (GPU); relative error [%] versus time lag Δt [units of time tick]]
TP, P. Virnau, W. Paul, and J. J. Schneider, Preprint submitted (2009)

23 Fluctuation Patterns
The aim is to compare the current reference pattern of time interval length $\Delta t^-$ with all previous patterns in the time series.
[Figure panels (a) and (b): price p(t) [units of points] with the reference pattern of length $\Delta t^-$]

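24 Fluctuation Patterns
True range adapted modified time series:
$\tilde p^{\hat t}_{\Delta t^-}(t) = \dfrac{p(t) - p_l(\hat t, \Delta t^-)}{p_h(\hat t, \Delta t^-) - p_l(\hat t, \Delta t^-)}$, with $\tilde p^{\hat t}_{\Delta t^-}(t) \in [0; 1]$ for $t \in [\hat t - \Delta t^-; \hat t)$,
where $p_h$ and $p_l$ are the highest and lowest values of $p$ in $[\hat t - \Delta t^-; \hat t)$.
[Figure: reference pattern $\tilde p^{\hat t}_{\Delta t^-}(t)$ ending at $\hat t - 1$ and comparison pattern $\tilde p^{\hat t - \tau}_{\Delta t^-}(t - \tau)$ ending at $\hat t - \tau - 1$, both scaled to [0, 1]]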
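The normalization can be sketched in C as follows, taking $p_h$ and $p_l$ as the highest and lowest value of $p$ in $[\hat t - \Delta t^-, \hat t)$. The helper names are hypothetical, and rejecting flat intervals (where $p_h = p_l$) with an assertion is an assumption, since the slide does not say how they are treated. Points outside the interval, such as a point $\Delta t^+$ steps ahead, are scaled with the same extremes and may fall outside [0, 1].

```c
#include <assert.h>
#include <stddef.h>

/* Lowest (lo) and highest (hi) value of p on [t_hat - len, t_hat). */
typedef struct { double lo, hi; } true_range;

true_range true_range_of(const double *p, size_t t_hat, size_t len)
{
    assert(len > 0 && len <= t_hat);
    true_range r = { p[t_hat - len], p[t_hat - len] };
    for (size_t t = t_hat - len + 1; t < t_hat; ++t) {
        if (p[t] < r.lo) r.lo = p[t];
        if (p[t] > r.hi) r.hi = p[t];
    }
    return r;
}

/* Modified series (p(t) - p_l) / (p_h - p_l); lies in [0, 1] inside the interval. */
double true_range_adapted(const double *p, size_t t, true_range r)
{
    assert(r.hi > r.lo);           /* assumption: flat intervals are rejected */
    return (p[t] - r.lo) / (r.hi - r.lo);
}
```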

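25 Fluctuation Patterns
Mean-square quality between the current and the comparison sequence:
$Q^{\Delta t^-}_{\hat t}(\tau) = \dfrac{1}{\Delta t^-} \sum_{\theta=1}^{\Delta t^-} \left( \tilde p^{\hat t}_{\Delta t^-}(\hat t - \theta) - \tilde p^{\hat t - \tau}_{\Delta t^-}(\hat t - \tau - \theta) \right)^2$, with $Q^{\Delta t^-}_{\hat t}(\tau) \in [0, 1]$
To quantify the values of the reference and comparison patterns relative to the reference point $\hat t - 1$, one can define
$\omega^{\Delta t^-}_{\hat t}(\tau, \Delta t^+) = \left( \tilde p^{\hat t}_{\Delta t^-}(\hat t - 1 + \Delta t^+) - \tilde p^{\hat t}_{\Delta t^-}(\hat t - 1) \right) \cdot \left( \tilde p^{\hat t - \tau}_{\Delta t^-}(\hat t - \tau - 1 + \Delta t^+) - \tilde p^{\hat t - \tau}_{\Delta t^-}(\hat t - \tau - 1) \right)$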
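A C sketch of the quality Q and the product ω for a single reference point $\hat t$, with the comparison pattern shifted back by τ and each pattern scaled by its own true range. Variable names mirror the symbols (t_hat for $\hat t$, len for $\Delta t^-$, dt_plus for $\Delta t^+$); the function names and the rejection of flat patterns are assumptions. The terms for different τ are independent, so on a GPU each shift could be handled by its own thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double lo, hi; } range;   /* true range of one pattern */

/* Lowest and highest value of p on [end - len, end). */
static range range_of(const double *p, size_t end, size_t len)
{
    range r = { p[end - len], p[end - len] };
    for (size_t t = end - len + 1; t < end; ++t) {
        if (p[t] < r.lo) r.lo = p[t];
        if (p[t] > r.hi) r.hi = p[t];
    }
    assert(r.hi > r.lo);             /* assumption: flat patterns are rejected */
    return r;
}

/* True range adapted value of p(t) with respect to range r. */
static double scaled(const double *p, size_t t, range r)
{
    return (p[t] - r.lo) / (r.hi - r.lo);
}

/* Q(tau): mean-square difference between the reference pattern ending at
   t_hat and the comparison pattern ending at t_hat - tau (length len each). */
double pattern_quality(const double *p, size_t t_hat, size_t tau, size_t len)
{
    assert(len > 0 && tau > 0 && tau + len <= t_hat);
    range ref = range_of(p, t_hat, len);
    range cmp = range_of(p, t_hat - tau, len);
    double q = 0.0;
    for (size_t theta = 1; theta <= len; ++theta) {
        double d = scaled(p, t_hat - theta, ref)
                 - scaled(p, t_hat - tau - theta, cmp);
        q += d * d;
    }
    return q / (double)len;          /* in [0, 1] */
}

/* omega(tau, dt_plus): product of the moves of reference and comparison pattern
   from their reference points (t_hat - 1 and t_hat - tau - 1), dt_plus steps ahead. */
double pattern_omega(const double *p, size_t n, size_t t_hat, size_t tau,
                     size_t len, size_t dt_plus)
{
    assert(len > 0 && tau > 0 && tau + len <= t_hat && t_hat - 1 + dt_plus < n);
    range ref = range_of(p, t_hat, len);
    range cmp = range_of(p, t_hat - tau, len);
    double a = scaled(p, t_hat - 1 + dt_plus, ref) - scaled(p, t_hat - 1, ref);
    double b = scaled(p, t_hat - tau - 1 + dt_plus, cmp) - scaled(p, t_hat - tau - 1, cmp);
    return a * b;
}
```

ω is positive when both patterns move in the same direction after their reference points and negative when they move in opposite directions.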

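26 Fluctuation Patterns
Observable for pattern conformity:
$\xi_\chi^{\Delta t^-}(\Delta t^+) = \sum_{\hat t = \Delta t^-}^{T - \Delta t^+} \sum_{\tau \le \tilde\tau} \dfrac{\operatorname{sgn}\left( \omega^{\Delta t^-}_{\hat t}(\tau, \Delta t^+) \right)}{\exp\left( \chi\, Q^{\Delta t^-}_{\hat t}(\tau) \right)}$
Limitation: $\tilde\tau = \hat\tau$ if $\hat t - \hat\tau - \Delta t^- \ge 0$, and $\tilde\tau = \hat t - \Delta t^-$ otherwise
Normalized pattern conformity:
$\Xi_\chi^{\Delta t^-}(\Delta t^+) = \dfrac{\xi_\chi^{\Delta t^-}(\Delta t^+)}{\sum_{\hat t = \Delta t^-}^{T - \Delta t^+} \sum_{\tau \le \tilde\tau} \left| \operatorname{sgn}\left( \omega^{\Delta t^-}_{\hat t}(\tau, \Delta t^+) \right) \right| / \exp\left( \chi\, Q^{\Delta t^-}_{\hat t}(\tau) \right)}$
Definition: $\operatorname{sgn}(x) = 1$ for $x > 0$, $0$ for $x = 0$, $-1$ for $x < 0$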

27 Pattern Conformity / Trivial Cases
[Figure: pattern conformity as a function of $\Delta t^-$ and $\Delta t^+$ for (a) a straight line and (b) a random walk]

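28 Pattern Conformity / FDAX
[Figure: (a) $\Xi^{\mathrm{FDAX}}_{\chi=0}$ as a function of $\Delta t^-$ and $\Delta t^+$; (b) $\Xi^* = \Xi^{\mathrm{FDAX}}_{\chi=0} - \Xi^{\mathrm{ACRW}}_{\chi=0}$; both with $Q^{\Delta t^-}_{\hat t}(\tau) = Q^{\Delta t^-}_{p,\hat t}(\tau)$]
Complex correlations for financial market time series, especially for large pattern lengths.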

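29 Inclusion of volumes and inter-trade waiting times (ITWT)
[Figure panels (c) and (d): $\Xi^*$ as a function of $\Delta t^-$ and $\Delta t^+$ with $Q^{\Delta t^-}_{\hat t}(\tau) = Q^{\Delta t^-}_{p,\hat t}(\tau) + Q^{\Delta t^-}_{v,\hat t}(\tau)$ (prices and volumes) and with $Q^{\Delta t^-}_{\hat t}(\tau) = Q^{\Delta t^-}_{p,\hat t}(\tau) + Q^{\Delta t^-}_{\iota,\hat t}(\tau)$ (prices and ITWT)]
Same structure: high values of the pattern conformity
T. Preis et al., Europhys. Lett. 82, (2008)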

30 GPU computing / Pattern Conformity
[Figure panels (a) to (c) as functions of $\Delta t^-$ and $\Delta t^+$; time [ms] and acceleration versus the scan interval parameter $\hat\tau$ for GPU allocation, memory transfer, main function, total processing time on GPU and total processing time on CPU]

31 Final remarks
TP, PV, WP, and JJS, GPU Accelerated Monte Carlo Simulation of the 2D and 3D Ising Model, J. Comp. Phys. 228, (2009)
TP, PV, WP, and JJS, Accelerated Fluctuation Analysis by Graphic Cards and Complex Pattern Formation in Econophysics, Preprint submitted (2009)
Source code available:

32 Thank you!
