GPU Accelerated Monte Carlo Simulations and Time Series Analysis

Institute of Physics, Johannes Gutenberg-University of Mainz
Center for Polymer Studies, Department of Physics, Boston University
Artemis Capital Asset Management GmbH
preis@uni-mainz.de

With thanks to: Peter Virnau, Wolfgang Paul, and Johannes J. Schneider
GPGPU computing
Driving force: computer game industry (realistic illustrations)
[Figure: single-precision NVIDIA GPUs (NV30, G80, GT200) compared with a 3.2 GHz Harpertown CPU, 2003 to 2008. Source: NVIDIA CUDA programming guide]
GPU device architecture
GeForce GTX 280:
  Global memory: 1024 MB
  Number of multiprocessors: 30
  Number of cores: 240
  Constant memory: 64 KB
  Shared memory: 16 KB
  Clock rate: 1.30 GHz
GPU device / Reference system
GeForce GTX 280:
  Global memory: 1024 MB
  Number of multiprocessors: 30
  Number of cores: 240
  Constant memory: 64 KB
  Shared memory: 16 KB
  Clock rate: 1.30 GHz
Reference CPU: Intel Core 2 Quad
  Clock rate: 2.66 GHz
  Cache size: 4096 KB
C code with extensions

    __global__ void gpu_function(int n, float* a, float* b)
    {
        // Determine array element
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        // Guard against threads beyond the end of the arrays
        if (i < n) b[i] += a[i] * a[i];
    }

    __host__ void cpu_function()
    {
        // a and b are device pointers allocated elsewhere
        int n = 128 * 128;
        int n_blocks = 128;
        int n_threads = 128;
        gpu_function<<<n_blocks, n_threads>>>(n, a, b);
        // Global barrier between GPU functions
        gpu_function<<<n_blocks/2, n_threads*2>>>(n, a, b);
    }

[Figure: the grid consists of blocks (Block 0, Block 1, Block 2, ...); each block consists of threads (Thread 0, Thread 1, Thread 2, ...)]
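The index arithmetic in the kernel above can be checked on the host without a GPU. The following is a minimal sketch in plain C, not the author's code: `kernel_body` and `launch_serial` are hypothetical names, and the two nested loops only stand in for the parallel execution of blocks and threads.

```c
#include <stddef.h>

/* Work of one (block, thread) pair; mirrors the body of gpu_function.
   block_dim plays the role of blockDim.x. */
void kernel_body(int block_idx, int thread_idx, int block_dim,
                 int n, const float *a, float *b)
{
    /* Determine array element */
    int i = thread_idx + block_idx * block_dim;
    if (i < n) b[i] += a[i] * a[i];
}

/* Serial stand-in (assumption) for gpu_function<<<n_blocks, n_threads>>>.
   On the GPU the iterations run concurrently; here they run one by one. */
void launch_serial(int n_blocks, int n_threads,
                   int n, const float *a, float *b)
{
    for (int blk = 0; blk < n_blocks; ++blk)
        for (int thr = 0; thr < n_threads; ++thr)
            kernel_body(blk, thr, n_threads, n, a, b);
}
```

Calling `launch_serial(128, 128, ...)` and `launch_serial(64, 256, ...)` touches the same 128 * 128 elements, which is why the two launch configurations on the slide are interchangeable for this kernel.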
Linear congruential RNGs
$x_{i+1,j} = (a \, x_{i,j} + c) \bmod m$
$x_{0,j+1} = (16807 \, x_{0,j}) \bmod m$
$a = 1664525$, $c = 1013904223$
32-bit architecture provided by the GPU: $x_{i,j} \in [-2^{31}; 2^{31} - 1]$
$y_{i,j} = \mathrm{abs}(x_{i,j} / 2^{31}) \approx 4.656612 \times 10^{-10} \cdot \mathrm{abs}(x_{i,j})$
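The recurrences above can be sketched in plain C. This is an illustration under stated assumptions, not the code behind the slides: it assumes $m = 2^{32}$, realized by unsigned 32-bit wraparound, for both the generator and the seed chain, and the function names are hypothetical.

```c
#include <stdint.h>

/* One generator step, x_{i+1,j} = (a * x_{i,j} + c) mod 2^32.
   Assumption: the modulo comes for free from uint32_t overflow. */
uint32_t lcg_next(uint32_t x)
{
    return 1664525u * x + 1013904223u;
}

/* Seed for the next thread j+1, x_{0,j+1} = (16807 * x_{0,j}) mod 2^32
   (assumption: same modulus as the generator itself). */
uint32_t lcg_next_seed(uint32_t seed)
{
    return 16807u * seed;
}

/* Map a state to y = abs(x / 2^31) in [0, 1], reading the 32 bits as a
   signed two's complement value in [-2^31, 2^31 - 1]. */
double lcg_to_unit(uint32_t x)
{
    double v = (x & 0x80000000u) ? (double)x - 4294967296.0 : (double)x;
    if (v < 0.0) v = -v;
    return v / 2147483648.0;
}
```

Each thread j would start from its own seed produced by `lcg_next_seed` and then iterate `lcg_next` independently, so no state is shared between threads.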
Computation times: random numbers
[Figure: processing time (Time [ms]) and acceleration factor β versus block number s. Curves: time on GPU for allocation, time on GPU for memory transfer, time on GPU for main function, total processing time on GPU, total processing time on CPU, speedup factor]
β = (total processing time on CPU) / (total processing time on GPU)
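Ising model
$\mathcal{H} = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$, where $\langle i,j \rangle$ runs over nearest neighbors
Spin update: Metropolis criterion
$W_{a \to b} = \exp(-\Delta\mathcal{H} / k_B T)$ if $\Delta\mathcal{H} > 0$
$W_{a \to b} = 1$ if $\Delta\mathcal{H} \le 0$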
2D Ising: GPU implementation
Noninteracting domains where Monte Carlo moves are performed in parallel
Checkerboard algorithm
Computation times 2D
[Figure: processing time (Time [ms]) and acceleration factor β versus block size n/2, from 2^4 to 2^9. Curves: time on GPU for allocation, time on GPU for memory transfer, time on GPU for main function, total processing time on GPU, total processing time on CPU, speedup factor]
Binder cumulant 2D
$k_B T_C = 2.269185 \, J$
$U_4(T) = 1 - \frac{\langle M(T)^4 \rangle}{3 \langle M(T)^2 \rangle^2}$
[Figure: $U_4$ versus $k_B T$ [J] between 2.245 and 2.295 for n/2 = 16, 32, 64, 128, 256, with the critical temperature marked]
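Given a set of magnetization measurements at one temperature, the Binder cumulant defined above reduces to two running averages. A minimal C sketch, with the hypothetical name `binder_cumulant` and the assumption that the samples are already stored in an array:

```c
#include <stddef.h>

/* U_4 = 1 - <M^4> / (3 <M^2>^2) from count magnetization samples.
   Returns 0.0 if there are no samples or <M^2> vanishes. */
double binder_cumulant(const double *m, size_t count)
{
    double m2 = 0.0, m4 = 0.0;
    if (count == 0) return 0.0;
    for (size_t k = 0; k < count; ++k) {
        double sq = m[k] * m[k];
        m2 += sq;       /* accumulates M^2 */
        m4 += sq * sq;  /* accumulates M^4 */
    }
    m2 /= (double)count;
    m4 /= (double)count;
    if (m2 == 0.0) return 0.0;
    return 1.0 - m4 / (3.0 * m2 * m2);
}
```

For a fully ordered sample set (all |M| equal) this gives 2/3, the upper end of the range shown on the slide's vertical axis.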
Ising model 3D
$\mathcal{H} = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$, where $\langle i,j \rangle$ runs over nearest neighbors
3D Ising: GPU implementation
Noninteracting domains where Monte Carlo moves are performed in parallel
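One way to obtain noninteracting domains on a cubic lattice is a two-color decomposition by coordinate parity. The slide does not say which decomposition is used in 3D, so the following C sketch is an assumption; it also assumes periodic boundaries and even n, and both function names are hypothetical.

```c
/* Color of a site on a cubic lattice: 0 or 1 by coordinate parity.
   Every nearest neighbor differs in exactly one coordinate by one step,
   so it always has the opposite color. */
int site_color_3d(int x, int y, int z)
{
    return (x + y + z) & 1;
}

/* Sum over the six nearest neighbors of site (x, y, z) on an
   n x n x n lattice with periodic boundaries.
   Layout: spin[(z * n + y) * n + x]. */
int neighbor_sum_3d(const int *spin, int n, int x, int y, int z)
{
    int xm = (x + n - 1) % n, xp = (x + 1) % n;
    int ym = (y + n - 1) % n, yp = (y + 1) % n;
    int zm = (z + n - 1) % n, zp = (z + 1) % n;
    return spin[(z * n + y) * n + xm] + spin[(z * n + y) * n + xp]
         + spin[(z * n + ym) * n + x] + spin[(z * n + yp) * n + x]
         + spin[(zm * n + y) * n + x] + spin[(zp * n + y) * n + x];
}
```

With these two helpers, a 3D sweep has the same shape as in 2D: update all sites of color 0, then all sites of color 1.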
Computation times 3D
[Figure: processing time (Time [ms]) and acceleration factor β versus block size n/2, from 2^4 to 2^7. Curves: time on GPU for allocation, time on GPU for memory transfer, time on GPU for main function, total processing time on GPU, total processing time on CPU, speedup factor]
Binder cumulant 3D
$k_B T_C = \{4.53 \, J, 4.51 \, J\}$
$U_4(T) = 1 - \frac{\langle M(T)^4 \rangle}{3 \langle M(T)^2 \rangle^2}$
[Figure: $U_4$ versus $k_B T$ [J] between 4.47 and 4.59 for n/2 = 16, 32, 64, 128, with the critical temperature marked]
GPGPU / Time Series Analysis
Random Walk
German Stock Index (DAX), 20.06.2008 to 19.09.2008
[Figure: DAX time series with four analysis panels: PDF of returns, Autocorrelation, Pattern Conformity, Hurst Exponent]
$\varphi(\Delta p(\Delta t)) = u \exp(-v \, \Delta p^2)$
GPU computing / Hurst exponent
$\langle | p(t + \Delta t) - p(t) |^q \rangle^{1/q} \propto \Delta t^{H_q(\Delta t)}$
GPU computing / Hurst exponent
[Figure: processing time (Time [ms]) and acceleration factor versus length parameter. Curves: time on GPU for allocation, time on GPU for memory transfer, time on GPU for main function, time on GPU for post processing, time on GPU for final processing, total processing time on GPU, total processing time on CPU]
GPU computing / Hurst exponent
[Figure: FGBL JUN 2007 price p(t) [percent of par value] versus t [10^5 units of time tick]; Hurst exponent H(Δt) versus time lag Δt [units of time tick] for a random walk, FGBL (CPU), and FGBL (GPU); relative error [%] versus time lag Δt [units of time tick]]
TP, P. Virnau, W. Paul, and J. J. Schneider, Preprint submitted (2009)
Fluctuation Patterns
The aim is to compare the current reference pattern of time interval length $\Delta t^-$ with all previous patterns in the time series.
[Figure: (a) p(t) [units of points] with a reference pattern of length $\Delta t^- = 75$; (b) companion panel as a function of t]
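Fluctuation Patterns
True range adapted modified time series:
$\tilde p_{\hat t}^{\Delta t^-}(t) = \frac{p(t) - p_l(\hat t, \Delta t^-)}{p_h(\hat t, \Delta t^-) - p_l(\hat t, \Delta t^-)}$
$\tilde p_{\hat t}^{\Delta t^-}(t) \in [0; 1]$ for $t \in [\hat t - \Delta t^-; \hat t)$
[Figure: reference pattern $\tilde p_{\hat t}^{\Delta t^-}(t)$ and comparison pattern $\tilde p_{\hat t - \tau}^{\Delta t^-}(t - \tau)$ on a scale from 0.0 to 1.0, with time markers $\hat t - \Delta t^-$, $\hat t - 1$, $\hat t$]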
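The rescaling on this slide can be sketched in C as follows. The sketch assumes that $p_h$ and $p_l$ are the maximum and minimum of p over the window $[\hat t - \Delta t^-; \hat t)$, which is what the stated range [0; 1] implies; `normalize_window` is a hypothetical name.

```c
#include <stddef.h>

/* Writes the rescaled window into out[0 .. dt_minus - 1], where out[k]
   corresponds to t = t_hat - dt_minus + k.
   Assumption: p_h and p_l are the window maximum and minimum.
   Returns 0 on success, -1 if the window is out of range or flat. */
int normalize_window(const double *p, size_t t_hat, size_t dt_minus,
                     double *out)
{
    if (dt_minus == 0 || t_hat < dt_minus) return -1;
    size_t start = t_hat - dt_minus;
    double lo = p[start], hi = p[start];
    for (size_t k = 1; k < dt_minus; ++k) {
        if (p[start + k] < lo) lo = p[start + k];
        if (p[start + k] > hi) hi = p[start + k];
    }
    if (hi <= lo) return -1;  /* flat window: range is zero */
    for (size_t k = 0; k < dt_minus; ++k)
        out[k] = (p[start + k] - lo) / (hi - lo);
    return 0;
}
```

Because every window is mapped onto the same unit interval, patterns taken at different price levels and volatilities become directly comparable.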
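Fluctuation Patterns
Mean-square quality between current and comparison sequence:
$Q_{\hat t}^{\Delta t^-}(\tau) = \frac{1}{\Delta t^-} \sum_{\theta=1}^{\Delta t^-} \left( \tilde p_{\hat t}^{\Delta t^-}(\hat t - \theta) - \tilde p_{\hat t - \tau}^{\Delta t^-}(\hat t - \tau - \theta) \right)^2$ with $Q_{\hat t}^{\Delta t^-}(\tau) \in [0, 1]$
In order to quantify the value of reference and comparison pattern relative to the reference point, one can define
$\omega_{\hat t}^{\Delta t^-}(\tau, \Delta t^+) = \left( \tilde p_{\hat t}^{\Delta t^-}(\hat t - 1 + \Delta t^+) - \tilde p_{\hat t}^{\Delta t^-}(\hat t - 1) \right) \left( \tilde p_{\hat t - \tau}^{\Delta t^-}(\hat t - \tau - 1 + \Delta t^+) - \tilde p_{\hat t - \tau}^{\Delta t^-}(\hat t - \tau - 1) \right)$
[Figure: reference pattern $\tilde p_{\hat t}^{\Delta t^-}(t)$ and comparison pattern $\tilde p_{\hat t - \tau}^{\Delta t^-}(t - \tau)$ on a scale from 0.0 to 1.0, with time markers $\hat t - \Delta t^-$, $\hat t - 1$, $\hat t$]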
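The two quantities on this slide can be sketched in C for a single pair of reference time and lag. The sketch makes several assumptions: Q is the squared difference of the two rescaled windows averaged over the pattern length, both windows are rescaled with their own minimum and maximum, and invalid or flat windows are signalled by return values chosen here for illustration. All function names are hypothetical.

```c
#include <stddef.h>

/* Minimum and range (max - min) of p on [start, start + len). */
void window_min_range(const double *p, size_t start, size_t len,
                      double *lo, double *range)
{
    double mn = p[start], mx = p[start];
    for (size_t k = 1; k < len; ++k) {
        if (p[start + k] < mn) mn = p[start + k];
        if (p[start + k] > mx) mx = p[start + k];
    }
    *lo = mn;
    *range = mx - mn;
}

/* Mean-square quality Q between the reference window ending at t_hat
   and the comparison window ending at t_hat - tau.
   Returns -1.0 if a window is out of range or flat. */
double pattern_quality(const double *p, size_t t_hat, size_t tau,
                       size_t dt_minus)
{
    double lo_r, rg_r, lo_c, rg_c, sum = 0.0;
    if (dt_minus == 0 || t_hat < tau + dt_minus) return -1.0;
    window_min_range(p, t_hat - dt_minus, dt_minus, &lo_r, &rg_r);
    window_min_range(p, t_hat - tau - dt_minus, dt_minus, &lo_c, &rg_c);
    if (rg_r <= 0.0 || rg_c <= 0.0) return -1.0;
    for (size_t theta = 1; theta <= dt_minus; ++theta) {
        double a = (p[t_hat - theta] - lo_r) / rg_r;
        double b = (p[t_hat - tau - theta] - lo_c) / rg_c;
        sum += (a - b) * (a - b);
    }
    return sum / (double)dt_minus;
}

/* omega: product of the rescaled moves that follow the reference and
   the comparison pattern over dt_plus steps. Returns 0.0 when the
   indices are out of range or a window is flat. */
double pattern_omega(const double *p, size_t n, size_t t_hat, size_t tau,
                     size_t dt_minus, size_t dt_plus)
{
    double lo_r, rg_r, lo_c, rg_c;
    if (dt_minus == 0 || dt_plus == 0 || t_hat < tau + dt_minus ||
        t_hat - 1 + dt_plus >= n)
        return 0.0;
    window_min_range(p, t_hat - dt_minus, dt_minus, &lo_r, &rg_r);
    window_min_range(p, t_hat - tau - dt_minus, dt_minus, &lo_c, &rg_c);
    if (rg_r <= 0.0 || rg_c <= 0.0) return 0.0;
    return ((p[t_hat - 1 + dt_plus] - p[t_hat - 1]) / rg_r)
         * ((p[t_hat - tau - 1 + dt_plus] - p[t_hat - tau - 1]) / rg_c);
}
```

A positive value of `pattern_omega` means that the series moved in the same direction after both patterns; a negative value means the moves were opposite.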
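Fluctuation Patterns
Observable for pattern conformity:
$\xi_\chi(\Delta t^+, \Delta t^-) = \sum_{\hat t = \Delta t^-}^{T - \Delta t^+} \sum_{\tau}^{\tau^*} \frac{\mathrm{sgn}\left( \omega_{\hat t}^{\Delta t^-}(\tau, \Delta t^+) \right)}{\exp\left( \chi Q_{\hat t}^{\Delta t^-}(\tau) \right)}$
Limitation: $\tau^* = \hat\tau$ if $\hat t - \hat\tau - \Delta t^- \ge 0$, $\tau^* = \hat t - \Delta t^-$ else
Normalized pattern conformity:
$\Xi_\chi(\Delta t^+, \Delta t^-) = \xi_\chi(\Delta t^+, \Delta t^-) \Big/ \sum_{\hat t = \Delta t^-}^{T - \Delta t^+} \sum_{\tau}^{\tau^*} \frac{\left| \mathrm{sgn}\left( \omega_{\hat t}^{\Delta t^-}(\tau, \Delta t^+) \right) \right|}{\exp\left( \chi Q_{\hat t}^{\Delta t^-}(\tau) \right)}$
Definition: $\mathrm{sgn}(x) = 1$ for $x > 0$, $0$ for $x = 0$, $-1$ for $x < 0$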
Pattern Conformity / Trivial Cases
[Figure: pattern conformity as a function of $\Delta t^+$ and $\Delta t^-$ for (a) a straight line, with values around 1.000, and (b) a random walk, with values around 0.000]
Pattern Conformity / FDAX
Complex correlations for financial market time series, especially for large pattern lengths.
[Figure: (a) pattern conformity $\Xi$ and (b) $\Xi^*$ as functions of $\Delta t^+$ and $\Delta t^-$; $\gamma = 0.16$]
$\Xi^* = \Xi_{\chi=0}^{\mathrm{FDAX}} - \Xi_{\chi=0}^{\mathrm{ACRW}}$
$Q_{\hat t}^{\Delta t^-}(\tau) = Q_{\hat t}^{p, \Delta t^-}(\tau)$
Inclusion of volumes and ITWT
T. Preis et al., Europhys. Lett. 82, 68005 (2008)
Same structure: high values of the pattern conformity
[Figure: $\Xi^*$ as a function of $\Delta t^+$ and $\Delta t^-$ for panels (c) and (d)]
(c) $Q_{\hat t}^{\Delta t^-}(\tau) = Q_{\hat t}^{p, \Delta t^-}(\tau) + Q_{\hat t}^{v, \Delta t^-}(\tau)$
(d) $Q_{\hat t}^{\Delta t^-}(\tau) = Q_{\hat t}^{p, \Delta t^-}(\tau) + Q_{\hat t}^{\iota, \Delta t^-}(\tau)$
GPU computing / Pattern Conformity
[Figure: panels (a), (b), (c) show pattern conformity surfaces as functions of $\Delta t^+$ and $\Delta t^-$; a further panel shows processing time (Time [ms]) and acceleration factor versus scan interval parameter, from 2^0 to 2^5. Curves: time on GPU for allocation, time on GPU for memory transfer, time on GPU for main function, total processing time on GPU, total processing time on CPU]
Final remarks
TP, PV, WP, and JJS, GPU Accelerated Monte Carlo Simulation of 2D and 3D Ising Model, J. Comp. Phys. 228, 4468-4477 (2009)
TP, PV, WP, and JJS, Accelerated Fluctuation Analysis by Graphic Cards and Complex Pattern Formation in Econophysics, Preprint submitted (2009)
Source code available: www.tobiaspreis.de
Thank you!