Kerrighed: use cases

Cyril Brulebois
cyril.brulebois@kerlabs.com

Kerrighed: http://www.kerrighed.org/
Kerlabs: http://www.kerlabs.com/

Introducing Kerrighed

What's Kerrighed?

- Single-System Image (SSI) cluster system
- Patched Linux kernel, plus userland tools
- Started at INRIA in 1999, in collaboration with University of Rennes 1 and EDF
- Since 2006, mainly developed by Kerlabs, an INRIA spin-off
- Released under the GPL
- Latest releases:
  - Kerrighed 2.4.4, based on Linux 2.6.20 (January 29th, 2010)
  - Kerrighed 3.0.0, based on Linux 2.6.30 (June 14th, 2010)

What's new? What's planned?

Memory:  Sharing, Injection
CPU:     Process migration, Thread migration, Pool of threads migration, Configurable scheduler
IPC:     SysV, POSIX
Network: Migratable streams, Cluster IP, High performance
C/R:     Single process, Applications, Open files, File substitution, IPC
Misc:    Node hotplug

Legend: v2.4.4, v3.0.0, in 2011

Managing nodes (1/2)

To manage the cluster and its nodes, a single command: krgadm.

  # krgadm cluster status
  status: up on 4 nodes
  # krgadm nodes status
  1:online 2:online 3:online 4:online 5:present 6:present 7:present 8:present

Managing nodes (2/2)

Adding 2 nodes:

  # krgadm nodes add --count 2
  Waiting for 2 nodes to join... done
  Adding nodes [5,6]... done
  # krgadm cluster status
  status: up on 6 nodes

Adding 2 more nodes, in a different way (by target total rather than by count):

  # krgadm nodes add --total 8
  Waiting for 2 nodes to join... done
  Adding nodes [7,8]... done
  # krgadm cluster status
  status: up on 8 nodes

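The status output lends itself to scripting. A minimal shell sketch, assuming `krgadm nodes status` prints whitespace-separated `id:state` tokens as in the sample output shown earlier; `count_nodes_in_state` is a hypothetical helper name, not part of the Kerrighed tools:

```shell
# Count the nodes in a given state from "krgadm nodes status"-style text
# read on stdin. Assumption: tokens look like "id:state" and are separated
# by spaces or newlines. count_nodes_in_state is a hypothetical helper.
count_nodes_in_state() {
    awk -v s="$1" '
        {
            for (i = 1; i <= NF; i++) {
                split($i, kv, ":")      # kv[1] = node id, kv[2] = state
                if (kv[2] == s) n++
            }
        }
        END { print n + 0 }             # print 0 when nothing matched
    '
}
```

For example, `krgadm nodes status | count_nodes_in_state present` would report how many nodes are present but not yet online; on the sample output it would print 4.
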
Managing features

Use of capabilities to turn Kerrighed features on/off as appropriate. Examples:

  krgcapset -d +DISTANT_FORK
  krgcapset --pid 6291711 -e +CAN_MIGRATE

Also possible through libkerrighed.

Most common capabilities:

- DISTANT_FORK: may fork remotely.
- CAN_MIGRATE: may be migrated while running.
- CHECKPOINTABLE: may be checkpointed.
- SEE_LOCAL_PROC_STAT: only see local resources.

1. Introducing Kerrighed
   - What's Kerrighed?
   - What's new? What's planned?
   - Managing nodes
   - Managing features
2. Load balancing
   - Use case 1: build platform
   - Use case 2: network computing
   - Use case 3: distributed rendering
   - Use case 4: webservers
   - Use case 5: parallel computing
   - Use case 6: LTSP
   - There's more!
3. Checkpoint/restart
   - Use case 1: long running computations
   - Use case 2: playing with sockets

Use case 1: build platform

Setup: $C cores
Capability: DISTANT_FORK
Trivial:

  make -j$C

Use case 2: network computing

Capabilities: DISTANT_FORK and/or CAN_MIGRATE
Example: BOINC (Berkeley project), running @home, PrimeGrid, etc.
How: Just start BOINC!

- It runs as many children as there are cores.
- It starts a new child whenever one returns.

Use case 3: distributed rendering

Setup: $C cores
Capabilities: DISTANT_FORK and/or CAN_MIGRATE
Example: Blender, rendering a 1 to $N frame range.
How: Blender can render in batch mode, either a single frame at a time or a frame range.

  blender -b foo.blender -F PNG -o //render_####.png -f $i

Naive approach

Trivial implementation:

1. Spawn $C processes.
2. Wait for all of them to return.
3. Go back to spawning until the last frame is rendered.

Issue: if some frames are quicker to render than others, the global wait will leave some cores idle.

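The batch loop described here can be sketched in POSIX shell. This is only an illustration: `render_frame` is a hypothetical stand-in for the real per-frame command (for instance the Blender call from the previous slide), and `render_batches` is a made-up name.

```shell
# Hypothetical per-frame job; replace the echo with the real command.
render_frame() {
    echo "frame $1"
}

# Naive scheduler: start up to $1 jobs, wait for the whole batch,
# then start the next batch, until frame $2 has been rendered.
render_batches() {
    batch=$1
    last=$2
    frame=1
    while [ "$frame" -le "$last" ]; do
        started=0
        while [ "$started" -lt "$batch" ] && [ "$frame" -le "$last" ]; do
            render_frame "$frame" &
            frame=$((frame + 1))
            started=$((started + 1))
        done
        # Global barrier: every core waits for the slowest frame of the batch.
        wait
    done
}
```

Calling `render_batches "$C" "$N"` runs frames 1 to $N in groups of $C. The bare `wait` is exactly where the idle time comes from.
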
Smarter approach

A more efficient implementation:

1. Spawn $C processes.
2. Wait for one of them to return.
3. Spawn a new process unless the last frame has been reached.
4. Go back to waiting.

This keeps $C processes running until the end, with almost no idling.

Many other renderers, and almost anything scriptable, can be run this way with this single, simple job scheduler. No need for complex client/server solutions.

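One way to get this behaviour without writing the wait loop by hand is `xargs -P`, which starts a new command as soon as a running one exits. This is a sketch of the same idea, not the scheduler used in the talk; `render_pool` is a hypothetical name, and `-P` is a common extension (GNU and BSD xargs) rather than POSIX.

```shell
# Keep $1 workers busy over frames 1..$2. The remaining arguments are the
# command to run; xargs appends one frame number as its last argument.
# render_pool is a hypothetical helper; -P is not POSIX but widely available.
render_pool() {
    workers=$1
    last=$2
    shift 2
    frame=1
    while [ "$frame" -le "$last" ]; do
        echo "$frame"
        frame=$((frame + 1))
    done | xargs -P "$workers" -n 1 "$@"
}
```

With the Blender example, this could be called as `render_pool "$C" "$N" blender -b foo.blender -F PNG -o '//render_####.png' -f`, so that each frame number lands right after `-f`.
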
Use case 4: webservers

Capabilities: DISTANT_FORK and/or CAN_MIGRATE
Example: Apache MPM worker (Multi-Processing Module), prefork version (non-threaded, pre-forking).
Drawback: Socket handling.

- Short-term solution: Enable an extra capability to have all sockets listen on a given node, which acts as an entry point.
- Long-term solution: Use cluster IP.

Use case 5: parallel computing

Capabilities: DISTANT_FORK and/or CAN_MIGRATE
Example: R and its multicore package.
Code: Replace %do% with %dopar%

  library("doMC")
  registerDoMC()
  x <- foreach(i=1:42) %dopar% svd(matrix(rnorm(1000*1000),ncol=1000))

Cores are automatically detected, but the worker count can be tweaked by calling:

  options(nodes=42)

Use case 6: LTSP

Capabilities: DISTANT_FORK and/or CAN_MIGRATE
Example: Run one VNC server per user on the first node. Launched applications get load-balanced over the whole cluster.
Possible issue with desktop environments: heavy use of local networking services (e.g. D-Bus).
Possible solutions:

- Same as the web servers use case: use an extra capability to direct all sockets to a given node.
- Cluster IP? Probably a bit more complex: loopback, global address space for UNIX sockets, etc.

There's more!

Schedulers for DISTANT_FORK and CAN_MIGRATE can be tweaked, extended, or replaced.

Configurable through configfs, thanks to:

- Probes (e.g. free RAM).
- Filters (e.g. reaching some threshold).
- Policies.
- Process sets and node sets.

Some possible policies, in addition to load balancing:

- Swap avoidance
- Disk I/O balancing
- Slightly more complex: keep interactive applications local, move others away, and welcome them back when there are no more interactive applications (use case: Network of Workstations in universities).

Use case 1: long running computations

Why:

- Even with parallel algorithms running on powerful clusters, computations can take hours, days, weeks, or more.
- Checkpoint/restart is useful in case of hardware failures, system errors, etc.

Example: R.
How: Either enable the CHECKPOINTABLE capability, or use a wrapper which also creates a new session for the program to be checkpointed:

  krgcr-run -- R

Create a checkpoint every hour:

  while :; do checkpoint $(pgrep R | head -1); sleep 3600; done

Step by step (1/2)

Starting:

  $ krgcr-run R
  Running application 6291648
  R version 2.11.1 (2010-05-31)
  [...]
  >

Checkpointing:

  $ checkpoint 6291648
  Freezing application in which process 6291648 is involved...
  Checkpointing application in which process 6291648 is involved...
  Identifier: 6291648
  Version: 1
  Description: No description
  Date: Thu Jul 8 22:54:06 2010
  Unfreezing application in which process 6291648 is involved...

Step by step (2/2)

Contents of the checkpoint:

  $ ls /var/chkpt/6291648/v1/
  description.txt  node_1.bin        task_6291648.bin
  global.bin       shared_obj_1.bin  user_info_1.txt

Restarting:

  $ restart 6291648 1
  Restarting application 6291648 (v1)...
  Application 6291648 has been successfully restarted

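When checkpoints are taken periodically, a restart script needs to know the most recent version. The sketch below assumes that each version lives in its own `v1`, `v2`, ... subdirectory of the application's checkpoint directory, which is an extrapolation from the single `/var/chkpt/6291648/v1/` path shown here. `latest_version` is a hypothetical helper, not a Kerrighed tool.

```shell
# Print the highest checkpoint version found under directory $1.
# Assumption: versions are stored as v1, v2, ... subdirectories.
# Prints nothing if no such subdirectory exists.
latest_version() {
    dir=$1
    best=0
    for path in "$dir"/v*/; do
        if [ ! -d "$path" ]; then
            continue                    # glob did not match anything
        fi
        n=${path%/}                     # strip the trailing slash
        n=${n##*/v}                     # keep what follows the last "/v"
        case $n in
            ''|*[!0-9]*) continue ;;    # ignore non-numeric names
        esac
        if [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    if [ "$best" -gt 0 ]; then
        echo "$best"
    fi
}
```

Under that assumption, `restart 6291648 "$(latest_version /var/chkpt/6291648)"` would restart from the newest checkpoint.
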
Use case 2: playing with sockets

Problem: checkpointing sockets isn't supported yet.
Solution:

- Force checkpointing: -i option for checkpoint.
- Use file descriptor substitution.

Plugging a given file descriptor on a given checkpoint identifier:

  $ cat /var/chkpt/6291711/v1/user_info_1.txt
  tty 0001FFFF88003ED8AC48 /dev/pts/1 6291711:0,6291711:1,6291711:2
  socket 0001FFFF88007E5F3168 socket:[162646] 6291711:3
  $ restart -s 0001FFFF88007E5F3168,0 6291711 1
  Restarting application 6291711 (v1)...
  Application 6291711 has been successfully restarted

Future: Use that to restore the Unix socket to the X server?

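The identifiers to pass to `restart -s` can be pulled out of the user_info file with a one-line filter. This sketch assumes one entry per line with four whitespace-separated fields (type, identifier, path, pid:fd list), which is inferred from the sample above rather than from a documented format. `socket_keys` is a hypothetical helper.

```shell
# List the identifiers of socket entries in a user_info file read on stdin.
# Assumption: one entry per line, fields are "type identifier path fds".
socket_keys() {
    awk '$1 == "socket" { print $2 }'
}
```

For the file shown above, `socket_keys < /var/chkpt/6291711/v1/user_info_1.txt` would print `0001FFFF88007E5F3168`, the value used in the `restart -s` call.
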
Conclusion

Kerrighed's strong features right now:

- Stability
- Flexibility
- Can be configured/tweaked to suit specific needs
- General solution for many common use cases

Kerrighed's next features (short to mid-term):

- Performance
- More flexibility
- Partial thread support

The end

Thanks for your attention! Questions?

A few pointers:

- Want to play? http://www.kerrighed.org/
- Want to talk? kerrighed.{users,dev}@listes.irisa.fr, #kerrighed on Freenode
- Want to apply for a nice job? contact@kerlabs.com