How to build a 7 node Raspberry Pi Hadoop Cluster

Introduction

Inspired by a desire to learn more about Hadoop, and by the fact that I already owned a Raspberry Pi, I wondered whether anyone had yet built a Hadoop cluster based on these hobby computers. I wasn't surprised to discover that people have already done this, and the following instructions are where I started:

Carsten Mönning: http://scn.sap.com/community/bi-platform/blog/2015/04/25/a-hadoop-data-lab-project-on-raspberry-pi--part-14
Jonas Widriksson: http://www.widriksson.com/raspberry-pi-hadoop-cluster/

Jonas's instructions are based on Hadoop version 1.0 and Carsten's are based on version 2.x. If, like me, you're interested in building with the newer version of Hadoop then follow Carsten's instructions, but read through Jonas's too because he provides useful links for downloading the Raspbian distribution (a Linux operating system built specifically for the Raspberry Pi) as well as commands and example files for testing your cluster.

The first stage is to build a single node cluster where your one node performs all tasks such as NameNode, Secondary NameNode and DataNode. Once you have this up and running you're ready to add a second node. This second node will be a dedicated DataNode from which you will clone all subsequent DataNodes. Creating the second node is slightly more difficult, which is why I decided to write this post in the hope that it will save others time and effort.

Hardware
For a seven node Raspberry Pi (Model 2 B+ used here) Hadoop cluster you will need the following hardware:

7x Raspberry Pi Model 2 B+ (ThePiHut.com)
1x 10GigaBit Switch (8 Ports TP-Link) (ebuyer.com)
1x USB Power Hub (7 Powered Ports by StarTech.com) (ebuyer.com)
7x 8GB Class 10 MicroSD cards (ThePiHut.com). NB: mine came with the NOOBS operating system preinstalled, which shrinks the usable size to 3GB. You can expand this partition to use the whole disk from within the raspi-config utility, but I used fdisk to delete the existing partition and create a new one (a reformat is required after doing this).
6x 19.05mm standoffs (ModMyPi.com)
7x Angled USB to MicroUSB cables (ebuyer.com)
7x Short Ethernet Cables (ebuyer.com)
7x 32GB Class 10 USB sticks (ebuyer.com)
1x Case to put it all in (optional)

Building the Single Node Cluster

So, as stated previously, follow these instructions to create your NameNode/Single Node Cluster:

http://scn.sap.com/community/bi-platform/blog/2015/04/25/a-hadoop-data-lab-project-on-raspberry-pi--part-14

Adding a new node to the cluster

Once your NameNode (192.168.0.110 node1) is up and running, adjust the following files to include your planned/subsequent nodes. In order to edit the system-owned files you'll need to switch user or use sudo. The simplest editor to use on this flavour of Linux is nano:

$ sudo nano /etc/hosts

Add all the nodes you plan to have in your cluster:

192.168.0.110 node1
192.168.0.111 node2
192.168.0.112 node3
192.168.0.113 node4
192.168.0.114 node5
192.168.0.115 node6
192.168.0.116 node7

The Hadoop-specific settings files, for versions 2.x, can be found in the following directory:
/opt/hadoop/etc/hadoop

Two notable files are the master and slaves files. Ensure that node1 is the only line listed in your master file, and that all nodes are listed in the slaves file (including node1 if you configure it to still be a DataNode). For simplicity I have zipped up the files I had to update so you may copy them directly to your node1 and node2 Pis or use them as reference:

http://www.nigelpond.com/uploads/node1-config-files.zip
http://www.nigelpond.com/uploads/node2-config-files.zip

NB: Because you'll be temporarily connecting to the IP address of an existing Pi, it's probably worth having only the NameNode powered up when performing this step.

1. Clone the NameNode MicroSD card and duplicate the image file on to a new MicroSD card. On Linux the commands to do this are as follows. Note that when mounting your MicroSD card your device name may differ from /dev/sdb:

$ sudo dd if=/dev/sdb of=node1.img
$ sudo dd if=node1.img of=/dev/sdb

2. Format the USB storage device, giving it a label of 'hdfs'. Again I used Linux to format the device and used the EXT4 file system.

3. Put your cloned MicroSD card and USB storage stick into your new Raspberry Pi.

4. Attach the Ethernet network cable and power cable.

5. SSH into the Pi using the IP address of the cloned Pi and the hduser/password credentials:

$ ssh hduser@192.168.0.110
6. Configure the Raspberry Pi using the raspi-config utility. You'll be prompted to reboot when you've finished making this set of changes:

$ sudo raspi-config

Change the node name (e.g. from node1 to node2)
Check: enable SSH server
Check: memory split - should be 16M
Overclocking to Pi2 (reboot)

8. After the reboot from exiting raspi-config, wait a few seconds and then SSH back in again. Edit the /etc/network/interfaces file and change the IP address to the next available one:

$ sudo nano /etc/network/interfaces

9. Check the /etc/hosts file contains ALL the IP addresses and node names of ALL your Pis:

$ cat /etc/hosts

10. Reboot the Pi:

$ sudo reboot

11. SSH back into the Pi using the new IP address:

$ ssh hduser@192.168.0.111

12. From the new Pi node, SSH into the NameNode and answer 'yes' to the question about continuing to connect:

$ ssh node1

13. On the NameNode, SSH to the new node name, answer 'yes' to the same question, and then return to node2:

$ ssh node2
$ exit

14. Create the /hdfs data partition. Change directory into the HDFS partition, create the /hdfs/tmp folder and change its ownership:

$ sudo mkdir -p /hdfs/tmp
$ sudo chown -R hduser:hadoop /hdfs
$ sudo chmod 750 /hdfs/tmp

15. Add an entry to the /etc/fstab file so that the /hdfs partition is mounted on boot:

$ sudo nano /etc/fstab

Make it match this but remember the /dev/sd* may differ:
15. Make the required /opt/hadoop/etc/hadoop file changes. We're changing the purpose of this machine from NameNode + DataNode to just DataNode. Check the aforementioned node2-config-files.zip for reference.

16. On the NameNode (node1) start DFS and YARN:

$ /opt/hadoop/sbin/start-dfs.sh
$ /opt/hadoop/sbin/start-yarn.sh

On the DataNode, node2, take a look in the /hdfs/tmp directory and you should see that the NameNode has created a whole bunch of sub-directories:

$ ls -lrt /hdfs/tmp

Check to see that the DataNode tasks are running with the jps command:

$ jps

17. Browse to the web interface on the NameNode (port 8088 is the YARN ResourceManager UI): http://192.168.0.110:8088/ and ensure you can see the new node. Here's a screenshot showing all 7 nodes included:
18. Running MapReduce tasks. The instructions are identical to those found on Jonas's site except there are some subtle updates to the hdfs commands for version 2.x of Hadoop. The steps I used are:

$ hadoop fs -copyFromLocal mediumfile.txt /mediumfile.txt
$ yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /mediumfile.txt /mediumfile-out

You can monitor the running tasks from the NameNode web interface. Here are some screenshots of my test runs:

19. When you're happy with your first DataNode, and you're ready to add another, go through steps 1-8 and 10-13 to create each subsequent node. If you've performed any tests or loaded any data, you will have to clear down the /hdfs/tmp directory to ensure all nodes are in sync. When you start DFS from the NameNode (node1) it will recreate the file structure required on each node.
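The /etc/hosts list and the "next available" IP address for each cloned node follow a simple pattern, so they can be generated rather than typed by hand. The following is only a minimal sketch: hosts_entries is a hypothetical helper name of my own, and the BASE_NET and FIRST_HOST defaults simply mirror the addresses used in this post (node1 at 192.168.0.110), so adjust them for your own network.

```shell
#!/bin/sh
# Print /etc/hosts lines for a cluster of N nodes.
# BASE_NET and FIRST_HOST are assumptions taken from the addresses
# in this post (192.168.0.110 for node1); change them to suit.
BASE_NET="${BASE_NET:-192.168.0}"
FIRST_HOST="${FIRST_HOST:-110}"

hosts_entries() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        # node1 gets FIRST_HOST, node2 gets FIRST_HOST + 1, and so on
        echo "$BASE_NET.$((FIRST_HOST + i - 1)) node$i"
        i=$((i + 1))
    done
}
```

Calling hosts_entries 7 prints one line per node, from "192.168.0.110 node1" to "192.168.0.116 node7". I would review the output before pasting it into /etc/hosts rather than appending it blindly.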
The Finished Cluster

Next Steps/Still to do:

Get a router so that the cluster is on its own network. This will enable the use of the web interface URLs that use the node names. Currently my home router is trying to resolve these names through DNS, so these links don't work.

Write some scripts, or find a tool, for running commands across all nodes, especially for commands like startup/shutdown and clearing down the /hdfs partition. I'm thinking Chef or Puppet at this stage as I think they would be useful tools to learn about.
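Until one of those tools is in place, a small shell function could cover the simplest cases. This is a rough sketch rather than something I have settled on: run_on_all, NODES, REMOTE_USER and RUNNER are all hypothetical names, the node list and hduser account are assumed from the setup described above, and it relies on hduser being able to SSH to every node by name.

```shell
#!/bin/sh
# Hypothetical helper: run one command on every node in the cluster.
# NODES and REMOTE_USER are assumptions based on the names used in this post.
NODES="${NODES:-node1 node2 node3 node4 node5 node6 node7}"
REMOTE_USER="${REMOTE_USER:-hduser}"
# RUNNER is the program used to reach each node; set RUNNER=echo for a dry run.
RUNNER="${RUNNER:-ssh}"

run_on_all() {
    for node in $NODES; do
        echo "== $node =="
        # "$*" joins the arguments into a single remote command string
        $RUNNER "$REMOTE_USER@$node" "$*" || echo "WARN: command failed on $node" >&2
    done
}
```

For example, run_on_all uptime would report the load on each Pi in turn. Setting RUNNER=echo first prints the commands instead of running them, which seems a sensible check before trying anything destructive such as clearing down /hdfs/tmp.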