Chapter 1. Create an audit persistence on JOnAS with HBase (NoSQL)


Audrey Bates
7 years ago

Table of Contents

1.1. Introduction
1.2. Create a Hadoop Cluster to obtain an HDFS (Hadoop Distributed File System)
1.3. Create Zookeeper cluster
1.4. Create HBase cluster
1.5. Start persistence

1.1. Introduction

JOnAS currently has four event types that you can persist in HBase: WEB, EJB, JAXWS and JNDI. In this HOWTO you will create a Hadoop and HBase cluster; standalone mode is not covered. A cluster is a good way to distribute data and to keep search times low on a very large database. However, a cluster is costly (it needs several machines) and is not really worthwhile with fewer than 10 machines.

This HOWTO works with the following versions:

JAVA JDK 6
Hadoop
HBase
Zookeeper

1.2. Create a Hadoop Cluster to obtain an HDFS (Hadoop Distributed File System)

1. Download Java.
2. Extract the file.
3. Set the JAVA_HOME environment variable:
    export JAVA_HOME=~/yourPathToJavaJDK-6
4. Download Hadoop.
5. Extract the file.
6. Copy it to each machine of the cluster.
7. Set the HADOOP_HOME environment variable on each machine:
    export HADOOP_HOME=~/yourPathToHadoop

8. Choose which machine will be the master (NameNode); all the others will be slaves (DataNodes). A NameNode can also be a DataNode, but this is not recommended.
9. On the NameNode and the DataNodes:
    Open conf/hadoop-env.sh. Remove the # in front of "JAVA_HOME" and set it to exactly the same value as before.
10. On the NameNode:
    Open conf/masters. Write localhost if it is not already there.
    Open conf/slaves. Write the ip (or the name, if the ip is defined in /etc/hosts) of each DataNode, one per line.
    Open conf/core-site.xml. Complete the file as below, replacing HERE with the ip (or host) of the NameNode. Localhost might work, but this has not been tested.
        <property>
          <name>fs.default.name</name>
          <value>hdfs://HERE:54310</value>
        </property>
    Open conf/mapred-site.xml (Map/Reduce configuration). Complete the file as below, replacing HERE with the ip (or host) of the NameNode. The JobTracker can run on a different machine than the NameNode, but to keep things simple they share one machine here. The port is only an example; you can change it.
        <property>
          <name>mapred.job.tracker</name>
          <value>HERE:54311</value>
        </property>
        <property>
          <name>mapred.local.dir</name>
          <value>selectYourDirectory</value>
        </property>
        <property>
          <name>mapred.system.dir</name>
          <value>selectYourDirectory</value>
        </property>
    Open conf/hdfs-site.xml. Complete the file as below. DFS permissions are disabled to simplify access for JOnAS.
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>
11. On the DataNodes:

    Open conf/masters. Write the ip (or host) of the NameNode.
    Open conf/slaves. Write the ip (or host) of every DataNode, including localhost.
    Open conf/core-site.xml. Complete the file as below, replacing HERE with the ip (or host) of the NameNode. Make sure to use the same port as in the same property on the NameNode.
        <property>
          <name>fs.default.name</name>
          <value>hdfs://HERE:54310</value>
        </property>
    Open conf/mapred-site.xml (Map/Reduce configuration). Complete the file as below, replacing HERE with the ip (or host) of the NameNode. Make sure to use the same port as in the same property on the NameNode.
        <property>
          <name>mapred.job.tracker</name>
          <value>HERE:54311</value>
        </property>
    Open conf/hdfs-site.xml. Complete the file as below. DFS permissions are disabled to simplify access for JOnAS.
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>
12. To start the Hadoop cluster, first set up SSH connections between the machines if this is not already done. Then open a terminal, go to the HADOOP_HOME directory and run:
    bin/start-all.sh
    You will see every ip (or host) scroll past with its role (NameNode, JobTracker, DataNode, TaskTracker).
13. To stop the Hadoop cluster, open a terminal, go to the HADOOP_HOME directory and run:
    bin/stop-all.sh
    You will see every ip (or host) scroll past as it stops, one by one.
14. To restart the Hadoop cluster, open a terminal, go to the HADOOP_HOME directory and run:

    bin/hadoop dfsadmin -safemode get
    The Hadoop cluster works ONLY if the response is: Safe mode is OFF
    If the response is anything else, run:
    bin/hadoop dfsadmin -safemode leave
15. Watch the NameNode web interface.
    ip: the NameNode's HTTP server address
    port: the NameNode's HTTP server port (default 50070)

1.3. Create Zookeeper cluster

1. Download Zookeeper.
2. Extract the file.
3. Copy it to at least 2 machines.
4. Open conf/zoo.cfg.
5. Select your data directory:
    dataDir=/yourselecteddirectory
6. List the whole Zookeeper cluster using the scheme server.numberofserver=ip:
    server.1= :2888:5888
    server.2= :2888:5888
    server.3= :2888:5888
    [...]
7. On each machine, keep the same numberofserver for the same ip.
8. Start Zookeeper:
    bin/zkServer.sh start
9. Check whether Zookeeper is running:
    bin/zkServer.sh status
10. It is running if the output is
    JMX enabled by default
    Using config: ZOOKEEPER_HOME/bin/../conf/zoo.cfg
    Mode: follower
    or
    JMX enabled by default
    Using config: ZOOKEEPER_HOME/bin/../conf/zoo.cfg
    Mode: leader
11. Verify that numberofserver matches the selected ip. Go to /yourselecteddirectory and open myid on each machine.

    On the first machine (server.1= :2888:5888 in the example), myid must contain: 1
    On the second machine (server.2= :2888:5888 in the example), myid must contain: 2
    And so on.
12. Leader and Followers: the Zookeeper cluster must have exactly one Leader; all the other servers are Followers.

1.4. Create HBase cluster

1. Download HBase.
2. Extract the file.
3. Copy it to each DataNode and NameNode machine. Every machine must use the same login and the same directory path for HBase.
4. On the HMaster and the HRegionServers (the same idea as NameNode and DataNode for Hadoop):
    Edit conf/hbase-env.sh:
        export JAVA_HOME=~/yourPathToJavaJDK-6
        export HBASE_MANAGES_ZK=false
    Set HBASE_MANAGES_ZK=true to run a local HBase and skip section 1.3, because HBase will then start its own Zookeeper server.
    Edit conf/hbase-site.xml, replacing HERE with the ip (or host) of the NameNode. In hbase.zookeeper.quorum, list all the Zookeeper ips from step 6 of section 1.3.
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://HERE:54310/user/youraccount/hbase</value>
        </property>
        <property>
          <name>hbase.defaults.for.version.skip</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value> , , ,...</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/selectyoursavedirectory</value>
        </property>
5. On the HMaster (the same machine as the NameNode), edit conf/regionservers and list all the HRegionServers (one per line).
6. On the HRegionServers (the same machines as the DataNodes), edit conf/regionservers and list all the HRegionServers (one per line), using localhost for the HRegionServer's own ip address.
7. Start HBase:

    bin/start-hbase.sh
8. Check whether HBase is running:
    bin/hbase shell
    hbase(main):001:0> status
    1 servers, 0 dead, average load
9. Create the EventPath table, where the persisted data will be saved:
    hbase(main):002:0> create 'EventPath','primary','secondary'
    Warning: names are case sensitive, so type them exactly as shown.
    hbase(main):003:0> list
    TABLE
    EventPath
    1 row(s) in seconds
    hbase(main):004:0> status
    1 servers, 0 dead, average load

1.5. Start persistence

1. Start JOnAS (use -Deasybeans.useSimplePool=true if you may need to undeploy the "audit-jpanosql"-"jpa" bundle).
2. Check out git@gitorious.ow2.org:ow2-jonas/audit-jpa-nosql.git
3. Compile the project (using Maven):
    mvn clean install
4. Go to audit-jpa-nosql/deployment/gotozip/target/distrib
    If a Zookeeper server is running on the machine where JOnAS will be started:
        ant -DzkServer=on
    If no Zookeeper server is running on the machine where JOnAS will be started, open persistence.xml and modify the following property:
        <property name="datanucleus.ConnectionURL" value="hbase:ip:port" />
    ip: the NameNode's server address
    port: the NameNode's server port (written in $HADOOP_HOME/conf/core-site.xml)
    Then start ant.

    To restore the persistence.xml used by JOnAS (works with Zookeeper):
        ant -Drestore=on
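
The same list of Zookeeper machines is used twice in this HOWTO: as server.N lines in conf/zoo.cfg (section 1.3) and as a comma-separated list in hbase.zookeeper.quorum (section 1.4). The sketch below is one possible way to derive both from a single list so that they stay consistent. The function names zk_server_lines and zk_quorum are hypothetical helpers, not part of Zookeeper or HBase, and the ports 2888:5888 are simply the ones used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print one "server.N=ip:2888:5888" line per argument,
# numbering the servers from 1 in the order given (ports taken from the
# example in section 1.3).
zk_server_lines() {
    n=1
    for ip in "$@"; do
        printf 'server.%d=%s:2888:5888\n' "$n" "$ip"
        n=$((n + 1))
    done
}

# Hypothetical helper: join the same addresses with commas, the form used
# for hbase.zookeeper.quorum in conf/hbase-site.xml. The body runs in a
# subshell so that changing IFS does not affect the caller.
zk_quorum() (
    IFS=,
    printf '%s\n' "$*"
)
```

For example, after loading these functions, zk_server_lines followed by the ips of your Zookeeper machines prints lines that can be appended to conf/zoo.cfg, and zk_quorum with the same arguments prints the value for hbase.zookeeper.quorum. The number N on each server line is also the number that the myid file on that machine must contain.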
