Configuration Manual
Yahoo! Cloud Serving Benchmark (YCSB)
24-Mar-14
SEECS-NUST
Faria Mehak
Table of Contents
1 Introduction
  1.1 Purpose
  1.2 Product Information
2 Installation Manual
  2.1 Required Software
  2.2 Detailed procedure
    Step 1: Install Java
    Step 2: Install Maven
    Step 3: Install Python
    Step 4: Install Git
    Step 5: Install Cassandra
    Step 6: Get YCSB
    Step 7: Create Keyspace and Column family in Cassandra
    Step 8: Load Data
    Step 9: Run the benchmark
1 Introduction

1.1 Purpose
This document covers the essential requirements, software functionality and configuration guidelines needed to set up YCSB for testing the NoSQL database Cassandra as a backend on Windows 7.

1.2 Product Information
Yahoo! Cloud Serving Benchmark (YCSB) is a tool for benchmarking cloud storage services, specifically NoSQL databases. It is a testing client that performs reads, writes and updates according to specified workloads. Run from the command line, it can create an arbitrary number of threads that query the system under test. It measures throughput in operations per second and records the latency of these operations.

2 Installation Manual

2.1 Required Software
1. YCSB 0.1.4
2. Cassandra 2.0.3
3. Java (JDK 1.7 or above)
4. Python 2.7.3
5. Maven 3.2.1
6. Git 1.8.4

2.2 Detailed procedure

Step 1: Install Java
1. To see whether Java is already installed on your system, open a command prompt window and type the commands shown in the figure below. (The results will be similar to those in the figure if Java is not installed.)
2. Go to the link shown in the figure below to download the JDK.
3. Install the JDK from the downloaded setup file.
4. After installation, open a file explorer window, right-click Computer and select Properties.
5. Select Advanced System Settings.
6. Select Environment Variables.
7. Click New under System variables.
8. Create a new variable named JAVA_HOME and set its value to the root directory of the JDK.
9. Select the Path variable and click Edit.
10. In the variable value field, append %JAVA_HOME%\bin; and click OK.
11. To confirm that the JDK was installed successfully, open a new command window and enter the command shown in the figure below.
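The environment variable setup above can also be checked from a script. The following is a minimal sketch, not part of the original procedure: the function name java_home_on_path is hypothetical, it uses only the Python standard library, and it assumes that PATH entries are separated by semicolons as on Windows.

```python
import os


def java_home_on_path(env):
    """Return True if <JAVA_HOME>\\bin appears as an entry of PATH in env.

    env is a mapping of variable names to values, for example os.environ.
    The comparison is case-insensitive and ignores trailing separators.
    A literal, unexpanded %JAVA_HOME%\\bin entry also counts as a match.
    """
    java_home = env.get("JAVA_HOME", "").strip()
    if not java_home:
        return False
    expected = (java_home.rstrip("\\/") + "\\bin").lower()
    entries = [e.strip().rstrip("\\/").lower()
               for e in env.get("PATH", "").split(";")]
    return expected in entries or "%java_home%\\bin" in entries


if __name__ == "__main__":
    # Check the environment of the current command window.
    print(java_home_on_path(os.environ))
```

Run it from a newly opened command window, since windows opened before the variables were edited keep their old environment.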
Step 2: Install Maven
1. Download the latest version of Maven (preferably apache-maven-3.2.1) from the following link: http://maven.apache.org/download.cgi#installation
2. Place the downloaded folder in C:\
3. Set the environment variable by going to Advanced System Properties. Because the next step refers to %MAVEN_HOME%, this means creating a system variable named MAVEN_HOME that points to the Maven folder.
4. Edit the system variable Path, append %MAVEN_HOME%\bin; and click OK.
5. To confirm that Maven was installed successfully, open a new command window and enter the command shown in the figure below.
Step 3: Install Python
1. Download Python (preferably Python 2.7.3) from the following link: https://www.python.org/download/
2. Run the setup and install it.
3. Set the environment variable by going to Advanced System Properties.
4. Edit the system variable Path and add the path of the folder where Python is installed, as shown in the figure below:
5. To confirm that Python was installed successfully, open a new command window and enter the command shown in the figure below.

Step 4: Install Git
1. Download the latest version of Git (preferably Git 1.8.4) from the following link: http://code.google.com/p/msysgit/downloads/list?q=full+installer+official+git
2. Run the setup and install it.
3. Set the environment variable by going to Advanced System Properties.
4. Edit the system variable Path and add the path of the bin folder of the Git installation, as shown in the figure below.
5. To confirm that Git was installed successfully, open a new command window and enter the command shown in the figure below.
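With Java, Maven, Python and Git installed, a short script can confirm that all four are reachable through Path. This is a sketch under stated assumptions: the helper names find_on_path and missing_tools are hypothetical, and the executables are assumed to be named java, mvn, python and git (with an extension such as .exe or .bat on Windows).

```python
import os


def find_on_path(name, path=None, extensions=None):
    """Return the first file called name (plus an optional extension)
    found in the PATH-style string path, or None if there is none."""
    if path is None:
        path = os.environ.get("PATH", "")
    if extensions is None:
        # On Windows, PATHEXT lists extensions such as .EXE, .BAT and .CMD.
        extensions = [""] + os.environ.get("PATHEXT", "").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def missing_tools(names, path=None, extensions=None):
    """Return the names that cannot be found on the given path."""
    return [n for n in names if find_on_path(n, path, extensions) is None]


if __name__ == "__main__":
    # Assumed executable names for the prerequisites of Steps 1 to 4.
    print(missing_tools(["java", "mvn", "python", "git"]))
```

An empty list means every tool was found. A name in the list usually points to a missing or mistyped Path entry from the corresponding step.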
Step 5: Install Cassandra
1. Download the version of Cassandra you want to test (this tutorial used 2.0.4) from the following link: http://cassandra.apache.org/download/
2. Extract the downloaded Cassandra files.
3. Set the environment variable by going to Advanced System Properties.
4. Add a new system variable CASSANDRA_HOME, as shown in the figure below:
5. Now go to the bin directory of the Cassandra installation and double-click cassandra.bat.
6. You should see output like that shown in the figure below.
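Besides reading the console output, you can test whether the server accepts client connections. The sketch below is an illustration only: port_is_open is a hypothetical helper, and port 9160 is assumed to be the Thrift client port (the usual default rpc_port in Cassandra 2.0). Check cassandra.yaml if your installation differs.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Assumption: Cassandra listens for Thrift clients on 127.0.0.1:9160.
    print(port_is_open("127.0.0.1", 9160))
```

A result of False while the Cassandra window is still open may simply mean that startup has not finished yet.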
Step 6: Get YCSB
1. Use Git to get the latest version of YCSB with the command:
    git clone git://github.com/brianfrankcooper/ycsb.git
2. There are prebuilt jar files in the repository. To build YCSB yourself, run the command:
    mvn clean package
3. Test whether YCSB is properly configured on Windows with the command:
    python ycsb shell basic
The output of this command should be the single line shown in the figure below:
4. The main YCSB folder contains a folder named workloads. It holds the workload files, each of which defines the configuration of one workload. Set the IP address of the Cassandra server at the end of the workload file you intend to use. For example:
    hosts = 127.0.0.1
Also set recordcount and operationcount according to the number of records you want to insert and the number of operations you want to perform. In this tutorial, both are set to 10.
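The edits to the workload file can also be applied programmatically. The sketch below assumes the file consists of simple key=value lines. The function name set_properties is hypothetical and it does not handle the full Java properties syntax (for example, line continuations).

```python
def set_properties(text, updates):
    """Return text with each key in updates set to its new value.

    Lines are treated as simple key=value pairs. Comment lines (starting
    with # or !) and blank lines are kept unchanged. Keys that are not
    already present are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and stripped[0] not in "#!" and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = "%s=%s" % (key, remaining.pop(key))
        out.append(line)
    for key in sorted(remaining):
        out.append("%s=%s" % (key, remaining[key]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative input only, not the full contents of a real workload file.
    sample = "recordcount=1000\noperationcount=1000\n"
    print(set_properties(sample, {"recordcount": 10,
                                  "operationcount": 10,
                                  "hosts": "127.0.0.1"}))
```

To change a real workload file, read its contents, pass them through the function and write the result back. Keeping a copy of the original file is a sensible precaution.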
Step 7: Create Keyspace and Column family in Cassandra
You need to create a keyspace named usertable and a column family for YCSB. YCSB cannot load data or run without them. To create the keyspace and the column family, connect to the server with the cassandra-cli utility, located in the bin directory of the Cassandra installation ($CASSANDRA_ROOT/bin), and use the following commands.
Note: the semicolon at the end of each command is important.
1. Run the command:
    create keyspace usertable;
2. Run the command:
    use usertable;
3. Run the command:
    create column family data with column_type = 'Standard' and comparator = 'UTF8Type';
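The three statements above can be generated as a single script, which guarantees the terminating semicolons. This is a minimal sketch: cli_statements is a hypothetical helper, and its defaults are the keyspace and column family names used in this step.

```python
def cli_statements(keyspace="usertable", column_family="data"):
    """Return the cassandra-cli statements of Step 7, one per line,
    each terminated by the required semicolon."""
    statements = [
        "create keyspace %s" % keyspace,
        "use %s" % keyspace,
        "create column family %s with column_type = 'Standard' "
        "and comparator = 'UTF8Type'" % column_family,
    ]
    return "".join(s + ";\n" for s in statements)


if __name__ == "__main__":
    print(cli_statements())
```

The output can be pasted into a cassandra-cli session. Alternatively, it can be saved to a file and passed to cassandra-cli with its -f (file) option, assuming the version in use supports that option.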
Step 8: Load Data
Now, run the load command. The load command populates the Cassandra database with random records. The number and size of the records are defined in the workload specification.

    python ycsb load cassandra-10 -P C:\Users\Faria\Desktop\Test\YCSB\workloads\workloada -s > load.log

Step 9: Run the benchmark
Now, run the run command. The run command executes the operations defined in the workload specification against the records loaded in Step 8.

    python ycsb run cassandra-10 -P C:\Users\Faria\Desktop\Test\YCSB\workloads\workloada -s > load.log

As written, this command redirects its output to the same load.log file and therefore overwrites the output of the load step. Use a different file name in one of the two commands to keep both.
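The load and run invocations differ only in the phase name, so they can be assembled by one helper. The sketch below is an illustration under assumptions: ycsb_command and run_phase are hypothetical names, the script is assumed to be started from the folder that contains the ycsb launcher, and the workload path is a placeholder.

```python
import subprocess


def ycsb_command(phase, workload_file, binding="cassandra-10", script="ycsb"):
    """Build the argument list for one YCSB phase ('load' or 'run')."""
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return ["python", script, phase, binding, "-P", workload_file, "-s"]


def run_phase(phase, workload_file, log_path):
    """Run one phase and write its standard output to log_path,
    mirroring the '> file' redirection of the manual commands."""
    with open(log_path, "w") as log:
        return subprocess.call(ycsb_command(phase, workload_file), stdout=log)


if __name__ == "__main__":
    # Placeholder path: replace with the location of your workload file.
    print(" ".join(ycsb_command("load", "workloads/workloada")))
```

Calling run_phase("load", ...) and then run_phase("run", ...) with two different log paths keeps the results of both phases.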
A log file load.log will be generated in the bin folder of the YCSB directory. Analyzing the run-time parameters stored in this file makes it possible to compare the throughput, average latency and runtime of different databases, as shown in the figure below:

To check whether the data was inserted properly into the newly created Cassandra keyspace, run the following command in cassandra-cli:
    list data;
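The throughput, latency and runtime figures in the log file can be extracted with a few lines of Python. This sketch assumes summary lines of the form "[SECTION], Metric, value", such as "[OVERALL], Throughput(ops/sec), 5.0". The function name parse_ycsb_log is hypothetical and the sample values are invented for illustration, not measured results.

```python
def parse_ycsb_log(text):
    """Return {section: {metric: value}} for every line of the form
    '[SECTION], Metric, number'. All other lines are ignored."""
    results = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        section = parts[0]
        if not (section.startswith("[") and section.endswith("]")):
            continue
        try:
            value = float(parts[2])
        except ValueError:
            continue
        results.setdefault(section[1:-1], {})[parts[1]] = value
    return results


if __name__ == "__main__":
    # Invented sample lines, used only to show the expected shape.
    sample = ("[OVERALL], RunTime(ms), 2000.0\n"
              "[OVERALL], Throughput(ops/sec), 5.0\n"
              "[UPDATE], AverageLatency(us), 1500.5\n")
    print(parse_ycsb_log(sample)["OVERALL"])
```

Parsing the logs of several databases this way gives dictionaries that can be compared metric by metric.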