Getting Started with SandStorm NoSQL Benchmark

SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop, and message queue systems. The current SandStorm release (7.3) provides out-of-the-box support for the following NoSQL technologies:

1) MongoDB
2) Cassandra
3) HBase
4) OracleNoSQL

1. Obtain SandStorm NoSQL Benchmark Utility

Download the latest version:

Some systems have additional requirements for running database clients. Depending on the database, additional dependencies have to be included in the lib folder of SandStorm. For example, HBase requires that the client be able to contact Zookeeper.

You run the runbenchmark script from SandStorm to execute performance benchmarks on the NoSQL database or cluster. You can pass the arguments on the command line, or set the parameter values in the sandstorm.properties file.

For example, the following sample arguments load data into MongoDB:

    mongodb -w load -s 100000 -v 50

In the above command:

mongodb: the database type
-w: database workflow (load inserts data)
-s: total number of records to be inserted (or total number of transactions to be performed)
-v: total number of concurrent threads

You can execute runbenchmark without any arguments to see its usage.

2. Execute a Workload

There are 5 steps to execute a workload:

1. Set up the database system to test
2. Choose the appropriate workload
3. Choose the appropriate runtime parameters (number of client threads, duration, etc.)
4. Load the data
5. Execute the workload

The steps described here assume that you are running a single client server. This should be sufficient for small to medium clusters (e.g. 10 or so machines). For much larger clusters, you may have to run additional load generator agents on different servers to generate enough load. Similarly, loading a database may be faster in some cases using multiple client machines.

Note: This single-server setup of the SandStorm client has been tested on an 8 GB, dual-core i5 machine and is able to load test a MongoDB cluster with 250 million data rows amounting to 250 GB of test data. These numbers are indicative only. They may vary depending on the cluster setup and on the NoSQL database used.

Step 1. Set up the database system to test

The first step is to set up the database system you wish to test. This can be done on a single machine or a cluster, depending on the configuration you wish to benchmark.

You must also create or set up tables/keyspaces/storage buckets to store records. The details vary according to each database system, and depend on the workload you wish to run. The tables must be created before the SandStorm Client is used, since the Client itself will not request to create them. This is because for some systems there is a manual (human-operated) step to create tables, and for other systems the table must be created before the database cluster is started.

The tables that must be created depend on the workload. For Data Load, the SandStorm Client will assume that there is a "table" called Test_Table with a flexible schema: columns can be added at runtime as desired. This "Test_Table" can be mapped into the appropriate storage container. For example, in MongoDB you would create a collection; in Cassandra you would define a keyspace in the Cassandra configuration, and so on. The database interface layer will receive requests for reading or writing records in Test_Table and translate them into requests for the actual storage you have allocated.
This mapping may mean that you have to provide information that helps the database interface layer understand the structure of the underlying storage. For example, in Cassandra, you must define "column families" in addition to keyspaces. Thus, it is necessary to create a column family and give it a name (for example, you might use "values"). The database access layer then needs to know to refer to the "values" column family, either because the string "values" is passed in as a property, or because it is hardcoded in the database interface layer.

Step 2. Choose the appropriate workload

SandStorm includes a set of core workloads that define a basic benchmark for cloud systems. You can also define your own workloads. However, the core workloads are a useful first step, and obtaining these benchmark numbers for a variety of different systems allows you to understand the performance tradeoffs between them.

Load: Data insert workload
This workload has 100% writes. An application example is an audit log application. The purpose of this workload is to generate test data for the other workloads.

Workload A: Update heavy workload
This workload has a 50/50 mix of reads and writes. An application example is a session store recording recent actions.

Workload B: Read mostly workload
This workload has a 95/5 read/write mix. An application example is photo tagging: adding a tag is an update, but most operations read tags.

Workload C: Read only
This workload is 100% reads. An application example is a user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).

Step 3. Choose the appropriate runtime parameters

There are additional runtime settings that you may want to specify for a particular run of the benchmark. These settings can be provided either in the sandstorm.properties file or on the command line when you run the SandStorm client. The settings are:

--host <host>            IP of the machine on which the database is running
--port <port>            port of the machine on which the database is running
--DB <dbname>            Required: database name
--username <username>    username for accessing the database
--password <password>    password for accessing the database
-w <workflow>            workflow. Options: load, A, B, C. Default is A
--startfrom              starting index of record for the load workflow
-d <duration>            duration of the test. Provide only duration or scale; scale overrides duration
-s <scale>               scale (number of records) for running the test. Provide only duration or scale; scale overrides duration
-v <vuser>               total number of vusers (virtual users) for running the test. Default is 10
-R <host1, host2>        comma-separated list of remote hosts for generating load
-m                       flag for enabling monitoring. Provide the monitoring details in the properties file

Note: Command line arguments override the settings in the sandstorm.properties file.
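The precedence rules in Step 3 (command line over properties file, scale over duration, and the documented defaults) can be illustrated with a minimal Python sketch. This is not SandStorm code: the dictionary keys, the helper names merge_settings and to_arguments, and the example host value are assumptions made for illustration. Only the flags themselves are taken from the list above.

```python
# Flags come from the Step 3 list; the dictionary keys are hypothetical names
# chosen for this sketch, not documented sandstorm.properties keys.
FLAG_FOR_KEY = {
    "host": "--host",
    "port": "--port",
    "db": "--DB",
    "workflow": "-w",
    "duration": "-d",
    "scale": "-s",
    "vuser": "-v",
}


def merge_settings(file_settings, cli_settings):
    """Combine settings so that command line values win over file values."""
    merged = dict(file_settings)
    merged.update({k: v for k, v in cli_settings.items() if v is not None})
    # Defaults stated in the parameter list.
    merged.setdefault("workflow", "A")
    merged.setdefault("vuser", 10)
    # "Scale will override duration": drop duration when both are present.
    if "scale" in merged and "duration" in merged:
        del merged["duration"]
    return merged


def to_arguments(database, settings):
    """Build the argument list: database type first, then flag/value pairs."""
    args = [database]
    for key, flag in FLAG_FOR_KEY.items():
        if key in settings:
            args += [flag, str(settings[key])]
    return args


if __name__ == "__main__":
    from_file = {"host": "10.0.0.5", "duration": 30, "vuser": 20}  # example values
    from_cli = {"workflow": "load", "scale": 100000, "vuser": 50}
    print(" ".join(to_arguments("mongodb", merge_settings(from_file, from_cli))))
```

With the example values, the command line vuser count replaces the one from the file, and the duration is dropped because a scale is given.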

Step 4. Load the data

Workloads have two executable phases: the loading phase (which defines the data to be inserted) and the transactions phase (which defines the operations to be executed against the data set). To load the data, you run the SandStorm Client and tell it to execute the loading section.

To load the MongoDB dataset, provide the following command line arguments:

    mongodb -w load -s 100000 -v 50

A few notes about this command:

mongodb: tells the SandStorm client to use the MongoDB database layer
-w: database workflow (load inserts data)
-s: total number of transactions to be performed (in this case, the records to be inserted)
-v: total number of concurrent threads

In general, it is good practice to store any important properties in the properties file instead of specifying them on the command line.

Step 5. Execute the workload

Once the data is loaded, you can execute the workload. This is done by telling the client to run the transaction section of the workload. To execute the workload, you can use the following command:

    C:\SandStormInstallationDirectory\> runbenchmark.bat

Typically you will want to use the -v and -d parameters to control the amount of offered load. For example, you might run 10 threads for 30 minutes and analyze the throughput, then increase the virtual users and duration until the desired throughput is reached. It is preferred to make these settings by editing runbenchmark.bat before executing a workload, which helps avoid mistakes with the command line parameters.

At the end of the run, SandStorm generates performance reports in the resultsfiles folder. The default is a summary report that includes the total operations passed and failed, and the average, min, max and 90th percentile response time for each operation type (read, update, etc.).
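During the transaction phase, each core workload issues reads and writes in the proportions given in Step 2 (50/50 for A, 95/5 for B, 100% reads for C, 100% writes for load). The following Python sketch shows one simple way such a mix could be drawn. It is an illustration only: how SandStorm actually selects operations is not described here, and the names READ_FRACTION and plan_operations are hypothetical.

```python
import random
from collections import Counter

# Read fraction per workflow, taken from the workload descriptions in Step 2.
READ_FRACTION = {"load": 0.0, "A": 0.5, "B": 0.95, "C": 1.0}


def plan_operations(workflow, count, seed=0):
    """Return how many reads and writes a run of `count` operations would issue."""
    rng = random.Random(seed)  # seeded so that the sketch is repeatable
    read_fraction = READ_FRACTION[workflow]
    operations = Counter()
    for _ in range(count):
        # rng.random() lies in [0, 1), so 0.0 never reads and 1.0 always reads.
        operation = "read" if rng.random() < read_fraction else "write"
        operations[operation] += 1
    return operations


if __name__ == "__main__":
    for workflow in ("load", "A", "B", "C"):
        print(workflow, dict(plan_operations(workflow, 10000)))
```

For workloads A and B the observed counts only approximate the target mix, and they get closer to it as the number of operations grows.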

Screens:

[Screenshot: sandstorm.properties]
[Screenshot: runbenchmark.bat]
[Screenshot: Scenario Running]
[Screenshot: Scenario Stop]
[Screenshot: Results Location]

Results:

[Screenshot: snippet of the Transaction Performance Summary Report]

This report gives the following information:

Transaction: Description of the operation that is performed
Pass: Total number of successful transactions
Fail: Total number of failed transactions
Not Executed: Total number of transactions that were not executed
Min: Minimum time taken by an operation to complete successfully
Max: Maximum time taken by an operation to complete successfully
Avg: Average time taken by an operation to complete successfully
Std.Deviation: Standard deviation of the response time from the average value
90%: 90th percentile value of the response time for a successful transaction

Another important counter, Throughput, can be found in the Overall Scenario Throughput Report (OSTR), which is generated at the end of each scenario in the same location as the Transaction Performance Summary Report (TPSR).
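To make the report columns concrete, the following Python sketch computes Min, Max, Avg, Std.Deviation and the 90th percentile from a list of response times. It is a sketch under stated assumptions, not SandStorm's implementation: the population standard deviation and the nearest-rank percentile method are choices made here, and the function name summarize is hypothetical. SandStorm may use different formulas.

```python
import math
import statistics


def summarize(response_times):
    """Summarize the response times of successful transactions."""
    if not response_times:
        raise ValueError("at least one response time is required")
    ordered = sorted(response_times)
    # Nearest-rank method (an assumption): the smallest value such that
    # at least 90% of the samples are less than or equal to it.
    rank = math.ceil(len(ordered) * 9 / 10)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "std_dev": statistics.pstdev(ordered),  # population form (an assumption)
        "p90": ordered[rank - 1],
    }


if __name__ == "__main__":
    print(summarize([12, 15, 11, 40, 13, 14, 12, 16, 18, 95]))
```

The 90th percentile is usually more informative than Max for judging typical worst-case latency, because a single slow transaction does not dominate it.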