Service Integration course. Cassandra


Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group

Service Integration course
Cassandra

Oszkár Semeráth, Gábor Szárnyas
March 16, 2014

Contents

1 Installation
  1.1 Linux
  1.2 Windows
  1.3 Documentation pointers
2 Laboratory exercises
  2.1 Database design
  2.2 Java client
  2.3 Integration
  2.4 Version control
  2.5 Eclipse tips

Chapter 1 Installation

Apache Cassandra (http://cassandra.apache.org/) is an open source NoSQL database management system based on the column family data model.

Figure 1.1: The logo of Apache Cassandra

1.1 Linux

Create the data and log directories and make the current user their owner:

    sudo mkdir /var/lib/cassandra
    sudo mkdir /var/log/cassandra
    sudo chown -R $USER:$USER /var/lib/cassandra
    sudo chown -R $USER:$USER /var/log/cassandra

1.2 Windows

Set the JAVA_HOME environment variable to the JDK's installation directory (Cassandra will work with the JRE as well, but some other Java-based applications such as Maven will not).

Install Cygwin with Python 2.7.x (if you want to add the package to an existing Cygwin installation, you have to re-run the installer). Navigate to /cygdrive/c/apache-cassandra-2.0.5 and start the CQL shell with the following command:

    $ bin/cqlsh
    Connected to Test Cluster at localhost:9160.
    [cqlsh 4.1.1 | Cassandra 2.0.5 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
    Use HELP for help.

Note that Cassandra itself is not supported under Cygwin (http://stackoverflow.com/questions/12355507/error-when-starting-cassandra-with-bin-cassandra-f).

1.3 Documentation pointers

Official Apache wiki (http://wiki.apache.org/cassandra/). It is not very informative; use the DataStax documentation (http://www.datastax.com/docs) instead.

CQL documentation: http://www.datastax.com/documentation/cql/3.1/index.html
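Section 1.2 asks for JAVA_HOME to point to a JDK rather than a JRE. The following is a minimal sketch of how that could be checked from Java. The class name JdkCheck and the heuristic itself (looking for a javac launcher under bin/) are assumptions made for illustration and are not part of the original handout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Heuristic check (an assumption, not an official test): does a directory look like a JDK? */
final class JdkCheck {

    /** Returns true if the directory contains a Java compiler launcher under bin/. */
    static boolean looksLikeJdk(Path home) {
        if (home == null) {
            return false;
        }
        Path bin = home.resolve("bin");
        // javac on Linux/Cygwin, javac.exe on Windows
        return Files.isRegularFile(bin.resolve("javac"))
                || Files.isRegularFile(bin.resolve("javac.exe"));
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        System.out.println(looksLikeJdk(Paths.get(javaHome))
                ? "JAVA_HOME points to a JDK"
                : "JAVA_HOME has no bin/javac; it may point to a JRE");
    }
}
```

A JRE directory has no compiler launcher, so the check fails for it. This matters for tools such as Maven that need the compiler.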

Chapter 2 Laboratory exercises

2.1 Database design

1. You may always reset the database with the following command:

    DROP KEYSPACE bicycle_database;

2. Create the keyspace:

    CREATE KEYSPACE bicycle_database WITH replication = {
      'class': 'SimpleStrategy',
      'replication_factor': 1
    };

3. Create a column family:

    USE bicycle_database;

    CREATE TABLE bicycles (
      manufacturer text,
      model text,
      PRIMARY KEY (manufacturer, model)
    );

    INSERT INTO bicycles (manufacturer, model) VALUES ('KTM', 'Strada 1000');

4. Run a simple query:

    SELECT * FROM bicycles;

5. Add some columns.

    ALTER TABLE bicycles ADD year int;
    ALTER TABLE bicycles ADD size int;

6. Insert a few rows.

    INSERT INTO bicycles (manufacturer, model, year, size) VALUES ('KTM', 'Strada 1000', 2014, 57);
    INSERT INTO bicycles (manufacturer, model, year, size) VALUES ('Trek', '1.2', 2013, 54);

7. Run a simple query again:

    cqlsh:bicycle_database> SELECT * FROM bicycles;

     manufacturer | model       | size | year
    --------------+-------------+------+------
              KTM | Strada 1000 |   57 | 2014
             Trek |         1.2 |   54 | 2013

    (2 rows)

8. We would like to collect the bicycles with a frame size of 54:

    SELECT * FROM bicycles WHERE size = 54;

This throws the following error:

    Bad Request: No indexed columns present in by-columns clause with Equal operator.

9. Create indices.

    CREATE INDEX bicycles_size ON bicycles (size);
    CREATE INDEX bicycles_year ON bicycles (year);

10. Re-run the query. It may not return any results as the rows were added before the index was created. Add the rows again and the query will work fine.

11. Try a query which filters for inequality:

    SELECT * FROM bicycles WHERE year > 2013;

12. This also throws an error:

    Bad Request: No indexed columns present in by-columns clause with Equal operator

13. Add an equality filter as well:

    SELECT * FROM bicycles WHERE year > 2013 AND size = 57;

This time, the error message is the following:

    Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

14. Allow filtering:

    SELECT * FROM bicycles WHERE year > 2013 AND size = 57 ALLOW FILTERING;

This query works just fine. For details, see http://wiki.apache.org/cassandra/secondaryindexes.

15. Try using an OR logical operation in the query:

    SELECT * FROM bicycles WHERE size = 54 OR size = 57;

The parser does not even recognize the operation:

    Bad Request: line 1:39 missing EOF at OR.

16. CQL3 supports collections. Let's use them for storing the number of cogs:

    ALTER TABLE bicycles ADD front_cogs list<int>;
    ALTER TABLE bicycles ADD rear_cogs list<int>;

Figure 2.1: Drivetrain. Source: http://commons.wikimedia.org/wiki/file:derailleur_bicycle_drivetrain.svg

17. Update the KTM bicycle with an INSERT operation (source: http://www.ktm-bikes.at/bikes/road.html?action=bike_details&bike_id=160&chash=dbfc043b8dd2263b610187cf9d093fc7).

    INSERT INTO bicycles (manufacturer, model, front_cogs, rear_cogs)
    VALUES ('KTM', 'Strada 1000', [50, 34], [12, 13, 14, 15, 17, 19, 21, 23, 25, 28]);

Query the table. The result should look like this:

     manufacturer | model       | front_cogs | rear_cogs                                | size | year
    --------------+-------------+------------+------------------------------------------+------+------
              KTM | Strada 1000 |   [50, 34] | [12, 13, 14, 15, 17, 19, 21, 23, 25, 28] |   57 | 2014
             Trek |         1.2 |       null |                                     null |   54 | 2013

2.2 Java client

1. Create a new Maven project. Choose the org.apache.maven.archetypes:maven-archetype-quickstart archetype. Set the Group Id to hu.bme.mit.sysint and the Artifact Id to client.

2. Go to the App class and run it.

3. Go to the pom.xml file. The Ctrl+F11 hotkey does not run the application while the POM file is the active editor. Solution: go to Window > Preferences > Run/Debug > Launching and, in the Launch Operation group box, choose Always launch the previously launched application.

4. Add the DataStax driver dependency (https://github.com/datastax/java-driver):

    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>2.0.0</version>
    </dependency>

5. Create the CassandraClient class.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CassandraClient {
        private static final String KEYSPACE_NAME = "bicycle_database";
        protected Session session;
        protected Cluster cluster;

        public CassandraClient(String host) {
            // connect to the cluster through the given contact point
            cluster = Cluster.builder().addContactPoints(host).build();
            session = cluster.connect("system");
        }
    }

6. Instantiate the CassandraClient class from the main method:

    public class App {
        private static final String CASSANDRA_HOST = "127.0.0.1";

        public static void main(String[] args) {
            CassandraClient client = new CassandraClient(CASSANDRA_HOST);
            client.load();
        }
    }

7. The following warning is shown.

    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

8. Pick a logger implementation, e.g. log4j, and search for its Maven dependency:

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.6</version>
    </dependency>

9. Run the project.

10. The following warning is displayed:

    log4j:WARN No appenders could be found for logger (com.datastax.driver.core.FrameCompressor).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

11. Create a log4j.properties file in the src/main/java folder. The content should look something like this:

    # Root logger option
    log4j.rootLogger=INFO, stdout

    # Direct log messages to stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.Target=System.out
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

Source: http://www.mkyong.com/logging/log4j-log4j-properties-examples/

12. If you'd like to hide the warning messages (e.g. for a production system), change the value of the log4j.rootLogger property from INFO to ERROR.

13. Alternatively, we may add the missing libraries:

    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.1.0.1</version>
    </dependency>
    <dependency>
      <groupId>net.jpountz.lz4</groupId>
      <artifactId>lz4</artifactId>
      <version>1.2.0</version>
    </dependency>

14. Use the following commands in the CQL shell:

    DESCRIBE KEYSPACE bicycle_database;
    DESCRIBE TABLE bicycles;

15. The application does not stop after it finishes its operations. This is because it has open connections. To solve this, implement the Closeable interface in the CassandraClient class:

    public class CassandraClient implements Closeable {
        [...]

        @Override
        public void close() {
            session.close();
            cluster.close();
        }
    }

Java 7 has a feature similar to the using block in C#, the try-with-resources statement:

    public static void main(String[] args) {
        try (CassandraClient client = new CassandraClient(CASSANDRA_HOST)) {
            client.load();
        }
    }

16. We start the load by dropping the keyspace.

    session.execute("DROP KEYSPACE bicycle_database");

This works fine the first time. However, it throws an exception if the keyspace does not exist:

    Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot drop non existing keyspace bicycle_database.

17. While the Javadoc of the Session interface's execute() method shows that it may throw an exception, the application is not required to catch this exception as it is a RuntimeException. For details, see the Javadoc of the RuntimeException class.

    try {
        session.execute("DROP KEYSPACE bicycle_database");
    } catch (InvalidQueryException e) {
        System.out.println("Cannot drop keyspace.");
    }

A more elegant solution is to use:

    DROP KEYSPACE IF EXISTS bicycle_database;

18. Create the keyspace:

    String createKeyspace = "CREATE KEYSPACE " + KEYSPACE_NAME + " WITH replication = {\r\n"
        + "  'class': 'SimpleStrategy',\r\n"
        + "  'replication_factor': '1'\r\n"
        + "};";
    session.execute(createKeyspace);
    session.execute("USE " + KEYSPACE_NAME);

19. Create the table. Autoformatting may be messy here. To correct this, go to Window > Preferences > Java > Code Style > Formatter. Create a new profile (e.g. Eclipse FTSRG) and edit it. Go to the Off/On Tags tab and check Enable Off/On tags.

    // @formatter:off
    String createTable = "CREATE TABLE bicycles (\r\n"
        + "  manufacturer text,\r\n"
        + "  model text,\r\n"
        + "  front_cogs list<int>,\r\n"
        + "  rear_cogs list<int>,\r\n"
        + "  size int,\r\n"
        + "  year int,\r\n"
        + "  PRIMARY KEY (manufacturer, model)\r\n"
        + ")";
    session.execute(createTable);
    // @formatter:on

20. Implement a method for creating indices:

    protected void createIndex(String columnFamily, String columnName) {
        String createIndex = String.format("CREATE INDEX %s_%s ON %s (%s)",
            columnFamily, columnName, columnFamily, columnName);
        session.execute(createIndex);
    }

Invoke the method:

    createIndex("bicycles", "size");
    createIndex("bicycles", "year");

21. The insertion uses the builder pattern (http://en.wikipedia.org/wiki/builder_pattern):

    Insert insert = QueryBuilder.insertInto(KEYSPACE_NAME, "bicycles");
    insert.value("manufacturer", "KTM");
    insert.value("model", "Strada 1000");
    insert.value("size", 57);
    insert.value("year", 2014);
    insert.value("front_cogs", Arrays.asList(53, 39));
    insert.value("rear_cogs", Arrays.asList(12, 13, 14, 15, 17, 19, 21, 23, 25, 28));
    session.execute(insert);

22. Run some queries:

    System.out.println("Query 1");
    // Assumption: the source does not show the query behind resultSet1; a plain SELECT * is used here.
    ResultSet resultSet1 = session.execute("SELECT * FROM bicycles");
    for (Row row : resultSet1) {
        System.out.println(row);
        String manufacturer = row.getString("manufacturer");
        String model = row.getString("model");
        List<Integer> frontCogs = row.getList("front_cogs", Integer.class);
        List<Integer> rearCogs = row.getList("rear_cogs", Integer.class);
        System.out.println(manufacturer + " " + model + ", front cogs: " + frontCogs + ", rear cogs: " + rearCogs);
    }

    System.out.println("Query 2");
    String select2 = QueryBuilder.select("manufacturer").from("bicycles").
        where(QueryBuilder.eq("size", 57)).getQueryString();
    ResultSet resultSet2 = session.execute(select2);
    for (Row row : resultSet2) {
        System.out.println(row);
    }
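As a side note on step 15, the behaviour of try-with-resources can be observed without a running cluster. The sketch below uses a hypothetical stand-in class, FakeClient, in place of CassandraClient; it only records the order of calls. It is meant as an illustration under these assumptions, not as the lab's actual client.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {

    /** Hypothetical stand-in for CassandraClient; it only records what happens to it. */
    static final class FakeClient implements Closeable {
        private final List<String> events;

        FakeClient(List<String> events) {
            this.events = events;
            events.add("open");
        }

        void load(boolean fail) {
            events.add("load");
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
        }

        @Override
        public void close() { // narrows Closeable.close(): no IOException is declared
            events.add("close");
        }
    }

    /** Runs one load and returns the recorded sequence of events. */
    static List<String> run(boolean fail) {
        List<String> events = new ArrayList<>();
        try (FakeClient client = new FakeClient(events)) {
            client.load(fail);
        } catch (IllegalStateException e) {
            // the resource has already been closed when this block runs
            events.add("caught");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [open, load, close]
        System.out.println(run(true));  // prints [open, load, close, caught]
    }
}
```

The resource is closed in both cases, and in the failing case it is closed before the catch block runs. This is why the real client releases its connections even if load() throws.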

23. To improve the readability of the code, add a static import. Tip: press Ctrl+Shift+M while the cursor is on the eq part of QueryBuilder.eq.

    import static com.datastax.driver.core.querybuilder.QueryBuilder.eq;

    [...]

    String select2 = QueryBuilder.select("manufacturer").from("bicycles").
        where(eq("size", 57)).getQueryString();

2.3 Integration

1. Install Maven: set the M2_HOME environment variable and add M2_HOME/bin to the PATH environment variable.

2. Check if Maven is working properly:

    mvn --version

3. The standard Maven command for initiating the build process is:

    mvn clean install

4. The build may throw the following error:

    Fatal error compiling: tools.jar not found:

To fix this, change the JAVA_HOME environment variable from the JRE's directory to the JDK's directory.

5. The CassandraClient class may cause an error related to the Java version, e.g. diamond operator is not supported in -source 1.5. Add the following lines to the POM file:

    <build>
      <plugins>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <configuration>
            <source>1.7</source>
            <target>1.7</target>
          </configuration>
        </plugin>
      </plugins>
    </build>

Source: http://stackoverflow.com/questions/14043042/compiling-java-7-code-via-maven

6. Once the build succeeds, try to run the application.

    java -jar target\hu.bme.mit.sysint-0.0.1-SNAPSHOT.jar

7. It throws the following error:

    no main manifest attribute, in target\hu.bme.mit.sysint-0.0.1-SNAPSHOT.jar

Define the main class in the POM file:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>hu.bme.mit.sysint.cassandraclient.App</mainClass>
          </manifest>
          <manifestEntries>
            <mode>development</mode>
            <url>${pom.url}</url>
          </manifestEntries>
        </archive>
      </configuration>
    </plugin>

Source: http://docs.codehaus.org/pages/viewpage.action?pageid=72602

8. It throws the following exception:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/driver/core/Statement
        at hu.bme.mit.sysint.cassandraclient.App.main(App.java:13)
    Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.Statement

The reason for this is that the dependencies are not available to the application. To solve this, we have to

(a) Copy the dependencies:

    <plugin>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <phase>install</phase>
          <goals>
            <goal>copy-dependencies</goal>
          </goals>
          <configuration>
            <outputDirectory>${project.build.directory}/lib</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>

The execution tag shows an error in Eclipse: maven-dependency-plugin (goals "copy-dependencies", "unpack") is not supported by m2e. Use the provided quick fix, Permanently mark goal copy-dependencies in pom.xml as ignored in Eclipse build. This adds a long <pluginManagement> tag.

(b) Add them to the classpath:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            [...]
            <addClasspath>true</addClasspath>
            <classpathPrefix>lib/</classpathPrefix>
          </manifest>
        </archive>
      </configuration>
    </plugin>

For some libraries (e.g. emfjson) you may get a java.lang.NoClassDefFoundError caused by a java.lang.ClassNotFoundException. The reason for this is that Maven sometimes resolves the SNAPSHOT string to the actual snapshot version in the generated JAR's MANIFEST.MF file (e.g. org.eclipselabs.emfjson-0.7.0-SNAPSHOT.jar becomes org.eclipselabs.emfjson-0.7.0-20140221.135604-5) but the filename of the dependency's JAR stays the same. To solve this, add a <useUniqueVersions>false</useUniqueVersions> element between the <manifest> tags (source: http://jira.codehaus.org/browse/mjar-156).

Source: http://stackoverflow.com/questions/97640/force-maven2-to-copy-dependencies-into-target-lib

9. The old slf4j warning is back. The solution is to create the src/main/resources folder and move the log4j.properties file to it.

10. If you would like to package your project as one fat JAR, use the maven-assembly-plugin.

    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>hu.bme.mit.sysint.cassandraclient.App</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id> <!-- this is used for inheritance merges -->
          <phase>package</phase> <!-- bind to the packaging phase -->
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

Source: http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven

If you just want to get the fat JAR, build the project with:

    mvn clean compile assembly:single

2.4 Version control

1. Start by cloning the repository to your local computer. To do this, go to the Git Repositories view (open it from Window > Show View > Other...). Provide the clone URI to clone the Git repository.

2. Right click the project and choose Team > Share project... Choose Git and select the repository.

3. Add a .gitignore file to the project. Search for GitHub's gitignore repository (https://github.com/github/gitignore) and use the Maven.gitignore and the Java.gitignore files as a guideline.

4. Commit and Push the project.

2.5 Eclipse tips

In the Window > Preferences dialog, go to Java > Editor > Typing. In the Automatically insert at correct position group, check the Semicolons checkbox. In the In string literals group, check Escape text when pasting into a string literal.
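Looking back at section 2.3: the "no main manifest attribute" error and the later NoClassDefFoundError both come down to entries in the JAR's MANIFEST.MF. The following sketch parses a manifest with the standard java.util.jar API and reads the two relevant attributes. The manifest text is a hand-written assumption of what the maven-jar-plugin configuration might produce, and the class name ManifestDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestDemo {

    /** Parses manifest text held in memory (a real JAR stores it in META-INF/MANIFEST.MF). */
    static Manifest parse(String text) throws IOException {
        return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Value of Main-Class, or null if missing (the case behind "no main manifest attribute"). */
    static String mainClass(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Value of Class-Path, or null if the dependencies were not added to the manifest. */
    static String classPath(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    }

    public static void main(String[] args) throws IOException {
        // Assumed example content; the real file is generated by Maven.
        String text = "Manifest-Version: 1.0\r\n"
                + "Main-Class: hu.bme.mit.sysint.cassandraclient.App\r\n"
                + "Class-Path: lib/cassandra-driver-core-2.0.0.jar\r\n"
                + "\r\n";
        Manifest manifest = parse(text);
        System.out.println("Main-Class: " + mainClass(manifest));
        System.out.println("Class-Path: " + classPath(manifest));
    }
}
```

A manifest without a Main-Class line makes mainClass return null, which corresponds to the situation in step 7. A missing Class-Path entry corresponds to the situation in step 8.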