Simba ODBC Driver with SQL Connector for Apache Cassandra Installation and Configuration Guide May 7, 2013 Simba Technologies Inc.
Copyright 2012-2013 Simba Technologies Inc. All Rights Reserved. Information in this document is subject to change without notice. Companies, names and data used in examples herein are fictitious unless otherwise noted. No part of this publication, or the software it describes, may be reproduced, transmitted, transcribed, stored in a retrieval system, decompiled, disassembled, reverse-engineered, or translated into any language in any form by any means for any purpose without the express written permission of Simba Technologies Inc. Trademarks Simba, the Simba logo, SimbaEngine, SimbaEngine C/S, SimbaExpress and SimbaLib are registered trademarks of Simba Technologies Inc. All other trademarks and/or servicemarks are the property of their respective owners. ICU License - ICU 1.8.1 and later COPYRIGHT AND PERMISSION NOTICE Copyright (c) 1995-2010 International Business Machines Corporation and others All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, provided that the above copyright notice(s) and this permission notice appear in all copies of the Software and that both the above copyright notice(s) and this permission notice appear in supporting documentation. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization of the copyright holder. All trademarks and registered trademarks mentioned herein are the property of their respective owners. OpenSSL Copyright (c) 1998-2008 The OpenSSL Project. All rights reserved. www.simba.com i
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgment: "This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)" 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact openssl-core@openssl.org. 5. Products derived from this software may not be called "OpenSSL" nor may "OpenSSL" appear in their names without prior written permission of the OpenSSL Project. 6. Redistributions of any form whatsoever must retain the following acknowledgment: "This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (http://www.openssl.org/)" THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE OpenSSL PROJECT OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Apache Cassandra
Copyright 2009-2010 The Apache Software Foundation.

Apache Thrift
Copyright 2006-2010 The Apache Software Foundation.

Expat
Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contact Us
Simba Technologies Inc.
938 West 8th Avenue
Vancouver, BC Canada V5Z 1E5
www.simba.com
Telephone: +1 (604) 633-0008
  Information and Product Sales: Extension 2
  Technical Support: Extension 3
Fax: +1 (604) 633-0004
Information and Product Sales: solutions@simba.com
Technical Support: support@simba.com
Follow us on Twitter: @SimbaTech
Printed in Canada
Table of Contents
Introduction
System Requirements
Installing the Driver
Configuring a Data Source Name
Configuring Advanced Options
Defining the Schema
Features
SQL Connector
Data Types
Authentication
Write-back
Catalog Support
Contact Us
Introduction

The Simba ODBC Driver with SQL Connector for Cassandra enables Business Intelligence (BI), analytics and reporting on data that is stored in Apache Cassandra databases. The driver complies with the ODBC 3.52 data standard and adds important functionality such as Unicode, as well as 32- and 64-bit support for high-performance computing environments on all platforms.

ODBC is one of the most established and widely supported APIs for connecting to and working with databases. At the heart of the technology is the ODBC driver, which connects an application to the database. For more information about ODBC, see http://www.simba.com/odbc.htm. For complete information on the ODBC 3.52 specification, see the ODBC API Reference at http://msdn.microsoft.com/en-us/library/ms714562(vs.85).aspx

System Requirements

You install Simba ODBC Driver with SQL Connector for Cassandra on client computers accessing data in Cassandra databases. Each computer where you install the driver must meet the following minimum system requirements:

o One of the following operating systems (32- and 64-bit editions are supported):
    o Windows XP with SP3
    o Windows Vista
    o Windows 7 Professional
    o Windows Server 2008 R2
o 25 MB of available disk space

Important: To install the driver, you need Administrator privileges on the computer.

Installing the Driver

On 64-bit Windows operating systems, you can execute 32- and 64-bit applications transparently. You must use the version of the driver matching the bitness of the client application accessing Cassandra databases:

o SimbaCassandraODBC32.msi for 32-bit applications
o SimbaCassandraODBC64.msi for 64-bit applications

You can install both versions of the driver on the same computer.

Note: For an explanation of how to use ODBC on 64-bit editions of Windows, see http://www.simba.com/docs/how-to-32-bit-vs-64-bit-odbc-data-source-administrator.pdf
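When the client application is a script, the bitness that matters is that of the interpreter process running it, not that of Windows. The following is a minimal sketch, assuming a Python client, of how to check the process bitness and pick the matching installer. The installer file names come from this guide; the helper names process_bitness and installer_for are hypothetical and not part of the driver.

```python
import struct

# Installer file names as listed in this guide.
INSTALLERS = {
    32: "SimbaCassandraODBC32.msi",
    64: "SimbaCassandraODBC64.msi",
}


def process_bitness() -> int:
    """Return the pointer width, in bits, of the running Python process."""
    return struct.calcsize("P") * 8


def installer_for(bitness: int) -> str:
    """Return the installer that matches an application's bitness."""
    if bitness not in INSTALLERS:
        raise ValueError(f"unsupported bitness: {bitness}")
    return INSTALLERS[bitness]


if __name__ == "__main__":
    bits = process_bitness()
    print(f"This Python process is {bits}-bit; install {installer_for(bits)}")
```

A 32-bit interpreter on 64-bit Windows reports 32 here, which is why the check inspects the process rather than the operating system.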
To install Simba ODBC Driver with SQL Connector for Cassandra:
1. Depending on the bitness of your client application, double-click to run SimbaCassandraODBC32.msi or SimbaCassandraODBC64.msi.
2. Click Next.
3. Select the check box to accept the terms of the License Agreement if you agree, and then click Next.
4. To change the installation location, click the Change button, then browse to the desired folder, and then click OK. To accept the installation location, click Next.
5. Click Install.
6. When the installation completes, click Finish.
7. If you are installing a driver with an evaluation license and you have purchased a perpetual license, then copy the License.lic file you received via e-mail into the \lib subfolder in the installation folder you selected in step 4.

Configuring a Data Source Name

After installing Simba ODBC Driver with SQL Connector for Cassandra, you need to create a Data Source Name (DSN).

To create a Data Source Name:
1. On the Windows Start menu, click All Programs, then click the Simba Cassandra ODBC Driver 0.5 program group corresponding to the bitness of the client application accessing data in Cassandra, and then click ODBC Administrator.
2. In the ODBC Administrator, click the Drivers tab, and then scroll down as needed to confirm that Simba Cassandra ODBC Driver appears in the alphabetical list of driver names.
3. To create a DSN on the computer that only the user currently logged into Windows can use, click the User DSN tab.
   OR
   To create a DSN on the computer that all users who log into Windows can use, click the System DSN tab.
4. Click the Add button.
5. In the Create New Data Source dialog, select Simba Cassandra ODBC Driver, and then click Finish.
6. In the Simba Cassandra ODBC Driver DSN Setup dialog, type a name for the data source in the Data Source Name field.
7. In the Description field, type relevant details about the DSN.
8. In the Host field, type the name or IP address of the host where your Cassandra instance is running.
9. In the Port field, type the number of the port that the Cassandra instance uses.
10. In the Catalog field, type the name of your Cassandra keyspace.
    OR
    Click the drop-down arrow next to the Catalog field, and then select the appropriate keyspace from the list of keyspaces.
11. If you need to customize the schema that Simba ODBC Driver with SQL Connector for Cassandra detects for Cassandra databases, then click the Schema Definition button. For details, see the section Defining the Schema.
12. To configure advanced driver options, click the Advanced Options button. For details, see the section Configuring Advanced Options.
13. To confirm that the DSN connects to your Cassandra database, click the Test button. Review the Test Result dialog as needed, and then click OK.
14. In the Simba Cassandra ODBC Driver DSN Setup dialog, click OK.
15. In the ODBC Data Source Administrator dialog, click OK.

Configuring Advanced Options

Table 1 lists advanced configuration settings available in Simba ODBC Driver with SQL Connector for Cassandra.

Option Name | Default Value | Description
Maximum rows per fetch | 4096 | The maximum number of rows that a query returns at a time.
Schema detection row limit | 128 | The number of rows to sample when detecting the schema for a table.
Connect Timeout (ms) | 10000 | The interval of time, in milliseconds, to wait for a connection to the database before returning a timeout error.
Send Timeout (ms) | 30000 | The interval of time, in milliseconds, that the database connection can remain inactive after querying the database before the connection is dropped.
Receive Timeout (ms) | 30000 | The interval of time, in milliseconds, that the driver waits for messages from the database before the connection is dropped.
Default string column length | 255 | The default string column length to use. Cassandra does not provide the length for String columns in its column metadata. The option allows you to tune the length of String columns.
Decimal Column Precision | 38 | The maximum number of digits that values in decimal columns may have. Digits may be before or after the decimal point.
Decimal Column Scale | 10 | The maximum number of digits to the right of the decimal point that values in decimal columns may have.
Allow Update Key Column | Clear | Controls whether updating the key column is allowed.
Filter Tombstone Row | Clear | Controls whether the driver filters out tombstone rows.
Use Native Query | Disabled | Reserved for future use.

Table 1 Advanced Configuration Options

You can configure advanced options using the following:
o Data Source Name
o Database connection string

Using the Data Source Name

To set advanced options using the Simba Cassandra ODBC Driver DSN Setup dialog:
1. In the ODBC Data Source Administrator where you created the DSN, select the DSN tab where the Data Source Name appears, and then select the Data Source Name.
2. Click the Configure button, and then click the Advanced Options button.
3. Type an appropriate value in the Maximum Rows per Fetch field to control the maximum number of rows that the driver retrieves per fetch call.
4. Type an appropriate value in the Schema Detection Row Limit field to control the maximum number of rows the driver retrieves for schema detection.
5. Type an appropriate value in the Connect Timeout field, in milliseconds, as needed.
6. Type an appropriate value in the Send Timeout field, in milliseconds, as needed.
7. Type an appropriate value in the Recv Timeout field, in milliseconds, as needed.
8. Type an appropriate value in the Default String Column Length field to control what the driver reports for the maximum column length in result set metadata.
9. Type an appropriate value in the Decimal Column Precision field, as needed.
10. Type an appropriate value in the Decimal Column Scale field, as needed.
11. Select the Allow Update Key Column check box to permit changing key column values.
12. Select the Filter Tombstone Row check box to exclude tombstone rows from query results.
13. Click OK.
14. In the Simba Cassandra ODBC Driver DSN Setup dialog, click OK.

Using a Database Connection String

Here is an example connection string that sets advanced options:

DSN=Sample Simba Cassandra DSN; Host=192.168.100.100; Port=9160; Catalog=MyKeyspace; MaxFetchRows=2000; SchemaDetectionRowLimit=100; ConnTimeout=60000; SendTimeout=20000; RecvTimeout=20000; DecimalPrecision=38; DecimalScale=10; UpdateKeyColumn=0; FilterTombstoneRow=1; DefaultStringColumnLength=255

Defining the Schema

Simba ODBC Driver with SQL Connector for Cassandra dynamically detects the database schema as needed in the process of connecting to a Cassandra database. You can also manually edit the schema that the driver uses to connect to the database.

To manually define the schema to use when Simba ODBC Driver with SQL Connector for Cassandra connects to the database:
1. In the ODBC Data Source Administrator where you created the DSN, select the DSN tab where the Data Source Name appears, and then select the Data Source Name.
2. Click the Configure button, and then click the Schema Definition button.
3. In the Schema Definition dialog, click the drop-down arrow next to the Table Name field, and then select the table for which you want to edit the schema.
4. To change the number of rows in the table that the driver samples to detect columns that exist in the table and corresponding data types, type a number in the Rows field next to the Sample button, and then click the Sample button. In the confirmation dialog, click Yes.
5. To change the data type for a column, click the Type field for the column in the Columns area, and then select the appropriate data type. The Data Preview pane updates to reflect your change.
   Note: If data cannot be represented using the data type you select, then the corresponding column in the Data Preview pane displays a "conversion error; unsuitable type" message.
6. Repeat steps 3 to 5 as needed to define the schema.
   Note: Currently, the Add and Remove buttons are not implemented.
7. When you are finished defining the schema, click OK, and then click Yes when prompted to write metadata to the database.
   OR
   To discard changes, click the Cancel button, and then click Yes when prompted to discard changes.

Features

SQL Connector

The SQL Connector feature of the driver allows applications to use normal SQL queries against Cassandra databases, translating standard SQL-92 queries into equivalent Cassandra Thrift calls. This allows standard queries that BI tools execute to run against your Cassandra instance.

Data Types

The following data types are supported:
o AsciiType
o BooleanType
o BytesType
o DateType
o DecimalType
o DoubleType
o FloatType
o Int32Type
o IntegerType
o LongType
o UTF8Type

The following types are not yet supported and map to raw binary:
o CompositeType
o CounterColumnType
o LocalByPartitionerType
o ReversedType
o UUIDType

Authentication

The Cassandra service currently does not support authentication in the typical manner of a user login. There is no mechanism to pass in a user context such as a user name, password or token. Data security models are still under active development so this will change in the future. As a workaround, use the features available in your client application to implement access control.
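Because no user name, password or token is passed to Cassandra, a connection string for this driver carries only the DSN, the host, port and keyspace, and any advanced options. The following is a minimal sketch, assuming a Python client, that assembles such a string from a dictionary. The keyword names are taken from the example connection string in this guide; the function build_connection_string is hypothetical, and the pyodbc call in the final comment is an assumption about the client environment rather than something this guide documents.

```python
def build_connection_string(options: dict) -> str:
    """Join keyword=value pairs with semicolons, in insertion order."""
    parts = []
    for key, value in options.items():
        text = str(value)
        # A semicolon inside a value would be read as a pair separator.
        if ";" in text:
            raise ValueError(f"value for {key} must not contain ';'")
        parts.append(f"{key}={text}")
    return ";".join(parts)


# Keyword names and values mirror the example connection string in this guide.
options = {
    "DSN": "Sample Simba Cassandra DSN",
    "Host": "192.168.100.100",
    "Port": 9160,
    "Catalog": "MyKeyspace",
    "MaxFetchRows": 2000,
    "ConnTimeout": 60000,
}

conn_str = build_connection_string(options)
print(conn_str)

# Assumption: with the pyodbc package installed, the string could be used as
#   connection = pyodbc.connect(conn_str, autocommit=True)
```

Note that the sketch has no UID or PWD keywords, which reflects the absence of a login mechanism described above.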
Write-back

Simba ODBC Driver with SQL Connector for Cassandra supports Data Manipulation Language (DML) statements such as INSERT, UPDATE and DELETE statements. These statements expose standard SQL-92 behavior. The driver does not execute the DML statements atomically, as Cassandra does not have transaction support.

Because of the fundamental difference between Cassandra and a traditional relational database in handling write-back DML queries, there are a few things to take note of when using INSERT, UPDATE, and DELETE statements with the driver.

INSERT and UPDATE

Cassandra provides an UPSERT operation for modifying data stored in Cassandra. The driver does not provide an INSERT operation in the sense of an operation that always adds a new row to the table, nor does the driver provide an UPDATE operation in the sense of updating only an existing column of an existing row. The driver maps INSERT and UPDATE DML queries to the UPSERT operation provided by Cassandra.

An UPSERT operation consists of:
o A row key identifying the row into which the column is to be inserted
o The column name
o The column value
o A timestamp

If the row does not already exist, then the UPSERT operation adds the column into a new row using the row key. If the row already exists, then the UPSERT operation modifies the column in the row with the specified column value.

If you use the driver to execute an INSERT or UPDATE statement, then the statement adds a new row if a row having the same row key (key column value) does not exist. If the row exists, then the statement updates the existing row.

UPDATE Key Column

Cassandra does not support updating the row key (key column) of an existing row as a native operation. Simba ODBC Driver with SQL Connector for Cassandra allows you to update the key column by copying all the columns to the row having the new row key (key column value), and then deleting the old row. Performing the operation can be time consuming because a row may have millions of columns. To control whether the driver carries out such operations, you can configure the Allow Update Key Column check box in the Advanced Options dialog. See Configuring Advanced Options for details. If the feature is off, then the driver returns an error when the user attempts to update the key column.

INSERT with Only Key Column

Rows in Cassandra cannot exist without having at least one column value. Therefore, if you try to execute an INSERT query with only the key column, then Cassandra ignores the operation.
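The write-back behavior described so far can be illustrated with a toy model. The following is a minimal sketch, assuming an in-memory dictionary stands in for one Cassandra table; it is not the driver's implementation. The class ToyTable and its methods are hypothetical, and timestamps are omitted for brevity.

```python
class ToyTable:
    """In-memory stand-in for one table: row key -> {column name: value}."""

    def __init__(self, allow_update_key=False):
        self.rows = {}
        # Mirrors the Allow Update Key Column option (cleared by default).
        self.allow_update_key = allow_update_key

    def upsert(self, row_key, columns):
        """Create the row if it is missing, otherwise overwrite its columns."""
        if not columns:
            # A row cannot exist with only a key, so the write is ignored.
            return
        self.rows.setdefault(row_key, {}).update(columns)

    def insert(self, row_key, columns):
        """INSERT maps to UPSERT."""
        self.upsert(row_key, columns)

    def update(self, row_key, columns):
        """UPDATE maps to UPSERT as well."""
        self.upsert(row_key, columns)

    def update_key(self, old_key, new_key):
        """Copy every column to the new row key, then delete the old row."""
        if not self.allow_update_key:
            raise PermissionError("updating the key column is disabled")
        if old_key in self.rows:
            self.upsert(new_key, self.rows.pop(old_key))


table = ToyTable(allow_update_key=True)
table.insert("k1", {"name": "a"})
table.update("k2", {"name": "b"})   # no such row yet, so UPDATE creates it
table.insert("k1", {"name": "c"})   # row exists, so INSERT overwrites it
table.insert("k3", {})              # only a key column: ignored
table.update_key("k2", "k9")        # copy columns to k9, then drop k2
print(table.rows)
```

The copy-then-delete step in update_key touches every column of the row, which is why the guide warns that the operation can be slow for wide rows.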
INSERT Without Key Column

Cassandra does not allow you to INSERT a column without specifying the row key. If you do not specify a value for the key column in an INSERT statement, then the driver returns an error.

DELETE

To implement distributed DELETE, Cassandra uses the tombstone concept to mark columns and rows as deleted. Tombstone rows contain a key column and null values in all other columns. In query results, Cassandra returns tombstone rows without indicating that the row is a tombstone. Simba ODBC Driver with SQL Connector for Cassandra provides the Filter Tombstone Row option in the Advanced Options dialog to filter out tombstone rows. For further details, see Configuring Advanced Options. Identifying tombstone rows can decrease performance of the driver.

Catalog Support

Simba ODBC Driver with SQL Connector for Cassandra maps Cassandra keyspaces to catalogs, allowing the driver to work easily with various ODBC applications.

Contact Us

If you have difficulty using the driver, please contact our Technical Support staff. We welcome your questions, comments and feature requests. Technical Support is available Monday to Friday from 8 a.m. to 5 p.m. Pacific Time.

Important: To help us assist you, prior to contacting Technical Support please prepare a detailed summary of the client and server environment including operating system version, patch level and configuration.

You can contact Technical Support via:
o E-mail: support@simba.com
o Web site: www.simba.com
o Telephone: (604) 633-0008 Extension 3
o Fax: (604) 633-0004

You can also follow us on Twitter @SimbaTech