Getting Started with User Profile Data Modeling. White Paper



Similar documents
A basic create statement for a simple student table would look like the following.

CQL for Cassandra 2.0 & 2.1

NoSQL: Going Beyond Structured Data and RDBMS

Big Data Analytics. Rasoul Karimi

MariaDB Cassandra interoperability

A Brief Introduction to MySQL

Performance Implications of Various Cursor Types in Microsoft SQL Server. By: Edward Whalen Performance Tuning Corporation

Information Systems SQL. Nikolaj Popov

Introduction to Cassandra

Oracle Database 10g Express

Going Native With Apache Cassandra. QCon London, 2014

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

How To Create A Table In Sql (Ahem)

Oracle Database 10g: Introduction to SQL

CQL for Cassandra 2.2 & later

Apache Cassandra for Big Data Applications

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

Short notes on webpage programming languages

Persistent Stored Modules (Stored Procedures) : PSM

Using distributed technologies to analyze Big Data

Unit 5.1 The Database Concept

DBMS / Business Intelligence, SQL Server

SQL. Short introduction

Advance DBMS. Structured Query Language (SQL)

Developing Web Applications for Microsoft SQL Server Databases - What you need to know

Simba Apache Cassandra ODBC Driver

Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014

Oracle Database: SQL and PL/SQL Fundamentals

Media Upload and Sharing Website using HBASE

A table is a collection of related data entries and it consists of columns and rows.

Darshan Institute of Engineering & Technology PL_SQL

Microsoft Access 2000

Data Modeling in the New World with Apache Cassandra TM. Jonathan Ellis CTO, DataStax Project chair, Apache Cassandra

Learn how to create web enabled (browser) forms in InfoPath 2013 and publish them in SharePoint InfoPath 2013 Web Enabled (Browser) forms

Relational Databases and SQLite

Developing SQL and PL/SQL with JDeveloper

May 6, DataStax Cassandra South Bay Meetup. Cassandra Modeling. Best Practices and Examples. Jay Patel Architect, Platform

Oracle Database: SQL and PL/SQL Fundamentals NEW

INSTALLING, CONFIGURING, AND DEVELOPING WITH XAMPP

Oracle Database 12c: Introduction to SQL Ed 1.1

Apache Cassandra Query Language (CQL)

Introduction to Database Searching

Managing User Accounts

Objectives. Oracle SQL and SQL*PLus. Database Objects. What is a Sequence?

Oracle Database: SQL and PL/SQL Fundamentals

Use Mail Merge to create a form letter

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Resources You can find more resources for Sync & Save at our support site:

Learning MySQL! Angola Africa SELECT name, gdp/population FROM world WHERE area > !

Optimizing Your Data Warehouse Design for Superior Performance

Paper TU_09. Proc SQL Tips and Techniques - How to get the most out of your queries

D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

Mul$media im Netz (Online Mul$media) Wintersemester 2014/15. Übung 03 (Nebenfach)

MYSQL DATABASE ACCESS WITH PHP

Report Vertiefung, Spring 2013 Constant Interval Extraction using Hadoop

PROC SQL for SQL Die-hards Jessica Bennett, Advance America, Spartanburg, SC Barbara Ross, Flexshopper LLC, Boca Raton, FL

System requirements 2. Overview 3. My profile 5. System settings 6. Student access 10. Setting up 11. Creating classes 11

CSI 2132 Lab 3. Outline 09/02/2012. More on SQL. Destroying and Altering Relations. Exercise: DROP TABLE ALTER TABLE SELECT

Oracle Database: Introduction to SQL

Understanding NoSQL Technologies on Windows Azure

Once the schema has been designed, it can be implemented in the RDBMS.

Setting Up Specify to use a Shared Workstation as a Database Server

Oracle Database: Introduction to SQL

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Using AND in a Query: Step 1: Open Query Design

Oracle Fusion Middleware

Can the Elephants Handle the NoSQL Onslaught?

Oracle SQL. Course Summary. Duration. Objectives

SAS Views The Best of Both Worlds

Sequential Query Language Database Networking Using SQL

INTEGRATING MICROSOFT DYNAMICS CRM WITH SIMEGO DS3

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

Managing User Accounts

A brief MySQL tutorial. CSE 134A: Web Service Design and Programming Fall /28/2001

ADDING A NEW SITE IN AN EXISTING ORACLE MULTIMASTER REPLICATION WITHOUT QUIESCING THE REPLICATION

Database Forms and Reports Tutorial

Structured Query Language. Telemark University College Department of Electrical Engineering, Information Technology and Cybernetics

Oracle8/ SQLJ Programming

INTRODUCTION TO CASSANDRA

Relational Database Basics Review

Title. Syntax. stata.com. odbc Load, write, or view data from ODBC sources. List ODBC sources to which Stata can connect odbc list

SPSS: Getting Started. For Windows

Oracle Database: Introduction to SQL

Retrieving Data Using the SQL SELECT Statement. Copyright 2006, Oracle. All rights reserved.

AP Computer Science Java Mr. Clausen Program 9A, 9B

Python Driver 1.0 for Apache Cassandra

RECURSIVE COMMON TABLE EXPRESSIONS DATABASE IN ORACLE. Iggy Fernandez, Database Specialists INTRODUCTION

NoSQL Databases. Nikos Parlavantzas

Access Queries (Office 2003)

Relational Databases for the Business Analyst

How To Use Big Data For Telco (For A Telco)

Transcription:

Getting Started with User Profile Data Modeling White Paper 1

What was old is new again... 3 Fixed schema User Data Pattern 1... 3 Fixed schema on a wide row User Data Pattern 2... 3 Fixed Schema with a dynamic collection User Data Pattern 3... 5 Conclusion... 6 2

What was old is new again Creating static schema in Cassandra may seem unusual, but is frequently done in production data models. Not every table created needs to have dynamic columns. In some cases it's useful to have the data stored in fixed column names with type validation. The following examples will get you started with defining static schema tables and in some cases create hybrid data models that add a dynamic column component for greater flexibility. Fixed schema User Data Pattern 1 To start, we'll work on a very basic table of data that you might find in any relational database. If you have created a table in MySQL or Oracle, this should look very familiar to you. In the syntax below, the username is the only primary key, so we put the declaration to the right of field type. You can use this reference to see the different types that can be used: http://www.datastax.com/docs/1.1/references/cql/cql_data_types CREATE TABLE users ( username text PRIMARY KEY, first_name text, last_name text, address1 text, city text, postal_code text, last_login timestamp ); Once we have our table definition, it should be as equally as familiar to insert data using the INSERT INTO syntax. INSERT INTO users (username,first_name,last_name,address1,city,postal_code,last_login) VALUES ('cstar','cassandra','database','123 Main St','San Mateo','11111','2013-4-4'); Once the data is in, you can use the SELECT command. SELECT first_name, last_name FROM users One large difference between relational databases and Cassandra CQL is the WHERE clause syntax. Because of how the data is stored, you can only use the elements declared in the primary key. Unless you want every row, you need to specify the row key at a minimum. We'll use the next example to dive further into this difference. Fixed schema on a wide row User Data Pattern 2 For the this pattern, we'll again be storing what looks like static row oriented data. The subtle difference here is in how the data is stored by Cassandra. By using multiple fields in the PRIMARY KEY definition, we are specifying that this data will be stored in a wide row. Let's use a playlist model as an example: 3

CREATE TABLE user_playlist ( username text, playlist_name text, song_name text, artist text, list_order int, songid UUID PRIMARY KEY (username, playlist_name, song_name) ); At first glance, it would appear that we are creating a table that will contain multiple rows with 3 unique fields. The first element 'username', will be used as the row key. The remaining elements will be be used by CQL to create unique columns in the underlying storage engine. The result set presented to the user is a large table of data similar to a relational database. Let's insert some data and see how it works: VALUES ('cstar','classic Rock','Stairway to heaven','led Zepplin',1,c9df5407-7dd8-46a4-84ee-1c358106c926); VALUES ('cstar','classic Rock','Sweet home Alabama','Lynyrd Skynyrd',2,d2fb151c-cba8-46ff-bc03-66b32c45434a); VALUES ('cstar','alternative','the Queen is dead','the Smiths',1,24106aa0-10d5-4de1-9dab-a3a652e32c49); VALUES ('cstar','alternative','blue Monday','New Order',2,7a0b82ba- 5a24-4554-903c-1970d1e3aaea); All of the data inserted will be on one row with the key 'cstar', however, we we access that data with CQL, it will appear as many rows. SELECT playlist_name,song_name,artist,list_order from user_playlist WHERE username='cstar'; playlist_name song_name artist list_order ---------------+--------------------+----------------+------------ Alternative Blue Monday New Order 2 Alternative The Queen is dead The Smiths 1 Classic Rock Stairway to heaven Led Zepplin 1 Classic Rock Sweet home Alabama Lynyrd Skynyrd 2 The output is nicely formatted and easier to read by the user. The underlying storage engine has actually created a single row of data using unique column names derived from the PRIMARY KEY definition. The result is a faster access pattern on reads when all data is co-located on the same node and sequentially read from disk. You can read more in-depth here: http://www.datastax.com/docs/1.2/ddl/table 4

Fixed Schema with a dynamic collection User Data Pattern 3 When you need some dynamic element to the static table, this can be accomplished by using the collection feature of CQL. A collection can be a Map, Set or List and can be read about more here: http://www.datastax.com/dev/blog/cql3_collections The example we'll use is a user table with an added requirement to dynamically store one or more locations sorted by time. The map collection can be used to create an ordered pair of data where the key is a timestamp and the value is a location stored as text. CREATE TABLE user_with_location ( username text PRIMARY KEY, first_name text, last_name text, address1 text, city text, postal_code text, last_login timestamp, location_by_date map<timestamp,text> ); First we insert the entire user record. Note: When inserting a collection, you will overwrite any of the previous values if the row key exists. INSERT INTO user_with_location (username,first_name,last_name,address1,city,postal_code,last_login,loc ation_by_date) VALUES ('cstar','cassandra','database','123 Main St','San Mateo','11111','2013-4-4',{ '2013-4-4' : 'San Francisco'}); Then we add a couple of locations for this user. This will be an update so not overwrite the previous location data in the collection. If you use the same map key as a previous data point, you will overwrite just that map value. UPDATE user_with_location SET location_by_date['2013-4-5'] = 'San Diego' UPDATE user_with_location SET location_by_date['2013-4-2'] = 'Santa Rosa' When we select the row of data for the user 'cstar', the result set will contain the entire map of locations. Collections have to be returned completely, so keep that in mind as you design your model. Too much in one collection could cause a lot of data to be returned. SELECT first_name, last_name, location_by_date FROM user_with_location first_name last_name location_by_date ------------+-----------+-------------------------------------- Cassandra Database {2013-04-02 00:00:00-0700: Santa Rosa, 2013-04-04 00:00:00-0700: San Francisco, 2013-04-05 00:00:00-0700: San Diego} You might also notice that the data in the first element of the map is sorted. This is a feature of the map to sort by the type of the key. 5

Conclusion As you can see, creating user based schema can be simple and with some of the CQL features shown, powerful in creating wide row data sets. Cassandra has had the reputation of only fitting large dynamic data sets, but with CQL you can create fixed schema and strong validation typing. There are many other patterns for user profile management. These three should get you started on your user centered data model. 6