NoSQL's biggest secret: SQL went nowhere
Matthew Revell, Director of Developer Advocacy, Couchbase
Meet my toaster
A toaster: redundancy built-in; balanced input/output; commodity hardware
A cruster of toasters: redundancy built-in; balanced input/output; commodity hardware; easily clustered; 100% NoSQL!
A brief history of data storage
This is data
But what does it mean?
This is data
1960: the first commercial database
The data model determines the query pattern
Hierarchical: the GOTO statement of databases
Hierarchical
Hierarchical (org chart): CEO, CTO, CFO, CMO, SVP HR, VP Engineering, Chief Architect, VP of Comms, VP of Demand Gen, Director of Product Marketing, Head of Media Relations, Head of Analyst Relations, Head of Event Marketing
Network: the programmer as navigator
Network
    Employee: Name: Matthew Revell; Office: London; Title: Director of Developer Advocacy; Next: Owen Hughes; Owner: Arun Gupta; Prior: Adam Blackshaw
    Employee: Name: Owen Hughes; Office: London; Title: Head of Pre-Sales; Next: Bindi Bhullar; Owner: Dipti Borkar; Prior: Matthew Revell
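A rough sketch of what "the programmer as navigator" means in practice, using the two records from the slide. The dictionary layout and the function names are illustrative assumptions, not how a CODASYL database stores or exposes records; the point is only that the program follows links one record at a time instead of declaring what it wants.

```python
# The two employee records from the slide, linked by name.
# Only these two records exist here, so a walk stops when a link
# points at a name that is not in the dictionary.
records = {
    "Matthew Revell": {
        "office": "London",
        "title": "Director of Developer Advocacy",
        "next": "Owen Hughes",
        "owner": "Arun Gupta",
        "prior": "Adam Blackshaw",
    },
    "Owen Hughes": {
        "office": "London",
        "title": "Head of Pre-Sales",
        "next": "Bindi Bhullar",
        "owner": "Dipti Borkar",
        "prior": "Matthew Revell",
    },
}


def walk_next(start):
    """Follow 'next' links from start, yielding each record we can reach."""
    name = start
    while name in records:
        yield name, records[name]
        name = records[name]["next"]


def titles_in_office(start, office):
    """Answer a question by navigating the chain record by record."""
    return [rec["title"] for _, rec in walk_next(start) if rec["office"] == office]


print(titles_in_office("Matthew Revell", "London"))
```

The answer depends on where the walk starts: beginning at Owen Hughes never reaches Matthew Revell unless the code also follows the prior links.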
Relational (with SQL): a declarative data query language. Hierarchical, then Network/CODASYL, then Relational with SQL
Object oriented databases: why are we flattening everything?
2005-2010: NoSQL
Key value
    email: matthew@couchbase.com
    { personal: matthew@understated.co.uk, work: matthew@couchbase.com }
Document
    matthew@couchbase.com: { "city": "London", "glasses": true, "team": "Developer Advocacy", "music": "METAAAAAL!" }
    London: matthew@couchbase.com, james@couchbase.com, laura@couchbase.com, tom@couchbase.com, david@couchbase.com, greg@couchbase.com, nic@couchbase.com
    Developer Advocacy: matthew@couchbase.com, james@couchbase.com, laura@couchbase.com, laurent@couchbase.com, martin@couchbase.com, matt@couchbase.com, will@couchbase.com
Document
    London and Developer Advocacy: matthew@couchbase.com, james@couchbase.com, laura@couchbase.com
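The "London and Developer Advocacy" result can be pictured as an intersection of two index entries. The sketch below is only that: the `index` structure and `find` helper are hypothetical, and which of the non-overlapping addresses belongs to which list is my reading of the slide layout, not something the deck states.

```python
# Each (field, value) pair maps to the set of document keys carrying it.
# The three shared addresses come from the slide; the split of the other
# addresses between the two lists is an assumption about the layout.
index = {
    ("city", "London"): {
        "matthew@couchbase.com", "james@couchbase.com", "laura@couchbase.com",
        "tom@couchbase.com", "david@couchbase.com", "greg@couchbase.com",
        "nic@couchbase.com",
    },
    ("team", "Developer Advocacy"): {
        "matthew@couchbase.com", "james@couchbase.com", "laura@couchbase.com",
        "laurent@couchbase.com", "martin@couchbase.com", "matt@couchbase.com",
        "will@couchbase.com",
    },
}


def find(index, **criteria):
    """Return the keys of documents matching every field=value criterion."""
    entries = [index.get(pair, set()) for pair in criteria.items()]
    if not entries:
        return set()
    return set.intersection(*entries)


print(sorted(find(index, city="London", team="Developer Advocacy")))
```

A field that has no index entry simply yields an empty result here; a real store would have to fall back to scanning documents or reject the query.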
Column and graph
Context is all
There's always a trade-off: offload from some other data store (i.e. caching); computation offload; speed; scalability; availability; flexibility in what you store; query flexibility
Querying NoSQL
It's your problem (photo by Donarreiskoffer, CC-by-3.0)
Manual secondary indexes (2i)
Map/Reduce
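For readers who have not written one, a minimal in-memory sketch of the map/reduce pattern: a map function emits key/value pairs per document and a reduce function folds the values for each key. The function names are hypothetical, this is not any particular database's view API, and the third staff record is invented purely for illustration.

```python
from collections import defaultdict

staff = [
    {"name": "Matthew Revell", "office": "London"},
    {"name": "Owen Hughes", "office": "London"},
    {"name": "Hypothetical Person", "office": "Amsterdam"},  # invented example
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    if "office" in doc:
        yield doc["office"], 1


def reduce_values(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Run the mapper over every document, group by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


print(map_reduce(staff, map_doc, reduce_values))
```

Every new question needs a new pair of functions, which is the contrast the deck draws with declarative query.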
Declarative query
NoSQL declarative query: DBMS-specific; bold, new options; SQL-derivatives
MongoDB query: index document contents and query natively
    db.staff.find({office: 'London'})
    db.staff.find({office: {$in: ['London', 'Amsterdam']}})
    db.staff.insert({name: 'Matthew Revell', office: 'London'})
    db.staff.update({name: 'Matthew Revell'}, {$set: {office: 'Amsterdam'}})
JSONiq: XQuery for JSON; declarative language for JSON; functional, composable, set-based
JSONiq
    for $p in collection("staff")
    where $p.serviceyears gt 2
    let $name := $p.firstname || " " || $p.lastname
    order by $p.office, $p.serviceyears
    return { "name": $name, "office": $p.office, "serviceyears": $p.serviceyears }
JSONiq
    for $captain in collection("captains"),
        $movie in collection("movies")[ try { $$.captain eq $captain.name } catch * { false } ]
    return { "captain" : $captain.name, "movie" : $movie.name }
Why SQL? (image by Per Erik Strandberg, Creative Commons Attribution-Share Alike 2.5 Generic)
Cassandra's CQL
Cassandra's CQL: really looks like SQL; schema is back; no JOINs, no GROUP BY
Cassandra's CQL
    CREATE TABLE authors (
        name text,
        year int,
        title text,
        isbn text,
        publisher text,
        PRIMARY KEY (name, year, title)
    ) WITH CLUSTERING ORDER BY (year DESC);
http://www.planetcassandra.org/making-the-change-from-thrift-to-cql/
Cassandra's CQL
    INSERT INTO authors (name, year, title, isbn, publisher)
    VALUES ('Tom Clancy', 1987, 'Patriot Games', '0-399-13241-4', 'Putnam');
    INSERT INTO authors (name, year, title, isbn, publisher)
    VALUES ('Tom Clancy', 1993, 'Without Remorse', '0-399-13825-0', 'Putnam');
Cassandra's CQL
    name       | year | title           | isbn          | publisher
    -----------+------+-----------------+---------------+-----------
    Tom Clancy | 1993 | Without Remorse | 0-399-13825-0 | Putnam
    Tom Clancy | 1987 | Patriot Games   | 0-399-13241-4 | Putnam

    RowKey: Tom Clancy
    => (name=1993:without Remorse:ISBN, value=0-399-13825-0)
    => (name=1993:without Remorse:publisher, value=putnam)
    => (name=1987:patriot Games:ISBN, value=0-399-13241-4)
    => (name=1987:patriot Games:publisher, value=putnam)
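The relationship between the table view and the row-key view can be sketched as a flattening step: the partition key names the row, and every remaining column becomes a cell named `year:title:column`. This is a simplified picture under that assumption, with hypothetical names (`to_cells`, `PARTITION_KEY`, `CLUSTERING`); it is not Cassandra's actual storage format.

```python
rows = [
    {"name": "Tom Clancy", "year": 1987, "title": "Patriot Games",
     "isbn": "0-399-13241-4", "publisher": "Putnam"},
    {"name": "Tom Clancy", "year": 1993, "title": "Without Remorse",
     "isbn": "0-399-13825-0", "publisher": "Putnam"},
]

PARTITION_KEY = "name"           # first part of PRIMARY KEY (name, year, title)
CLUSTERING = ("year", "title")   # remaining parts of the primary key


def to_cells(rows):
    """Group rows by partition key and flatten the non-key columns into
    (cell name, value) pairs, where the cell name is year:title:column."""
    partitions = {}
    # Newest year first, mirroring CLUSTERING ORDER BY (year DESC).
    for row in sorted(rows, key=lambda r: r["year"], reverse=True):
        prefix = ":".join(str(row[column]) for column in CLUSTERING)
        cells = partitions.setdefault(row[PARTITION_KEY], [])
        for column, value in row.items():
            if column != PARTITION_KEY and column not in CLUSTERING:
                cells.append((f"{prefix}:{column}", value))
    return partitions


for cell_name, value in to_cells(rows)["Tom Clancy"]:
    print(cell_name, "=>", value)
```

Because the clustering columns are baked into each cell name, everything for one author and year sits next to each other in the row.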
Cassandra's CQL
    SELECT * FROM authors
    WHERE name = 'Tom Clancy' AND year >= 1993;
Cassandra's CQL
    RowKey: Tom Clancy
    => (name=1996:executive Orders:publisher, value=putnam)
    => (name=1996:executive Orders:ISBN, value=0-399-13825-0)
    => (name=1994:debt of Honor:publisher, value=putnam)
    => (name=1994:debt of Honor:ISBN, value=0-399-13826-1)
    => (name=1993:without Remorse:publisher, value=putnam)
    => (name=1993:without Remorse:ISBN, value=0-399-13825-0)
    => (name=1991:the Sum of All Fears:publisher, value=putnam)
    => (name=1991:the Sum of All Fears:ISBN, value=0-399-13241-6)...
    => (name=1987:patriot Games:publisher, value=putnam)
    => (name=1987:patriot Games:ISBN, value=0-399-13241-4)
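One way to read the `year >= 1993` query against this row: since the entries are kept newest-first, the matching ones form a single contiguous run at the start of the partition. The sketch below assumes exactly that simplified model; `slice_from_year` is a hypothetical helper, and the entries elided on the slide are left out.

```python
from itertools import takewhile

# (year, title) clustering keys for the 'Tom Clancy' partition, in the
# year-descending order shown on the slide.
partition = [
    (1996, "Executive Orders"),
    (1994, "Debt of Honor"),
    (1993, "Without Remorse"),
    (1991, "The Sum of All Fears"),
    (1987, "Patriot Games"),
]


def slice_from_year(partition, min_year):
    """Return entries with year >= min_year by reading from the start of the
    partition and stopping at the first entry outside the range."""
    return list(takewhile(lambda entry: entry[0] >= min_year, partition))


print(slice_from_year(partition, 1993))
```

A predicate on a column that is not part of the primary key, such as publisher, has no such run to read, which is one way to understand why CQL is not an ad-hoc query language.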
Cassandra's CQL: not an ad-hoc query language
This story is mostly about JSON
SQL++
SQL++: non-relational data is semi-structured; non-relational data is heterogeneous; JSON in, JSON out!
What must happen to SQL to make it JSON friendly? Data is nested; sometimes data is missing; data is likely to be found in more than one place; JOINs need thinking about
SQL++: superset of SQL for semi-structured data; handles missing data gracefully and/or explicitly; can query inside nested data; nests and unnests data; JOINs between documents
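To make "handles missing data gracefully and/or explicitly" concrete, here is a small sketch that treats an absent field as a distinct MISSING value rather than as null or an error. The `MISSING` sentinel and `get_path` helper are hypothetical illustrations of the idea, not part of any SQL++ implementation, and the second and third documents are invented.

```python
class _Missing:
    """Sentinel for 'this field is not present', distinct from None (null)."""

    def __repr__(self):
        return "MISSING"

    def __bool__(self):
        return False


MISSING = _Missing()


def get_path(doc, *path):
    """Walk nested dicts and lists; return MISSING instead of raising."""
    current = doc
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int) and 0 <= step < len(current):
            current = current[step]
        else:
            return MISSING
    return current


docs = [
    {"email": "matthew@couchbase.com", "conferences": [{"name": "OSCON Europe"}]},
    {"email": "null-field@example.com", "conferences": None},  # present but null
    {"email": "no-field@example.com"},                          # field absent
]

for doc in docs:
    print(doc["email"], get_path(doc, "conferences", 0, "name"))
```

Keeping MISSING separate from null lets a query say "the field exists but is empty" and "the field is not there" as two different conditions.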
N1QL: SQL++ in practice
Couchbase Server 4.0: high availability cache; key-value store; document database; N1QL, SQL-like query for JSON
A profile
    {
      "email": "matthew@couchbase.com",
      "office": "London",
      "title": "Director of Developer Advocacy",
      "team": "Developer Advocacy",
      "manager": "Arun Gupta",
      "start-date": "2014-01-06",
      "meet-up-groups": ["London", "Dublin", "Manchester"],
      "conferences": [
        {
          "name": "OSCON Europe",
          "location": "Amsterdam",
          "roles": ["booth", "speaker"],
          "start-date": "2015-10-26",
          "end-date": "2015-10-28"
        },
        {
          "name": "Topconf",
          "location": "Tallinn",
          "roles": "speaker",
          "start-date": "2015-11-17",
          "end-date": "2015-11-18"
        },
        {
          "name": "Big Data Strategy",
          "location": "Vilnius",
          "roles": "speaker",
          "start-date": "2015-10-05",
          "end-date": "2015-10-05"
        }
      ]
    }
N1QL: implements much of SQL++; dive into arrays and objects; NEST data from JOINs; UNNEST data; gracefully handles MISSING data
SELECT
    SELECT email
    FROM `default`
    WHERE office = "London";
ARRAY ELEMENTS
    SELECT conferences[0].name AS event_name
    FROM `default`;
REMOVE MISSING ITEMS
    SELECT DISTINCT conferences[0].name AS event_name
    FROM `default`
    WHERE conferences IS NOT MISSING;
WHO IS GOING TO DROIDCON SWEDEN?
    SELECT email AS person, conferences[0].name AS event
    FROM `default`
    WHERE ANY event IN conferences SATISFIES event.name = "Droidcon Sweden" END;
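The `ANY ... SATISFIES ... END` query reads much like Python's `any()` over the nested array. The sketch below mimics it in memory; the `attending` function is hypothetical, the second and third profiles are invented for illustration, and this is an analogy for the query's shape rather than a description of how N1QL executes it.

```python
profiles = [
    {"email": "matthew@couchbase.com",
     "conferences": [{"name": "OSCON Europe"}, {"name": "Topconf"},
                     {"name": "Big Data Strategy"}]},
    {"email": "droidcon-goer@example.com",            # invented profile
     "conferences": [{"name": "Droidcon Sweden"}]},
    {"email": "no-events@example.com"},               # no conferences field
]


def attending(profiles, event_name):
    """Mimic: SELECT email AS person, conferences[0].name AS event
    WHERE ANY event IN conferences SATISFIES event.name = event_name END."""
    result = []
    for doc in profiles:
        conferences = doc.get("conferences", [])
        if any(event.get("name") == event_name for event in conferences):
            result.append({"person": doc["email"],
                           "event": conferences[0]["name"]})
    return result


print(attending(profiles, "Droidcon Sweden"))
```

Note that the projection takes the first conference's name, so the `event` column only shows the matching conference when it happens to be first in the array.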
Updating and deleting
    DELETE: provide the key to delete the document
    INSERT: provide a key and some JSON to create a new document
    UPSERT: as INSERT but will overwrite existing docs
    UPDATE: change individual values inside existing docs
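The difference between the four statements can be sketched with a dictionary standing in for a bucket. The `Bucket` class is a hypothetical toy, not the Couchbase SDK, and it simplifies by addressing UPDATE by key only. It assumes, as the contrast with UPSERT on the slide implies, that INSERT refuses to overwrite an existing key.

```python
class Bucket:
    """Toy key-to-document store illustrating the four write operations."""

    def __init__(self):
        self.docs = {}

    def insert(self, key, doc):
        """Create a new document; refuse if the key is already taken."""
        if key in self.docs:
            raise KeyError(f"document {key!r} already exists")
        self.docs[key] = dict(doc)

    def upsert(self, key, doc):
        """Create the document, or replace it wholesale if it exists."""
        self.docs[key] = dict(doc)

    def update(self, key, **changes):
        """Change individual values inside an existing document."""
        self.docs[key].update(changes)  # raises KeyError if key is absent

    def delete(self, key):
        """Remove the document stored under key."""
        del self.docs[key]


bucket = Bucket()
bucket.insert("matthew", {"office": "London", "team": "Developer Advocacy"})
bucket.update("matthew", office="Amsterdam")   # team is left untouched
bucket.upsert("matthew", {"office": "London"}) # whole document replaced
print(bucket.docs)
bucket.delete("matthew")
```

After the upsert the `team` field is gone, because UPSERT replaces the document while UPDATE only touches the named values.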
A larger data-set: travel-sample
TRAVEL SAMPLE DATA
JOINs
JOINs: retrieve data from multiple documents in a single query; join within a keyspace/bucket; join across keyspaces/buckets
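IN A RELATIONAL DATABASE
AIRLINES
    id   | country        | iata | icao | name            | callsign
    -----+----------------+------+------+-----------------+----------
    5209 | United States  | UA   | UAL  | United Airlines | UNITED
    1355 | United Kingdom | BA   | BAW  | British Airways | SPEEDBIRD
AIRPORTS
    id   | airportname        | city          | country        | alt | lat       | lon         | icao | tz
    -----+--------------------+---------------+----------------+-----+-----------+-------------+------+--------------------
    507  | Heathrow           | London        | United Kingdom | 83  | 51.4775   | -0.461389   | EGLL | Europe/London
    3469 | San Francisco Intl | San Francisco | United States  | 13  | 37.618972 | -122.374889 | KSFO | America/Los_Angeles
FLIGHTS
    id      | airline | source | destination | equipment | day | flight | utc      | stops
    --------+---------+--------+-------------+-----------+-----+--------+----------+------
    57047-1 | UA      | LHR    | SFO         | 777       | 0   | UA894  | 02:32:00 | 0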
IN JSON
    {
      "callsign": "UNITED",
      "country": "United States",
      "iata": "UA",
      "icao": "UAL",
      "id": 5209,
      "name": "United Airlines",
      "type": "airline"
    }

    {
      "airline": "UA",
      "airlineid": "airline_5209",
      "destinationairport": "SFO",
      "equipment": "777",
      "id": 57047,
      "schedule": [ { "day": 0, "flight": "UA894", "utc": "02:32:00" },... ],
      "sourceairport": "LHR",
      "stops": 0,
      "type": "route"
    }

    {
      "airportname": "Heathrow",
      "city": "London",
      "country": "United Kingdom",
      "faa": "LHR",
      "geo": { "alt": 83, "lat": 51.4775, "lon": -0.461389 },
      "icao": "EGLL",
      "id": 507,
      "type": "airport",
      "tz": "Europe/London"
    }
A SIMPLE JOIN
    SELECT *
    FROM `travel-sample` r
    JOIN `travel-sample` a ON KEYS r.airlineid
    WHERE r.sourceairport = "LHR" AND r.destinationairport = "SFO";
WHO FLIES LHR->SFO?
    SELECT DISTINCT a.name
    FROM `travel-sample` r
    JOIN `travel-sample` a ON KEYS r.airlineid
    WHERE r.sourceairport = "LHR" AND r.destinationairport = "SFO";
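`JOIN ... ON KEYS r.airlineid` can be read as a lookup by document key: the value stored in the route's `airlineid` field is itself the key of the airline document. A minimal in-memory sketch of that idea follows. The key `airline_5209` comes from the route document in the deck; the key `route_57047` and the function name are assumptions made for illustration.

```python
bucket = {
    "airline_5209": {"type": "airline", "name": "United Airlines",
                     "iata": "UA", "callsign": "UNITED"},
    "route_57047": {"type": "route", "airline": "UA",          # key is assumed
                    "airlineid": "airline_5209",
                    "sourceairport": "LHR", "destinationairport": "SFO"},
}


def airlines_flying(bucket, source, destination):
    """Mimic: SELECT DISTINCT a.name ... JOIN ... a ON KEYS r.airlineid."""
    names = set()
    for route in bucket.values():
        if (route.get("sourceairport") == source
                and route.get("destinationairport") == destination):
            # ON KEYS: fetch the joined document directly by its key.
            airline = bucket.get(route.get("airlineid"))
            if airline is not None:  # inner join: skip routes with no match
                names.add(airline["name"])
    return sorted(names)


print(airlines_flying(bucket, "LHR", "SFO"))
```

The comparison is case-sensitive, so a lowercase `"lhr"` matches nothing in this data.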
UNNEST: breaks out nested JSON from the results
SOMETHING USEFUL
    SELECT a.name, s.flight, s.utc, r.sourceairport, r.destinationairport, r.equipment
    FROM `travel-sample` r
    UNNEST r.schedule s
    JOIN `travel-sample` a ON KEYS r.airlineid
    WHERE r.sourceairport = "LHR" AND r.destinationairport = "SFO" AND s.day = 1
    ORDER BY s.utc;
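UNNEST can be pictured as pairing each parent document with each element of its nested array, so that later clauses see one row per schedule entry. The sketch below combines that with the key lookup from the JOIN. The helper names and the `route_57047` key are assumptions, and the day-1 schedule entry is invented for illustration; it is not taken from travel-sample.

```python
bucket = {
    "airline_5209": {"type": "airline", "name": "United Airlines"},
    "route_57047": {                                   # key is assumed
        "type": "route", "airlineid": "airline_5209",
        "sourceairport": "LHR", "destinationairport": "SFO",
        "equipment": "777",
        "schedule": [
            {"day": 0, "flight": "UA894", "utc": "02:32:00"},
            {"day": 1, "flight": "UA000", "utc": "10:15:00"},  # invented entry
        ],
    },
}


def unnest(docs, field):
    """Yield one (parent, element) pair per element of parent[field]."""
    for doc in docs:
        for element in doc.get(field) or []:
            yield doc, element


def departures(bucket, source, destination, day):
    """Mimic the UNNEST + JOIN query: one row per matching schedule entry."""
    routes = [d for d in bucket.values()
              if d.get("sourceairport") == source
              and d.get("destinationairport") == destination]
    rows = []
    for route, slot in unnest(routes, "schedule"):
        airline = bucket.get(route.get("airlineid"))
        if airline is not None and slot["day"] == day:
            rows.append({"name": airline["name"], "flight": slot["flight"],
                         "utc": slot["utc"], "equipment": route["equipment"]})
    return sorted(rows, key=lambda row: row["utc"])


print(departures(bucket, "LHR", "SFO", day=1))
```

Without the unnest step the filter on `day` would have to reach inside the array for every route, and the result would still be one row per route rather than one per flight.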
Next Steps
Couchbase Developer Portal: developer.couchbase.com
SQL++ paper: http://arxiv.org/abs/1405.3631
Forums: http://forums.couchbase.com