NoSQL and Non-Relational Databases
Image src: http://www.pentaho.com/big-data/nosql/
Matthias Lee, Johns Hopkins University

What is NoSQL?
- Yes, no SQL... or at least "not only SQL"
- A large class of non-relational databases that trade consistency for availability
- Easily scalable (partitioning)
- Highly fault tolerant
- Google, Facebook, Amazon, Twitter et al.

Why Non-Relational?
- No complicated relationships
- Schema-light or schema-free
- Fewer interdependencies
- Easier scaling
- Higher fault tolerance
- Distributed computing: store and search, hashing and MapReduce
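
The "easier scaling" and "hashing" bullets can be sketched as hash partitioning: each key is hashed and the hash decides which machine stores it. This is a minimal sketch, not a production scheme. The node names are hypothetical, and the simple modulo placement is an assumption; real systems often use consistent hashing so that adding a node moves fewer keys.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key, nodes=NODES):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto a node.
    bucket = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[bucket]

if __name__ == "__main__":
    for key in ["user:1", "user:2", "user:3", "user:4"]:
        print(key, "->", node_for(key))
```

Because the placement depends only on the key, any client can compute where a record lives without consulting a central index.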

CAP Theorem
- Choose two and work around the third.
- Eric Brewer, UC Berkeley
- Consistency, Availability, Partition tolerance

Dropping Some of ACID
Compromises must be made:
- Atomicity (all or nothing?): partial
- Consistency: eventually consistent
- Isolation: revision history
- Durability (written in stone): sometimes

BASE
- Basically Available
- Soft state
- Eventual consistency
- Asynchronous conflict resolution and repair
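
To make "soft state" and "asynchronous conflict resolution and repair" concrete, here is a minimal sketch of two replicas that temporarily disagree and are later reconciled. The last-write-wins rule and the integer timestamps are assumptions chosen for brevity; the slides do not name a specific conflict resolution strategy.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of the data; each value carries the timestamp of its write."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica already holds.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def repair(replicas):
    """Background repair pass: push the newest version of every key to all replicas."""
    newest = {}
    for replica in replicas:
        for key, (ts, value) in replica.data.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    for replica in replicas:
        for key, (ts, value) in newest.items():
            replica.write(key, value, ts)

if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("cart", ["book"], timestamp=1)         # reaches replica a only
    b.write("cart", ["book", "pen"], timestamp=2)  # later write reaches b only
    print(a.read("cart"), b.read("cart"))          # replicas disagree (soft state)
    repair([a, b])
    print(a.read("cart"), b.read("cart"))          # both now hold the newest value
```

Between the writes and the repair pass, a reader may see either value depending on which replica answers. That window is the price paid for staying available.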

ID           Content
<unique id>  Key1, Value; Key2, Value; Key3, Value; Key4, Value
<unique id>  Key2, Value; Key3, Value; Key8, Value; Key9, Value
<unique id>  Key5, Value; Key4, Value
<unique id>  Key1, Value; Key2, Value; Key3, Value; Key4, Value; Key5, Value; Key6, Value; Key7, Value; Key8, Value

Things these DBs do... easily
Highlights:
- Fast processing of specific tasks
- Distributed queries and operations (MapReduce/Hadoop)
- Asynchronous reads and writes (fire and forget)
- Flexible schema (often)
- Scalable, with easy replication

Things these DBs do... easily
- Distributed storage (performance/fault tolerance)
- Improved response time and fault tolerance
[Diagram labels: User Request (x2), Master (x3), Slave 1, Slave 2, Slave 3, Slave 4]
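
The master/slave picture can be sketched as follows: one master accepts writes and copies them to its slaves, while reads rotate over whichever slaves are still up. The class and method names are hypothetical. The copy is synchronous here only to keep the example short; many real systems replicate asynchronously.

```python
import itertools

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class ReplicatedStore:
    """One master takes writes; reads are spread over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next = itertools.cycle(range(len(slaves)))

    def write(self, key, value):
        self.master.data[key] = value
        # Simplification: copy to every reachable slave right away.
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value

    def read(self, key):
        # Try each slave at most once, starting from the next one in rotation.
        for _ in range(len(self.slaves)):
            slave = self.slaves[next(self._next)]
            if slave.alive:
                return slave.name, slave.data.get(key)
        # No slave is reachable: fall back to the master.
        return self.master.name, self.master.data.get(key)

if __name__ == "__main__":
    store = ReplicatedStore(Node("master"), [Node("slave1"), Node("slave2")])
    store.write("x", 1)
    print(store.read("x"))
    store.slaves[0].alive = False  # simulate a failed machine
    print(store.read("x"))
```

Spreading reads improves response time under load, and skipping dead slaves gives the fault tolerance. A slave that was down during a write would return stale data when it comes back, which is where a repair process is needed.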

Things these DBs do...
- Distributed storage (locality)
[Diagram: Server 1, Server 2, Server 3 and Server 4 spread across North America, Europe and Australia]

Things these DBs *don't* do... easily
Issues and challenges:
- ACID goes out of the window
- No direct translation between SQL and NoSQL
- Relatively new field with many similar solutions
- All solutions have different trade-offs

Querying NoSQL
- NoSQL means mostly no SQL
- Map/Reduce
- Various query languages: RQL (rasdaman), CQL (Cassandra)

Map/Reduce
- Easily distributed method of processing data
- Fault tolerant
- Built around map() and reduce()
Stages: input reader/partitioner, map(), sort and partition, reduce(), output writer
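
The five stages can be sketched in a single process with a word count, the customary first Map/Reduce example. Everything here runs sequentially; in a real framework each map() and reduce() call could run on a different machine. The function names and the toy partitioning rule are assumptions made for illustration.

```python
from collections import defaultdict

def map_words(chunk):
    """map(): emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def partition(pairs, num_reducers):
    """Sort and partition: group values by key and assign each key to a reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in sorted(pairs):
        # Sum of code points is a deterministic stand-in for a real hash function.
        index = sum(map(ord, key)) % num_reducers
        partitions[index][key].append(value)
    return partitions

def reduce_counts(key, values):
    """reduce(): combine all values seen for one key."""
    return key, sum(values)

def map_reduce(chunks, num_reducers=3):
    # Input reader/partitioner: the caller has already split the input into chunks.
    mapped = [pair for chunk in chunks for pair in map_words(chunk)]
    result = {}
    for part in partition(mapped, num_reducers):
        for key, values in part.items():
            word, count = reduce_counts(key, values)
            result[word] = count  # output writer: collect everything into one dict
    return result

if __name__ == "__main__":
    print(map_reduce(["to be or not", "to be"]))
```

Each map call sees only its own chunk and each reduce call sees only one key, so neither needs shared state. That independence is what makes the method easy to distribute and to retry after a failure.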

Map/Reduce
February 14th 2013

Map/Reduce
[Diagram: INPUT is split into Chunk_1 through Chunk_8; map() turns each chunk into map_out_1 through map_out_8; the sorting stage ("magic") regroups the map outputs; reduce() produces red_out_1, red_out_2 and red_out_3]

Types of NoSQL DBs
- Document databases
- Key/value stores
- Array databases
- Column-oriented datastores
- Graph databases (they exist)

Types of NoSQL DBs: Column-Oriented Datastores
- Indexing over column families
- Fast aggregation and searching
- Inline compression
- Easy sharding

Column families:
id  Fname  Lname   Zip    Street
1   Joe    Shmoe   32818  Cedar
2   Ralph  Peters  65636  Birch
3   Mary   Lewis   10337  Green

Name {
  Joe, Ralph, Mary
  Shmoe, Peters, Lewis
}
Location {
  32818, 65636, 10337
  Cedar, Birch, Green
}
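
The regrouping on this slide can be sketched with the slide's own example rows: values are stored column by column, and columns are bundled into the Name and Location families. The function name and the dictionary layout are assumptions; real column stores add compression and on-disk formats that are not shown.

```python
# Row-oriented records, using the example data from the slide.
ROWS = [
    {"id": 1, "Fname": "Joe", "Lname": "Shmoe", "Zip": "32818", "Street": "Cedar"},
    {"id": 2, "Fname": "Ralph", "Lname": "Peters", "Zip": "65636", "Street": "Birch"},
    {"id": 3, "Fname": "Mary", "Lname": "Lewis", "Zip": "10337", "Street": "Green"},
]

# Column families as drawn on the slide.
COLUMN_FAMILIES = {"Name": ["Fname", "Lname"], "Location": ["Zip", "Street"]}

def to_column_families(rows, families=COLUMN_FAMILIES):
    """Store each column as its own list, grouped by column family."""
    store = {}
    for family, columns in families.items():
        store[family] = {col: [row[col] for row in rows] for col in columns}
    return store

if __name__ == "__main__":
    store = to_column_families(ROWS)
    # Scanning one column touches only that column's values, not whole rows.
    print(store["Location"]["Zip"])
    print(store["Name"])
```

Aggregating or searching a single column reads one contiguous list, which is why this layout suits fast aggregation. Similar values sitting side by side are also what makes inline compression effective.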

Types of NoSQL DBs: Column-Oriented Datastores
Bigtable and its clones:
- Bigtable: Google
- HBase: Facebook, Hulu and StumbleUpon
- Hypertable: Baidu and Rediff

Types of NoSQL DBs: Key/Value Stores (simple)
- Some of the earliest NoSQL systems (early '90s)
- Easily distributed storage and searching
- Hashtable-like structure
- MapReduce
- Often used as a caching engine
- O(1) average lookup time
- [hash] : bytes[n]
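
The "[hash] : bytes[n]" idea can be sketched as a toy hash table that maps string keys to opaque byte strings. The class name, method names and fixed bucket count are assumptions for illustration. Python's built-in dict already does this far better; the point is only to show where the O(1) average lookup comes from.

```python
class KeyValueStore:
    """Toy hash-table store: string keys map to opaque byte strings."""

    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash picks one bucket directly, so lookups do not scan the whole store.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

if __name__ == "__main__":
    cache = KeyValueStore()
    cache.put("session:42", b"\x00\x01payload")
    print(cache.get("session:42"))
    print(cache.get("session:43"))
```

The store never inspects the value, which is what keeps it simple and fast. With a fixed bucket count the lookup only stays O(1) on average while buckets remain short; real implementations grow the table as it fills.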

Types of NoSQL DBs: Key/Value Stores
- Berkeley DB: MySQL, Bitcoin, MemcacheDB, SVN & more
- Redis: Digg, Flickr, StackOverflow, Craigslist & more
- Cassandra (CQL): Facebook, Reddit, Twitter, Netflix & many more

Types of NoSQL DBs: Document Stores (mostly structured K,V store)
- Versatile
- Dynamic schema
- Eventual consistency
- Highly parallelizable
- Easy replication

Example document:
{
  "_id": "4eea98de1550e2cc04000000",
  "lastmodified": "2011-12-15 20:03:26",
  "name": "Peter Lustig",
  "avatar": "4eea61a11550e26f7d000000",
  "email": "Peter.Lustig@void.net",
  "hobbies": "sleeping"
}
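
"Dynamic schema" means two documents in the same store need not share fields. A minimal in-memory sketch follows, assuming a hypothetical DocumentStore class with insert() and find(); it does not mirror the API of MongoDB or CouchDB.

```python
import uuid

class DocumentStore:
    """Toy document store: each document is a dict with its own set of fields."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        doc = dict(doc)                          # copy so the caller's dict is untouched
        doc.setdefault("_id", uuid.uuid4().hex)  # generate an id if none was given
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, **criteria):
        """Return documents whose fields equal every criterion; absent fields never match."""
        missing = object()
        return [
            doc for doc in self._docs.values()
            if all(doc.get(k, missing) == v for k, v in criteria.items())
        ]

if __name__ == "__main__":
    db = DocumentStore()
    db.insert({"name": "Peter Lustig", "hobbies": "sleeping"})
    db.insert({"name": "Someone Else"})  # hypothetical record without a "hobbies" field
    print(db.find(hobbies="sleeping"))
```

No schema migration is needed to add a field: new documents simply carry it, and queries skip the documents that lack it.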

Types of NoSQL DBs: Document Stores (mostly structured K,V store)
- MongoDB: Foursquare, Shutterfly, Intuit, GitHub & more
- CouchDB: BBC, Canonical, CERN, Android apps & more
- Redis: Digg, Flickr, StackOverflow, Craigslist & more

Distributed TinyURL crawler: CouchDB
- "Set it up and relax"
- Cluster Of Unreliable Commodity Hardware
- RESTful JSON API
- Document store with easy replication
- Eventual consistency
- Lightweight (runs on phones)
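
"RESTful JSON API" means documents are stored and fetched with plain HTTP requests that carry JSON. The sketch below only builds such requests and does not send them, so it runs without a server. It assumes a CouchDB instance on its default port 5984 with no authentication, a hypothetical database named tinyurls, and a hypothetical document layout; none of these come from the slides.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumptions: local server on the default port; the database name is hypothetical.
BASE_URL = "http://localhost:5984"
DATABASE = "tinyurls"

def document_url(doc_id):
    return f"{BASE_URL}/{DATABASE}/{quote(doc_id, safe='')}"

def build_put_document(doc_id, doc):
    """Build (but do not send) the HTTP PUT request that stores one JSON document."""
    return urllib.request.Request(
        url=document_url(doc_id),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def build_get_document(doc_id):
    """Build (but do not send) the HTTP GET request that fetches one document."""
    return urllib.request.Request(url=document_url(doc_id), method="GET")

if __name__ == "__main__":
    request = build_put_document("2tx", {"target": "google.com"})  # hypothetical fields
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Because the interface is ordinary HTTP, any language with an HTTP client can talk to the database without a dedicated driver.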

Distributed TinyURL crawler
- Quick deploy TinyURL resolver
- Master/slave architecture
- Replicating databases
- Amazon EC2 spot instances
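
The slides do not say how the master divides work among the resolvers. One possible sketch, assuming (and this is purely an assumption) that short codes can be enumerated as base-36 numbers, is for the master to cut a numeric range into batches of candidate codes and hand one batch to each resolver instance. All function names are hypothetical.

```python
import string

# Assumed alphabet: digits followed by lowercase letters (base 36).
ALPHABET = string.digits + string.ascii_lowercase

def to_code(number):
    """Convert a non-negative integer to its base-36 short code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def batches(start, stop, size):
    """Yield lists of candidate codes, one list per unit of work for a resolver."""
    for low in range(start, stop, size):
        yield [to_code(n) for n in range(low, min(low + size, stop))]

if __name__ == "__main__":
    for batch in batches(3668, 3672, 2):
        print(batch)
```

Small independent batches fit spot instances well: if an instance is reclaimed mid-batch, the master can hand the same batch to another resolver.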

Distributed TinyURL crawler
[Diagram: TinyURL crawler, resolver and TinyURL; example mapping http://tinyurl.com/2tx -> google.com]

Distributed TinyURL crawler
[Diagram: TinyURL crawler on Amazon EC2, with one Master node and seven nodes labeled R]

Any questions? Comments?
Thanks for listening.
Interested in this? Want to know more?
- #jhuacm on irc.freenode.net
- +Matthias Lee
- github.com/madmaze