Databases 2 (VU) (707.030) Introduction to NoSQL Denis Helic KMI, TU Graz Oct 14, 2013 Denis Helic (KMI, TU Graz) NoSQL Oct 14, 2013 1 / 37
Outline
1 NoSQL Motivation
2 NoSQL Systems
3 NoSQL Examples
4 NoSQL Approaches
Slides are based on the Introduction to Databases course from Stanford University by Jennifer Widom
NoSQL Motivation
NoSQL: What is in the name?
- Very often, SQL = traditional relational DBMS
- Experience from the past decade: not every data management/analysis problem is best solved using a traditional relational DBMS
- NoSQL = not using a traditional relational DBMS
- NoSQL does not mean "don't use the SQL language"
- NoSQL = Not Only SQL
Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- A Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data
Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- DBMS properties: convenient, multi-user, safe, persistent, reliable, massive, efficient
Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- E.g. a tagging system: http://delicious.com/, http://www.bibsonomy.org/
- The relation between tags and URLs is a many-to-many (m:n) relation
Tagging
Table: URL table
url id   url
1        kti.tugraz.at
2        www.tugraz.at
...      ...
Tagging
Table: Tag table
tag id   tag
1        knowledge
2        data
3        school
...      ...
Tagging
Table: Bookmark table
tag id   url id
1        1
2        1
3        2
...      ...
Tagging
- To retrieve tags and URLs, you JOIN the tables and select the records that fulfill a criterion
- E.g. to form a tag cloud: http://www.bibsonomy.org/
- How to retrieve related tags?
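The JOIN-based retrieval described above could look roughly like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table and column names (url, tag, bookmark, url_id, tag_id) and the helper functions are assumptions chosen to mirror the three tables on the previous slides, not part of the original course material.

```python
import sqlite3

# In-memory database holding the three tables from the tagging example.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url (url_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE bookmark (tag_id INTEGER, url_id INTEGER);
    INSERT INTO url VALUES (1, 'kti.tugraz.at'), (2, 'www.tugraz.at');
    INSERT INTO tag VALUES (1, 'knowledge'), (2, 'data'), (3, 'school');
    INSERT INTO bookmark VALUES (1, 1), (2, 1), (3, 2);
""")

def urls_for_tag(tag):
    """Return all URLs bookmarked with the given tag (two JOINs)."""
    rows = conn.execute(
        """SELECT u.url
           FROM tag t
           JOIN bookmark b ON b.tag_id = t.tag_id
           JOIN url u ON u.url_id = b.url_id
           WHERE t.tag = ?
           ORDER BY u.url""",
        (tag,),
    )
    return [row[0] for row in rows]

def tag_cloud():
    """Return a mapping tag -> number of bookmarks, the raw data for a tag cloud."""
    rows = conn.execute(
        """SELECT t.tag, COUNT(*)
           FROM tag t
           JOIN bookmark b ON b.tag_id = t.tag_id
           GROUP BY t.tag
           ORDER BY t.tag"""
    )
    return dict(rows)
```

With the sample rows above, urls_for_tag("knowledge") yields the single URL kti.tugraz.at, and tag_cloud() assigns a count of 1 to each of the three tags.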
Tagging
- Retrieve all URLs for a tag, then for each URL retrieve all tags
- Hub tags have millions of URLs
- That is just the one-hop neighbors
- Two-hop, three-hop neighbors?
- What is the problem here?
- The relational data model is not well suited for tagging
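To make the cost visible, here is a hedged sketch of the one-hop "related tags" query in a relational system, again using sqlite3 with assumed table and column names. A single hop already needs a self-join over the bookmark table; each further hop would add two more joins on that table.

```python
import sqlite3

# Assumed schema: tag(tag_id, tag) and bookmark(tag_id, url_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE bookmark (tag_id INTEGER, url_id INTEGER);
    INSERT INTO tag VALUES (1, 'knowledge'), (2, 'data'), (3, 'school');
    INSERT INTO bookmark VALUES (1, 1), (2, 1), (3, 2);
""")

def related_tags(tag):
    """One-hop related tags: tags that share at least one URL with `tag`."""
    rows = conn.execute(
        """SELECT DISTINCT t2.tag
           FROM tag t1
           JOIN bookmark b1 ON b1.tag_id = t1.tag_id   -- URLs of the start tag
           JOIN bookmark b2 ON b2.url_id = b1.url_id   -- self-join: other bookmarks of those URLs
           JOIN tag t2 ON t2.tag_id = b2.tag_id
           WHERE t1.tag = ? AND t2.tag_id <> t1.tag_id
           ORDER BY t2.tag""",
        (tag,),
    )
    return [row[0] for row in rows]
```

For a hub tag with millions of URLs, the intermediate result of the self-join grows accordingly, which is the problem the slide points at.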
Tagging
[Figure: bipartite graph with tag nodes (knowledge, data, school, ...) and URL nodes (kti.tugraz.at, www.tugraz.at, ...)]
Tagging
- Graph-based model is better suited
- Bipartite graph
- Finding related tags is equivalent to traversing graphs
- E.g. breadth-first search
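A minimal sketch of the graph view, assuming the bipartite graph is kept as an in-memory adjacency map. The edge list mirrors the Bookmark table; the function names and the (kind, name) node encoding are illustrative choices, not a specific graph database API.

```python
from collections import deque

# Tag-URL edges taken from the Bookmark table of the running example.
edges = [
    ("knowledge", "kti.tugraz.at"),
    ("data", "kti.tugraz.at"),
    ("school", "www.tugraz.at"),
]

def build_graph(edges):
    """Build an undirected bipartite graph; nodes are ('tag', name) or ('url', name)."""
    graph = {}
    for tag, url in edges:
        graph.setdefault(("tag", tag), set()).add(("url", url))
        graph.setdefault(("url", url), set()).add(("tag", tag))
    return graph

def related_tags(graph, start_tag, max_hops):
    """Breadth-first search from a tag; one tag-to-tag hop is two edges (tag -> URL -> tag)."""
    start = ("tag", start_tag)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == 2 * max_hops:
            continue  # do not expand beyond the requested number of hops
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sorted(name for kind, name in dist if kind == "tag" and name != start_tag)
```

Raising max_hops only lets the same traversal run further; the query itself does not change shape, unlike the JOIN-based version.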
NoSQL Systems
NoSQL systems
- Alternative to traditional relational DBMS
- Advantages:
  - Flexible schema
  - Quicker/cheaper to set up
  - Massive scalability
  - Relaxed consistency → higher performance & availability
NoSQL systems
- Alternative to traditional relational DBMS
- Disadvantages:
  - No declarative query language → more programming
  - Relaxed consistency → fewer guarantees
NoSQL Examples
Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Load into database system
Example: Web log analysis
- Relational DBMS:
  1 Data cleaning
  2 Data extraction
  3 Verification
  4 Schema specification
- NoSQL:
  1 Nothing
Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Find all records for:
  - Given UserID
  - Given URL
  - Given timestamp
  - Certain construct appearing in additional-info
- None of the operations requires SQL: NoSQL
- Highly parallelizable
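The lookups above are simple per-record filters, which is why they parallelize well. The following sketch splits the log into chunks and scans each chunk independently. The record layout, the sample data, and all function names are assumptions for illustration; Python threads are used only to show the partitioning idea, not to claim a real speed-up.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Made-up sample log; field names are illustrative.
log = [
    {"user_id": "u1", "url": "/a", "timestamp": 100, "info": "agent=firefox"},
    {"user_id": "u2", "url": "/a", "timestamp": 101, "info": "agent=chrome ref=/b"},
    {"user_id": "u1", "url": "/b", "timestamp": 102, "info": "agent=firefox ref=/a"},
    {"user_id": "u3", "url": "/c", "timestamp": 103, "info": ""},
]

def by_field(field, value):
    """Predicate: the record has exactly this value in the given field."""
    return lambda record: record[field] == value

def info_matches(pattern):
    """Predicate: a certain construct (regular expression) appears in additional-info."""
    regex = re.compile(pattern)
    return lambda record: regex.search(record["info"]) is not None

def scan(chunk, predicate):
    """Sequential scan of one chunk of the log."""
    return [record for record in chunk if predicate(record)]

def parallel_scan(records, predicate, workers=2):
    """Split the log into contiguous chunks and scan them independently."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(scan, chunks, [predicate] * len(chunks))
        return [record for part in parts for record in part]
```

No chunk needs to know about any other chunk, so the same scan could run on separate machines that each hold one part of the log.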
Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Find all pairs of UserIDs accessing the same URL
- Here a self-join is required
- But it is very unlikely that we will have such a query at all
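For comparison, this is roughly what the self-join looks like in SQL, run here through Python's sqlite3 module. The table name log, its columns, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log (user_id TEXT, url TEXT, ts INTEGER, info TEXT);
    INSERT INTO log VALUES
        ('u1', '/a', 100, ''),
        ('u2', '/a', 101, ''),
        ('u3', '/b', 102, ''),
        ('u1', '/b', 103, '');
""")

def user_pairs():
    """All pairs of distinct users that accessed the same URL (each pair reported once)."""
    rows = conn.execute(
        """SELECT DISTINCT l1.user_id, l2.user_id
           FROM log l1
           JOIN log l2 ON l1.url = l2.url AND l1.user_id < l2.user_id
           ORDER BY l1.user_id, l2.user_id"""
    )
    return list(rows)
```

The condition l1.user_id < l2.user_id keeps a user from being paired with itself and reports each pair in one order only.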
Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Separate records: UserID, name, age, gender, ...
- Task: Find the average age of users accessing a given URL
- SQL-like
- Some aspects of NoSQL: do we need consistency?
- Statistical operation: e.g. we could sample
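The sampling idea can be sketched as follows. The data, the field names, and the decision to average over distinct users (rather than over individual accesses) are assumptions of this example; the slide leaves that choice open.

```python
import random

# Made-up sample data; field names are illustrative.
users = {"u1": {"age": 20}, "u2": {"age": 30}, "u3": {"age": 40}}
log = [
    {"user_id": "u1", "url": "/a"},
    {"user_id": "u2", "url": "/a"},
    {"user_id": "u1", "url": "/a"},
    {"user_id": "u3", "url": "/b"},
]

def average_age(url, log, users, sample_size=None, seed=0):
    """Average age of the distinct users that accessed `url`.

    If sample_size is given and smaller than the number of visitors, only a
    random sample of visitors is used, trading exactness for less work.
    """
    visitors = sorted({r["user_id"] for r in log if r["url"] == url})
    if sample_size is not None and sample_size < len(visitors):
        visitors = random.Random(seed).sample(visitors, sample_size)
    ages = [users[u]["age"] for u in visitors if u in users]
    return sum(ages) / len(ages) if ages else None
```

Because the result is a statistic, an estimate from a sample (or from slightly stale replicas) may be acceptable, which is the point behind the consistency question on the slide.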
Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of given user
- Simple operation: we do not need a complicated query language
Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of friends of given user
- The number of JOINs increases
Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all women friends of men friends of given user
- The number of JOINs increases
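A hedged sketch of this query in SQL (via sqlite3) shows where the JOINs come from: one join per friendship hop plus one per attribute lookup. The table names users and friends, the gender codes 'f'/'m', the sample people, and the convention of storing each friendship in both directions are all assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT);
    CREATE TABLE friends (user1 INTEGER, user2 INTEGER);
    INSERT INTO users VALUES
        (1, 'alice', 30, 'f'), (2, 'bob', 31, 'm'), (3, 'carol', 29, 'f'),
        (4, 'dave', 35, 'm'), (5, 'erin', 28, 'f');
""")

# Store every friendship in both directions so a lookup by user1 finds all friends.
pairs = [(1, 2), (2, 3), (2, 5), (1, 3), (3, 4)]
conn.executemany("INSERT INTO friends VALUES (?, ?)",
                 pairs + [(b, a) for a, b in pairs])

def women_friends_of_men_friends(user_id):
    """Two friendship hops with a gender filter after each hop: four JOINs."""
    rows = conn.execute(
        """SELECT DISTINCT w.user_id
           FROM friends f1
           JOIN users m ON m.user_id = f1.user2 AND m.gender = 'm'
           JOIN friends f2 ON f2.user1 = m.user_id
           JOIN users w ON w.user_id = f2.user2 AND w.gender = 'f'
           WHERE f1.user1 = ? AND w.user_id <> ?
           ORDER BY w.user_id""",
        (user_id, user_id),
    )
    return [row[0] for row in rows]
```

In this sample, user 1 has one male friend (user 2), whose female friends other than user 1 are users 3 and 5.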
Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of friends of friends of ... friends of given user
- The number of JOINs explodes
- Graph-based data model
- Do we need consistency?
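With a graph-based model the same task becomes a level-by-level traversal whose code does not change with the number of hops. The sketch below assumes the friendship records have been loaded into an adjacency map; the function name and the sample graph are illustrative.

```python
def friends_within(adj, user, hops):
    """All users reachable from `user` in at most `hops` friendship steps."""
    seen = {user}
    frontier = {user}
    for _ in range(hops):
        # Expand one level: friends of the current frontier not visited before.
        frontier = {f for u in frontier for f in adj.get(u, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen - {user}

# Made-up friendship chain 1 - 2 - 3 - 4, stored in both directions.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Calling friends_within(adj, 1, 2) returns friends and friends of friends; asking for more hops only changes the loop bound, whereas the relational version needs additional JOINs for every hop.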
Example: Wikipedia
- Large collection of documents
- Combination of structured and unstructured data
- Task: Retrieve the introductory paragraph of all pages about U.S. presidents before 1900
- Not suitable for SQL systems because of the mix of structured and unstructured data
- Do we need consistency?
NoSQL Approaches
Types of NoSQL systems
- Map-Reduce framework
- Key-value stores
- Document stores
- Graph database systems
Map-Reduce Framework
- Originally from Google
- Open source: Hadoop
- No data model, data stored in files
- User provides specific functions
Map-Reduce Framework
- System provides data processing glue, fault-tolerance, scalability
- Map and Reduce functions
- Map: Divide problem into subproblems
- Reduce: Do work on subproblems, combine results
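A single-process sketch of the idea, counting accesses per URL in the web log example. The user supplies map_fn and reduce_fn; run_map_reduce stands in for the "glue" a real framework provides (grouping by key, distribution, fault tolerance), which is only simulated here. All names and the sample records are assumptions for illustration, not the Hadoop API.

```python
from collections import defaultdict

def map_fn(record):
    """User-provided map: emit one (key, value) pair per log record."""
    yield (record["url"], 1)

def reduce_fn(key, values):
    """User-provided reduce: combine all values emitted for one key."""
    return (key, sum(values))

def run_map_reduce(records, map_fn, reduce_fn):
    """Stand-in for the framework: apply map, group by key, apply reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))

# Made-up sample records.
records = [{"url": "/a"}, {"url": "/a"}, {"url": "/b"}]
```

Because each map call sees one record and each reduce call sees one key, a real system can spread both phases over many machines without changing the two user functions.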
Key-Value Stores
- Extremely simple interface
- Data model: (key, value) pairs
- Operations: Create(key, value), Retrieve(key), Update(key, value), Delete(key) (CRUD operations)
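The whole interface fits into a few lines. This is a minimal in-memory sketch, not the API of any particular system; the class name, the method names, the choice to raise KeyError on conflicts, and passing the new value to update as a second argument are assumptions.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with the four CRUD operations."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def retrieve(self, key):
        return self._data[key]  # raises KeyError if the key is unknown

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"unknown key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]  # raises KeyError if the key is unknown
```

The store never looks inside the value, so there is nothing to query except the key itself.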
Key-Value Stores
- Implementation: efficiency, scalability, fault-tolerance
- Records distributed to nodes based on key
- Replication
- Single-record transactions, eventual consistency
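One common way to distribute records by key is to hash the key and take the result modulo the number of nodes; the sketch below does that and writes each record to the following node as well, as a simple form of replication. The class, its parameters, and the placement scheme are assumptions for illustration. Real systems typically use more elaborate schemes, and eventual consistency is not modeled here because every write updates all replicas immediately.

```python
import hashlib

class Cluster:
    """Toy cluster: each node is a dict; keys are placed by hash, with replicas."""

    def __init__(self, num_nodes=3, replicas=2):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def node_ids(self, key):
        """Nodes responsible for a key: the hashed position and its successors."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for node_id in self.node_ids(key):
            self.nodes[node_id][key] = value

    def get(self, key, failed=()):
        """Read from the first responsible node that is not marked as failed."""
        for node_id in self.node_ids(key):
            if node_id not in failed and key in self.nodes[node_id]:
                return self.nodes[node_id][key]
        raise KeyError(key)
```

Since the responsible nodes follow from the key alone, any client can locate a record without a central directory, and a read still succeeds when the first replica is unavailable.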
Key-Value Stores
- Some allow (non-uniform) columns within value
- Some allow Retrieve on range of keys
- Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, etc.
Document Stores
- Like Key-Value Stores, except the value is a document
- Data model: (key, document) pairs
- Document: JSON, XML, other semistructured formats
- Basic operations: Create(key, document), Retrieve(key), Update(key, document), Delete(key) (CRUD operations)
Document Stores
- Also Retrieve based on document contents
- Example systems: CouchDB, MongoDB, SimpleDB, etc.
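What sets a document store apart from a plain key-value store is that it can look inside the value. The following in-memory sketch stores JSON-like documents and retrieves keys by field values. The class, its find method, and the sample documents are assumptions for illustration and do not reproduce the query interface of CouchDB, MongoDB, or SimpleDB.

```python
import json

class DocumentStore:
    """Toy document store: (key, document) pairs plus content-based retrieval."""

    def __init__(self):
        self._docs = {}

    def create(self, key, document):
        # Round-trip through JSON to store an independent copy and to
        # reject values that are not JSON-serializable.
        self._docs[key] = json.loads(json.dumps(document))

    def retrieve(self, key):
        return self._docs[key]

    def find(self, **criteria):
        """Keys of all documents whose fields equal the given values."""
        return sorted(
            key for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )

# Made-up documents; note that they do not share a fixed schema.
store = DocumentStore()
store.create("p1", {"type": "city", "country": "AT"})
store.create("p2", {"type": "person", "tags": ["data"]})
store.create("p3", {"type": "city", "country": "DE", "notes": "no fixed schema"})
```

Here store.find(type="city") selects by content, while store.retrieve("p2") is the ordinary key lookup.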
Graph database systems
- Data model: nodes and links
- Nodes may have properties (including ID)
- Links may have labels, roles, or other properties
Graph database systems
- Interfaces and query languages vary
- Single-step versus path expressions versus full recursion
- Example systems: Neo4j, FlockDB, Pregel, etc.
- RDF triple stores can map to graph databases
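The three query styles can be illustrated on a small in-memory graph with node properties and labeled links. This is a sketch under assumed names (Graph, step, path, reachable), not the interface of Neo4j or any other system; links are stored as (source, label, target) triples, the same shape an RDF triple has.

```python
class Graph:
    """Toy property graph: nodes with properties, directed links with labels."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.links = []   # (source, label, target) triples

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_link(self, source, label, target):
        self.links.append((source, label, target))

    def step(self, node_ids, label):
        """Single step: follow links with the given label from a set of nodes."""
        return {t for (s, l, t) in self.links if s in node_ids and l == label}

    def path(self, start, labels):
        """Path expression: follow a fixed sequence of labels from one node."""
        current = {start}
        for label in labels:
            current = self.step(current, label)
        return current

    def reachable(self, start, label):
        """Full recursion: everything reachable via any number of `label` links."""
        seen = set()
        frontier = {start}
        while frontier:
            frontier = self.step(frontier, label) - seen
            seen |= frontier
        return seen

# Made-up sample graph.
g = Graph()
g.add_node(1, name="alice")
g.add_node(2, name="bob")
g.add_node(3, name="carol")
g.add_link(1, "friend", 2)
g.add_link(2, "friend", 3)
g.add_link(1, "likes", 3)
```

In this sample, g.step({1}, "friend") follows one link, g.path(1, ["friend", "friend"]) follows a fixed two-step path, and g.reachable(1, "friend") keeps going until no new nodes appear.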