Databases for text storage



Similar documents
Querying MongoDB without programming using FUNQL

Sharding with postgres_fdw

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Comparing SQL and NOSQL databases

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Cassandra vs MySQL. SQL vs NoSQL database comparison

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

Integrating Big Data into the Computing Curricula

Open Source Technologies on Microsoft Azure

Time-Series Databases and Machine Learning

MongoDB. An introduction and performance analysis. Seminar Thesis

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone

No SQL! no injection? A talk on the state of NoSQL security

Getting Started with MongoDB

CSCC09F Programming on the Web. Mongo DB

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

HTSQL is a comprehensive navigational query language for relational databases.

Frictionless Persistence in.net with MongoDB. Mogens Heller Grabe Trifork

Can the Elephants Handle the NoSQL Onslaught?

Intro to Databases. ACM Webmonkeys 2011

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

MongoDB Developer and Administrator Certification Course Agenda

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Monitoring Agent for PostgreSQL Fix Pack 10. Reference IBM

An Approach to Implement Map Reduce with NoSQL Databases

Practical Cassandra. Vitalii

Document Oriented Database

Microsoft Azure Data Technologies: An Overview

NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne,

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Introduction to MongoDB. Kristina Chodorow

CSI 2132 Lab 3. Outline 09/02/2012. More on SQL. Destroying and Altering Relations. Exercise: DROP TABLE ALTER TABLE SELECT

Why Zalando trusts in PostgreSQL

Student Project 1 - Explorative Data Analysis with Hadoop and Spark

There and Back Again As Quick As a Flash

Below is a table called raw_search_log containing details of search queries. user_id INTEGER ID of the user that made the search.

Scaling with MongoDB. by Michael Schurter Scaling with MongoDB by Michael Schurter - OS Bridge,

An Open Source NoSQL solution for Internet Access Logs Analysis

Lecture 21: NoSQL III. Monday, April 20, 2015

Big Data Analytics. Rasoul Karimi

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

SQL Server In-Memory by Design. Anu Ganesan August 8, 2014

The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014

ADVANCED DATABASES PROJECT. Juan Manuel Benítez V Gledys Sulbaran

Comparing Scalable NOSQL Databases

Cloud Powered Mobile Apps with Azure

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

How To Write A Store On A Microsoft Database On A Flash (Orchestra) On A Linux Computer (Ortho) On An Ipro (Orroster) On Your Computer Or Your Computer (For Ahem) On The

iservdb The database closest to you IDEAS Institute

SQL - QUICK GUIDE. Allows users to access data in relational database management systems.

Applications for Big Data Analytics

PSU SQL: Introduction. SQL: Introduction. Relational Databases. Activity 1 Examining Tables and Diagrams

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

IBM DB2 XML support. How to Configure the IBM DB2 Support in oxygen

Introduction to SQL for Data Scientists

CSCI110 Exercise 4: Database - MySQL

INTRODUCING AZURE SEARCH

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

Python and MongoDB. Why?

DbSchema Tutorial with Introduction in MongoDB

Database Design Patterns. Winter Lecture 24

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia

EDG Project: Database Management Services

SURVEY ON MONGODB: AN OPEN- SOURCE DOCUMENT DATABASE

DBMS / Business Intelligence, SQL Server

Log management with Logstash and Elasticsearch. Matteo Dessalvi

NO SQL! NO INJECTION?

How, What, and Where of Data Warehouses for MySQL

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

ICAB4136B Use structured query language to create database structures and manipulate data

Data storing and data access

Linas Virbalas Continuent, Inc.

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

CSE 530A Database Management Systems. Introduction. Washington University Fall 2013

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Using distributed technologies to analyze Big Data

NoSQL: Going Beyond Structured Data and RDBMS

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering

The software shall provide the necessary tools to allow a user to create a Dashboard based on the queries created.

Flatten from/to Relational

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Oracle Database: SQL and PL/SQL Fundamentals NEW

Programming Against Hybrid Databases with Java Handling SQL and NoSQL. Brian Hughes IBM

Extracting META information from Interbase/Firebird SQL (INFORMATION_SCHEMA)

Transcription:

Databases for text storage Jonathan Ronen New York University jr4069@nyu.edu December 1, 2014 Jonathan Ronen (NYU) databases December 1, 2014 1 / 24

Overview 1 Introduction 2 PostgresSQL 3 MongoDB Jonathan Ronen (NYU) databases December 1, 2014 2 / 24

Why Databases? Structured way to store your data Accessible, shareable Manage growing volumes of data You cannot keep all of your data in working memory... indexing Jonathan Ronen (NYU) databases December 1, 2014 3 / 24

Basic issues with databases Inserting data Schema Querying Indexing Jonathan Ronen (NYU) databases December 1, 2014 4 / 24

I ll show you how to do this in PostgresSQL MongoDB Jonathan Ronen (NYU) databases December 1, 2014 5 / 24

PostgreSQL Relational DB Which means we define tables with columns and relations Queried using Structured Query Language ES-QUE-ELL, or SEQUEL, but not SQUEAL opensource, free, very fast, advanced text search capabilities Friendly elephant logo Jonathan Ronen (NYU) databases December 1, 2014 6 / 24

Basics of SQL Jonathan Ronen (NYU) databases December 1, 2014 7 / 24

Basics of SQL Jonathan Ronen (NYU) databases December 1, 2014 8 / 24

Basics of SQL Jonathan Ronen (NYU) databases December 1, 2014 9 / 24

Basics of SQL SELECT statement SELECT * FROM tweets WHERE user id=2170941466; SELECT statement with time range SELECT * FROM tweets WHERE timestamp > 2014-12-2 ; SELECT statement with LIKE SELECT * FROM tweets WHERE lower(text) LIKE %obama% ; Jonathan Ronen (NYU) databases December 1, 2014 10 / 24

Indexing Imagine searching through a table: id user id timestamp text 1 1 2014-11-30 10:23:40 I love the biebsssss! 2 2 2014-11-30 11:33:44 Bieberboy make me a baby! 3 1 2014-11-30 10:23:23 God if biebs dont come i shoot myself!! 4 3 2014-11-30 9:12:11 I love bieber so much i have bieber san 5 2 2014-11-30 12:33:10 RT if you love biebsbs as much ias me! Find me all tweets since noon. Jonathan Ronen (NYU) databases December 1, 2014 11 / 24

Indexing Imagine searching through a table: id user id timestamp text 4 3 2014-11-30 9:12:11 I love bieber so much i have bieber san 3 1 2014-11-30 10:23:23 God if biebs dont come i shoot myself!! 1 1 2014-11-30 10:23:40 I love the biebsssss! 2 2 2014-11-30 11:33:44 Bieberboy make me a baby! 5 2 2014-11-30 12:33:10 RT if you love biebsbs as much ias me! Easy! Sort by time! Jonathan Ronen (NYU) databases December 1, 2014 12 / 24

Indexing An index is a sorted copy of a column. (Or really, it s usually a btree...) timestamp id 2014-11-30 9:12:11 4 2014-11-30 10:23:23 3 2014-11-30 10:23:40 1 2014-11-30 11:33:44 2 2014-11-30 12:33:10 5 Jonathan Ronen (NYU) databases December 1, 2014 13 / 24

Text search in postgres SELECT statement using PG text search SELECT * FROM tweets WHERE to tsvector( english, text) @@ to tsquery( obama ); to tsvector to tsquery (show these in the terminal...) Jonathan Ronen (NYU) databases December 1, 2014 14 / 24

Text indexing CREATE INDEX statement CREATE INDEX text idx ON tweets USING gin(to tsvector( english, text)); SELECT statement using text index SELECT * FROM tweets WHERE to tsvector( english, text) @@ to tsquery( obama ); Jonathan Ronen (NYU) databases December 1, 2014 15 / 24

Aggregation GROUP BY statement SELECT user id, count(*) FROM tweets GROUP BY user id; Jonathan Ronen (NYU) databases December 1, 2014 16 / 24

MongoDB Document store nosql doesn t mean query language isn t structured (but it s different..) opensource, free, really fast (sometimes) Jonathan Ronen (NYU) databases December 1, 2014 17 / 24

JSON documents { } c r e a t e d a t : Wed Aug 13 1 5 : 2 0 : 4 6 +0000 2014, l a n g : en, r e t w e e t c o u n t : 0, t e x t : P e n n s y l v a n i a USA P h i l a d e l p h i a \u00bb Mike u s e r : { name : J e f f, screen name : j e f f e r s o n d o l, s t a t u s e s c o u n t : 207845, d e s c r i p t i o n : #android, #androidgames,# iphon f o l l o w e r s c o u n t : 810, l a n g : en, g e o e n a b l e d : f a l s e, l o c a t i o n : F l o r i d a, } Jonathan Ronen (NYU) databases December 1, 2014 18 / 24

MongoDB is a document database MongoDB lets you store these documents directly No need to flatten to tabular form! Comes with its own query syntax Also uses indexing to speed queries SQL Database Table Row Index Mongo Database Collection Document Index Jonathan Ronen (NYU) databases December 1, 2014 19 / 24

MongoDB Query Syntax Regex matching db. c o l l e c t i o n. f i n d ({ t e x t : /obama /}) Date range db. c o l l e c t i o n. f i n d ({ timestamp : { $gt : new Date ( 2 0 1 4, 1 0, 6 ) } }) Jonathan Ronen (NYU) databases December 1, 2014 20 / 24

Text search in MongoDB Creating a text index db.tweets.ensureindex({text: text }) Using text search db.tweets.findone({text : {search: obama }}) Jonathan Ronen (NYU) databases December 1, 2014 21 / 24

Aggregation in MongoDB Aggregation framework db. t w e e t s. a g g r e g a t e ({ $group : { i d : $ u s e r. screen name, number : { $sum : 1 } } }) Jonathan Ronen (NYU) databases December 1, 2014 22 / 24

SMAPP Some info on the smapp backend: MongoDB with index on tweet id, timestamp, random number (for sampling) No text index (yet!) New!: multiple collection for smappler indexes (smapptoolkit) Jonathan Ronen (NYU) databases December 1, 2014 23 / 24

The End Jonathan Ronen (NYU) databases December 1, 2014 24 / 24