Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment



Similar documents
DYNAMIC QUERY FORMS WITH NoSQL

Getting Started with MongoDB

MongoDB Developer and Administrator Certification Course Agenda

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

MONGODB - THE NOSQL DATABASE

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Can the Elephants Handle the NoSQL Onslaught?

Cloud Scale Distributed Data Storage. Jürmo Mehine

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

How To Handle Big Data With A Data Scientist

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

NoSQL Database Options

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

An Approach to Implement Map Reduce with NoSQL Databases

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

NoSQL: Going Beyond Structured Data and RDBMS

Understanding NoSQL Technologies on Windows Azure

INTRODUCTION TO CASSANDRA

Structured Data Storage

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

NoSQL Databases. Polyglot Persistence

Microsoft Azure Data Technologies: An Overview

1. INTRODUCTION TO RDBMS

How To Use Big Data For Telco (For A Telco)

Understanding NoSQL on Microsoft Azure

Application of NoSQL Database in Web Crawling

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Oracle Big Data SQL Technical Update

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

How To Compare The Economics Of A Database To A Microsoft Database

NoSQL Databases. Nikos Parlavantzas

PHP and MongoDB Web Development Beginners Guide by Rubayeet Islam

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Predictive Research Inc., Predict & Benefit

Slave. Master. Research Scholar, Bharathiar University


HO5604 Deploying MongoDB. A Scalable, Distributed Database with SUSE Cloud. Alejandro Bonilla. Sales Engineer abonilla@suse.com

Comparing SQL and NOSQL databases

Integrating Big Data into the Computing Curricula

In Memory Accelerator for MongoDB

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Database Scalability and Oracle 12c

Developing Web Applications for Microsoft SQL Server Databases - What you need to know

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Performance Analysis for NoSQL and SQL

GigaSpaces Real-Time Analytics for Big Data

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

An Oracle White Paper May Oracle Database Cloud Service

How To Scale Out Of A Nosql Database

NoSQL in der Cloud Why? Andreas Hartmann

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

EVALUATION OF SERVER-SIDE TECHNOLOGY FOR WEB DEPLOYMENT

Open Source Technologies on Microsoft Azure

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

BIG DATA SOLUTION DATA SHEET

Cloud Computing at Google. Architecture

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

CSCC09F Programming on the Web. Mongo DB

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Big data and urban mobility

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

MongoDB and Couchbase

So What s the Big Deal?

CitusDB Architecture for Real-Time Big Data

HP OO 10.X - SiteScope Monitoring Templates

Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute

Oracle Database 10g Express

Postgres Plus Advanced Server

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Navigating the Big Data infrastructure layer Helena Schwenk

NoSQL document datastore as a backend of the visualization platform for ECM system

Scalable Architecture on Amazon AWS Cloud

Sisense. Product Highlights.

Practical Cassandra. Vitalii

NoSQL Data Base Basics

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Resources You can find more resources for Sync & Save at our support site:

EMBL-EBI. Database Replication - Distribution

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

DbSchema Tutorial with Introduction in MongoDB

Data Modeling for Big Data

Big Data and Data Science: Behind the Buzz Words

Document Oriented Database

Transcription:

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment Subject NoSQL Databases - MongoDB Submission Date 20.11.2013 Due Date 26.12.2013 Programming Environment Platform Indepented (Java/C#/C++) Advisors R.A. Ahmet Selman BOZKIR (Dr. Sevil Şen) Background For the past forty-some years, relational databases have ruled the data world. Relational models first appeared in the early 1970s thanks to the research of computer science pioneers such as E.F. Codd. Early versions of SQL-like languages were also developed in the early 70s, with modern SQL appearing in the late 1970s, and becoming popular by the mid-1980s [1]. Interactive applications have changed dramatically over the last 15 years, and so have the data management needs of those apps. Today, three interrelated megatrends Big Data, Big Users, and Cloud Computing are driving the adoption of NoSQL technology. And NoSQL is increasingly considered a viable alternative to relational databases, especially as more organizations recognize that operating at scale is better achieved on clusters of standard, commodity servers, and a schema-less data model is often better for the variety and type of data captured and processed today [2]. NoSQL databases offer various benefits over traditional relational databases such as MySQL, SQL Server or Oracle. These benefits can be listed briefly as follows [2]: Big Users Not that long ago, 1,000 daily users of an application was a lot and 10,000 was an extreme case. Today, with the growth in global Internet use, the increased number of hours users spend online, and the growing popularity of smartphones and tablets, it's not uncommon for apps to have millions of users a day. Big Data Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geo location data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the everexpanding array of data being captured. 1

More Flexible Schema-less Data Model Relational and NoSQL data models are very different. The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well. When looking up data, the desired information needs to be collected from many tables (often hundreds in today s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables. NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, ease of efficiently distributing the resulting documents and read and write performance improvements make it an easy trade-off for webbased applications. Another major difference is that relational technologies have rigid schemas while NoSQL models are schemaless. Relational technology requires strict definition of a schema prior to storing any data into a database. Changing the schema once data is inserted is a big deal, extremely disruptive and frequently avoided the exact opposite of the behavior desired in the Big Data era, where application developers need to constantly and rapidly incorporate new types of data to enrich their apps. Adopted from [2] Scalability Prior to NoSQL databases, the default scaling approach at the database tier was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology. To support more concurrent users and/or store more data, you need a bigger and bigger server with more CPUs, more memory, and more disk storage to keep all the tables. Big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically used so effectively at the web/application server tier. NoSQL databases were developed from the ground up to be distributed, scale out databases. They use a cluster of standard, physical or virtual servers to store data and support database operations. To scale, additional servers are joined to the cluster and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time-to-time, NoSQL databases are built to tolerate and recover from such failure making them highly resilient. 2

Because of the reality of new demands in applications, NoSQL database systems are being much preferred. In this experiment it is aimed to employ a very well-known and well-documented mature NoSQL database system named Mongo. MONGODB MongoDB (from "HUMONGOUS") is a cross-platform document-oriented database system. Classified as a "NoSQL" database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster [3]. MongoDB serves the schema-less database design and present those useful features [4]: Document-Oriented Storage JSON-style documents with dynamic schemas offer simplicity and power. Full Index Support Index on any attribute, just like you're used to. Replication & High Availability Mirror across LANs and WANs for scale and peace of mind. Auto-Sharding Scale horizontally without compromising functionality. Querying Rich, document-based queries. Map/Reduce Flexible aggregation and data processing. GridFS Store files of any size without complicating your stack. EXPERIMENT In this experiment you are given a sample person database depicted in Figure 1. In this database there exist some tables: Person, PersonPhone, EmailAddress, Password, Business EntityAddress, Address. Figure 1. Person database 3

These tables are adopted from SQL Server 2008 Adventure Works DB and exported as comma seperatad files format (CSV) for your usage. As can be easily seen at CSV file, some of the fields at tables are removed for simplicity. Currently the fields of tables are listed as below: Person { BusinessEntityID,PersonType,Title,FirstName,MiddleName,LastName,ModifiedDate} PersonPhone { BusinessEntityID,PhoneNumber,PhoneNumberTypeID} Password { BusinessEntityID, PasswordHash, PasswordSalt, ModifiedDate} EmailAddress { BusinessEntityID, EmailAddress} BusinessEntityAddress {BusinessEntityID,AddressID} Address { AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode} All the tables are connected via primary key BusinessEntityID. Some fields at various fields can contain null values. In CSV files all fields are separated via a column separator character. In this experiment, you are expected to develop an application accessing to a relational database system (You are free to choose a RDBMS in this list: MySQL, SQL Server, PostgreSQL) and MongoDB database system working on this sample database with the following requirements. Please read them carefully before design and implementation. Person table holds 19972 people. First of all you must increase the row number to more than 5.000.000 records by randomly picking the field values and appending to CSV file again by considering unique BusinessEntityID key. You should also do this step for the other tables. (Hint: Develop an application to achieve this) Do not duplicate the same rows. Instead, prefer an original random selection method for generating new records. Then these modified CSV files must be exported to the relational database system and define the primary/foreign key relations at RDBMS. Third, according to NoSQL database paradigm you must create your MongoDB database. This stage requires your research about NoSQL and MongoDB. Note that NoSQL databases are differing than relational ones in terms of db design and usage. You must develop a visual application for querying people by first name, last name, email, city and phone number. Users can select arbitrary numbers of filter. In other words, user can filter records by adding first name = Richard and city = New Jersey or only one filter. However field name will be used only once. Your application must enable users to connect relational DB and Mongo DB with a radio button and user should have the ability to define query terms via text boxes and drop down lists (Dropdown lists will hold field names). If relational DB is selected, your application should connect to RDBMS and run SQL query to retrieve the results in tabular format. If MongoDB is selected then your application must connect to MongoDB server and do the similar operations. Required time (as milliseconds) to complete the querying operations must be written on screen (a label can be used) upon querying. By this way, we can understand the difference of amount of time in querying between relational and NoSQL DB systems for same result set. After querying title, firstname, middlename, lastname, persontype, email,phone number, passwordsalt, addressline1, city and postal code fields must shown to user in 4

a tabular format. At UI side, you can use GridView or DataGrid controls for tabular viewing. Even a large multiline text box for result set viewing is allowed. At implementation stage of your program you are free to select either Java or C#. Microsoft Visual C++ environment is also accepted. Due to selection of your environment you must download and use required MongoDB driver and API. Without an appropriate driver you cannot access MongoDB. This information is also valid for your relational database server access. For instance, if you are using MySQL and C# then you must obtain appropriate drivers. GENERAL RULES First of all, you must learn the theoretical background of NoSQL database paradigm and MongoDB You are free to complete your homework with these defined relational databases (MySQL, SQL Server, PostgreSQL, Oracle) and those defined programming environments (Java, C#, Visual C++). Due to your selection you must use appropriate driver that can be freely downloaded from MongoDB web site. Do not duplicate your original CSV file contents for increasing the number of records. Use random selection and assignment. NOTES: SAVE all your work until the experiment is graded. The assignment must be original, INDIVIDUAL work. You will work as groups. Each group will contain 2 person. SUBMISSONS: The experiment code will be tested under custody of advisor (R.A. Selman Bozkır). Therefore you will not submit the application and your development environment. Advisor will announce the time and place for evaluating your experiment. REFERENCES: 1. http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/, 17.11.2013 2. http://www.couchbase.com/why-nosql/nosql-database 17.11.2013 3. http://en.wikipedia.org/wiki/mongodb 18.11.2013 4. http://www.mongodb.org/ 18.11.2013 5