How To Improve Performance In A Database




Some issues on Conceptual Modeling and NoSQL/Big Data
Tok Wang Ling
National University of Singapore

Database Models
- File system: field, record, fixed-length record
- Hierarchical Model (IMS): fixed-length record, tree structure
- Network Model (IDMS): field, fixed-length record; owner, member, set (circular linked list)
- Relational Model: fixed length, normal forms based on FD & MVD
- Nested Relation
- Entity-Relationship Approach: for conceptual database design
- Object-Oriented (OO) Data Model: object, object ID, class hierarchy, inheritance, method
- Object Relational Data Model
- Deductive and Object-Oriented (DOOD): deductive OO rules
- Semi-structured Data Model (XML data): hierarchical structure

Some problems in Relational Model
- The relational model uses FDs and MVDs but has no concept of a relationship. FDs and MVDs are not relationships; they are integrity constraints.
- Normal forms are meant to remove redundancy and reduce updating anomalies.
- Does redundancy definitely incur updating anomalies? Not always!
  E.g. supply (S#, Sname, P#, Pname, price) has redundant information on Sname and Pname, but it does not suffer from updating anomalies because we don't change the Sname and Pname of suppliers and parts, respectively.
  Concepts: Strong FD, Strong MVD, Strong relationship.
  E.g. In an invoice of a purchase order, we add the product name, the price of the product, and the total amount of the invoice. These are redundant, but they don't incur updating anomalies because we don't change the invoice. Better!
- Adding redundancy during physical database design may not incur updating anomalies and may instead improve performance significantly.
- Normal form relations may be bad for the performance of some applications. Any theory?

Some problems in Relational Model
- An RDBMS cannot handle multi-valued attributes and composite attributes efficiently.
  E.g. Nested relation: employee (e#, name, sex, dob, hobby*, qual (degree, university, year)*)
- To store employee information, we need 3 normal form relations.
- To get the information of one employee, we need to join the 3 relations, which is very inefficient and slow.
- Normal form relations are not efficient for storing and processing multi-valued attributes; nested relation/OO storage is better.
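The employee example above can be sketched in Python to show where the extra work comes from. This is only an illustration under assumptions: the relation names emp_hobby and emp_qual, the field name e_no (standing in for e#), and all sample values are hypothetical, and plain lists and dicts stand in for real storage.

```python
# Three normal form relations (hypothetical sample data).
employee = [  # employee(e#, name, sex, dob)
    {"e_no": 1, "name": "Ann", "sex": "F", "dob": "1990-01-01"},
]
emp_hobby = [  # emp_hobby(e#, hobby): one row per multi-valued hobby
    {"e_no": 1, "hobby": "chess"},
    {"e_no": 1, "hobby": "tennis"},
]
emp_qual = [  # emp_qual(e#, degree, university, year): one row per qualification
    {"e_no": 1, "degree": "BSc", "university": "NUS", "year": 2012},
]


def fetch_normalized(e_no):
    """Reassemble one employee by matching e# across all three relations."""
    emp = next(e for e in employee if e["e_no"] == e_no)
    hobbies = [h["hobby"] for h in emp_hobby if h["e_no"] == e_no]
    quals = [
        {k: v for k, v in q.items() if k != "e_no"}
        for q in emp_qual
        if q["e_no"] == e_no
    ]
    return {**emp, "hobby": hobbies, "qual": quals}


# Nested relation: the whole employee is stored as one object under its key.
nested_store = {
    1: {
        "e_no": 1, "name": "Ann", "sex": "F", "dob": "1990-01-01",
        "hobby": ["chess", "tennis"],
        "qual": [{"degree": "BSc", "university": "NUS", "year": 2012}],
    },
}


def fetch_nested(e_no):
    """A single lookup returns the employee with all multi-valued attributes."""
    return nested_store[e_no]
```

Both functions return the same employee; the normalized version has to touch three relations to do so, while the nested version needs one lookup.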

Some problems in Relational Model
- The RDBMS join operation is very expensive.
- SQL is not useful for many applications.
  SQL is a declarative language which operates on a set of tuples at a time.
  MapReduce is an imperative programming model which operates on one record at a time. Good for some applications.
- ACID (Atomicity, Consistency, Isolation, Durability) is there to ensure the consistency of data. The overhead for enforcing ACID is very high and may not be necessary for many applications (such as large data volume applications which don't update the data).
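The contrast between set-at-a-time SQL and record-at-a-time MapReduce can be sketched as follows. This is a minimal single-process illustration, not a real MapReduce framework: the orders data and the function names map_fn, reduce_fn, total_with_sql and total_with_mapreduce are hypothetical, and the "shuffle" step is just a dictionary.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount) records.
orders = [("alice", 10), ("bob", 5), ("alice", 7)]


def total_with_sql(rows):
    """Declarative: state what is wanted; the engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    )
    conn.close()
    return result


def map_fn(record):
    """Called once per record; emits (key, value) pairs."""
    customer, amount = record
    yield customer, amount


def reduce_fn(key, values):
    """Called once per key with all values emitted for that key."""
    return key, sum(values)


def total_with_mapreduce(rows):
    """Imperative: the programmer writes the per-record and per-key steps."""
    groups = defaultdict(list)
    for record in rows:                      # map phase, one record at a time
        for key, value in map_fn(record):
            groups[key].append(value)        # stand-in for the shuffle phase
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # reduce phase
```

Both functions produce the same totals per customer. In the SQL version any optimization is left to the engine; in the MapReduce version it is up to the programmer.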

NoSQL
NoSQL (not only SQL? no join?) databases are categorized according to the way they store the data.
4 major categories of NoSQL databases:
1) Key-value stores
- Each data item/object is stored, indexed, and accessed using a key.
- Data can be structured, semi-structured, or unstructured.
  E.g. Tables: Customer and Orders
  Notes. Tag name: value. Different tuples can have different tag names (semi-structured or unstructured).
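A key-value store as described above can be sketched with a Python dict. This is a toy illustration only: the put and get helpers, the key naming scheme ("customer:1", "order:17") and the sample values are all assumptions, not the interface of any particular product.

```python
# In-memory stand-in for a key-value store: every object is reached by its key.
store = {}


def put(key, value):
    """Store (or overwrite) the object under the given key."""
    store[key] = value


def get(key):
    """Return the object stored under the key, or None if the key is absent."""
    return store.get(key)


# Values need not share a schema: each one carries its own tag names.
put("customer:1", {"name": "Ann", "city": "Singapore"})
put("customer:2", {"name": "Bob", "notes": "prefers email"})
put("order:17", {"customer": "customer:1", "total": 42.5})
```

Note that the two customer objects have different tag names, which a fixed relational schema would not allow without null columns or extra relations.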

NoSQL
2) Wide-Column stores (Column-oriented stores)
- Contain extendable columns of closely related data. E.g. Bigtable
- Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.
- The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
  (row-key, column-key, time) -> string
- The row key value is a reversed URL. Bigtable maintains data in lexicographic order by row key, so webpages in the same domain are grouped together into contiguous rows.
- Column keys are grouped into sets called column families, which form the basic unit of access control. A column key is named using the syntax family:qualifier
3) Document Stores
- Similar to XML data: semi-structured data
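The Bigtable data model described above, a map from (row-key, column-key, time) to bytes with reversed-URL row keys, can be sketched in a few lines. This is a toy in-memory model under assumptions, not Bigtable's API: the helper names reversed_host_key, put, latest and scan_rows, the example.com URLs, and the column names are all hypothetical.

```python
from urllib.parse import urlsplit


def reversed_host_key(url):
    """Build a row key by reversing the host name components of the URL."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


# The map itself: (row_key, column_key, timestamp) -> uninterpreted bytes.
table = {}


def put(row_key, column_key, timestamp, value):
    table[(row_key, column_key, timestamp)] = value


def latest(row_key, column_key):
    """Return the value with the highest timestamp for this cell, if any."""
    versions = [
        (ts, v) for (r, c, ts), v in table.items()
        if r == row_key and c == column_key
    ]
    return max(versions, key=lambda tv: tv[0])[1] if versions else None


def scan_rows():
    """Row keys in lexicographic order, as a sorted map would expose them."""
    return sorted({r for (r, _, _) in table})


# Column keys use the family:qualifier syntax.
put(reversed_host_key("http://www.example.com/"), "contents:html", 1, b"<html>v1")
put(reversed_host_key("http://www.example.com/"), "contents:html", 2, b"<html>v2")
put(reversed_host_key("http://www.other.org/"), "contents:html", 1, b"<html>x")
put(reversed_host_key("http://maps.example.com/index.html"), "anchor:home", 1, b"Maps")
```

Because the host components are reversed, the two example.com pages sort next to each other and ahead of the other.org page, which is the grouping effect the slide describes.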

NoSQL
4) Graph Database
- Nodes and edges (objects and relationships).
- Are nodes and/or edges typed and/or labelled? Are edges directed or undirected? Are edges weighted?
- Binary relationships between pairs of nodes. Where do we store the attributes of binary relationships? On the edges?
- Problems in handling n-ary relationship types. How?
- If we store a data graph in a relation edge (from-node, to-node), traversing from one node to another (e.g. for the shortest path problem) will be very slow because it needs many joins. So an RDBMS is bad for storing and processing graph data.
- Graph data can be stored as an in-memory database using pointers/node IDs for efficient node traversal.
- Write programs to solve specific problems.
- Problems: How to express user ad hoc queries? How to present query answers for users to understand? E.g. Steiner tree.
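The difference between traversing an edge relation by joins and traversing an in-memory graph by node IDs can be sketched as follows. The sample edges and the function names reachable_by_joins and shortest_path_length are hypothetical; the first function only emulates what a chain of relational self-joins would compute, and edges are assumed directed and unweighted.

```python
from collections import defaultdict, deque

# Hypothetical directed graph stored as the relation edge(from_node, to_node).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]


def reachable_by_joins(edge_relation, start, hops):
    """Emulate `hops` self-joins of the edge relation.

    Every hop scans the whole relation, which is what makes deep
    traversals costly when the graph lives in a plain relation.
    """
    frontier = {start}
    for _ in range(hops):
        frontier = {to for (frm, to) in edge_relation if frm in frontier}
    return frontier


def shortest_path_length(edge_relation, start, goal):
    """Breadth-first search over an in-memory adjacency list.

    Each step follows node IDs directly instead of re-scanning all edges.
    Returns the number of edges on a shortest path, or None if unreachable.
    """
    adjacency = defaultdict(list)
    for frm, to in edge_relation:
        adjacency[frm].append(to)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

With the join-style approach the number of hops must be known in advance or the join must be repeated until the goal appears, whereas the adjacency-list search simply stops when it reaches the goal.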

SQL vs NoSQL
- has a fixed schema, many related relations and fixed length vs does not require a schema or uses a semi-structured model
- declarative standard query language vs different imperative programming languages
- write queries using SQL vs write programs (e.g. MapReduce programs with APIs, etc.) for queries
- query optimizer of the RDBMS vs optimization done by programmers
- returns an accurate/precise query answer vs returns an opinion or best guess, similar to data mining and IR

SQL vs NoSQL
- allows updates (rewrite) vs just has new data, with no or seldom updates (delete or change)
- enforces ACID for consistency vs emphasizes speed/performance and uses eventual consistency
- uses the join operator vs avoids or has no join operator (e.g. uses redundant data) to speed up processing
- serves many different database applications (mission-critical transaction systems) vs each is designed for some specific applications (search engines, web-based systems, real-time, cloud)
- DBMS vs data stores?

Question: When do we use SQL or NoSQL? It depends on the application.
Criteria:
- Fixed schema?
- Transactions? ACID required?
- Precise answers?
- Ad hoc user queries?
- Frequent updates?
- Horizontal and/or vertical partitioning?
- Very large volume of data?
- Complex algorithms needed to solve the problems and queries (e.g. data mining techniques)?

Use of existing database techniques for NoSQL conceptual modeling and Questions
- Materialized views and introducing redundant data for faster processing. Theory behind.
- Horizontal and vertical partitioning of data for different applications in physical database design. Can the data be horizontally and/or vertically partitioned? What overhead will be incurred? Theory behind.
- If the data can be partitioned horizontally and processed in parallel, and the results of the partitions can be merged, then Hadoop with MapReduce is a good solution.
  E.g. To sort a big volume of string data, we can partition the string data on the first character, then sort each partition using an O(n log n) method.
  If the complexity of the sort method is f(n) = c*n*log n, where n is the data size and c is a constant, each partition needs c*(n/k)*log(n/k) < f(n)/k, where k is the number of parallel nodes, so the completion time is much shorter. However, the total number of computation operations is about the same.
  For O(n^2) problems, the speed-up can be up to k^2 times, since each partition needs c*(n/k)^2 = f(n)/k^2.
- Class hierarchy and inheritance for NoSQL data?
- DOOD rules for NoSQL data?
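The partition-then-sort example above can be sketched in Python. This is a single-machine illustration under assumptions: a thread pool stands in for the k parallel nodes (it will not show a real speed-up in CPython), the cost functions assume k equal-sized partitions, which partitioning on the first character does not guarantee in practice, and the names partitioned_sort, per_partition_cost and full_cost are hypothetical.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partitioned_sort(strings):
    """Partition on the first character, sort each partition, then concatenate."""
    partitions = defaultdict(list)
    for s in strings:
        partitions[s[:1]].append(s)          # s[:1] also copes with empty strings
    keys = sorted(partitions)                # partitions are merged in key order
    with ThreadPoolExecutor() as pool:       # stand-in for k parallel nodes
        sorted_parts = list(pool.map(sorted, (partitions[k] for k in keys)))
    return [s for part in sorted_parts for s in part]


def full_cost(n, c=1.0):
    """f(n) = c * n * log n: cost of sorting all n items on one node."""
    return c * n * math.log2(n)


def per_partition_cost(n, k, c=1.0):
    """c * (n/k) * log(n/k): cost of one of k equal-sized partitions."""
    return c * (n / k) * math.log2(n / k)


# Hypothetical sample data.
words = ["pear", "apple", "plum", "banana", "avocado", "cherry"]
```

Merging is trivial here because every string in one partition sorts before every string in the next, so the sorted partitions only need to be concatenated. For n = 1,000,000 and k = 10, per_partition_cost(n, k) is below full_cost(n) / k, matching the inequality on the slide.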

Use of existing database techniques for NoSQL conceptual modeling and Questions
- Data and schema integration. Entity resolution (object identification) is not enough; we need to consider relationship resolution (relationship identification). Theory behind.
- We need to handle local/global FDs and keys, and key vs OID of an object class.
- We need to consider n-ary relationship types and their attributes.
- The temporal aspect is important for data integration, as in a data warehouse.
- Q: How can we use ORA-semantics for NoSQL data in order to improve the quality/accuracy of the answers, such as in data/schema integration and keyword query search?

Thank you!