Big Data Analytics. Rasoul Karimi

Big Data Analytics
Rasoul Karimi
Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim, Germany

Introduction

- Big Data: storing and accessing large amounts of (unstructured) data.
- Distributed file systems: allow files to be accessed using the same interfaces and semantics as local files.
- Distributed databases: store data across multiple computers.

Traditional Database Applications (figure)

Distributed Applications (figure)

Relational vs Big-Data Technologies

- Structure: in relational databases the data is structured, whereas in big data technologies the data is raw.
- Process: relational databases were initially created for transactional processing, whereas big data technologies are built for data analysis.
- Entities: a relational database is big if it has many entities and many rows per entity. In big data technologies the number of entities is not necessarily high; instead, the amount of information stored per entity is large.
- Horizontal scaling: adding new nodes in big data technologies can be done at low cost, whereas in a relational database the number of nodes is either fixed or expensive to increase.

NoSQL Databases

- A NoSQL ("Not Only SQL") database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
- Motivations for this approach include simplicity of design and horizontal scaling.
- The data structure differs from that of an RDBMS, so some operations are faster in NoSQL and others are faster in an RDBMS.
- Most NoSQL stores lack true ACID transactions.

NoSQL Databases

- Graph: AllegroGraph, Neo4J, OrientDB, Virtuoso
- Column: Accumulo, Cassandra, HBase
- Document: Clusterpoint, Couchbase, MarkLogic, MongoDB
- Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak

Graph

A graph is a representation of a set of objects in which some pairs of objects are connected by links.

Example applications of graphs in computer science:
- Automata: self-operating virtual machines that help in the logical understanding of input and output processes.
- Markov decision processes: provide a mathematical framework for modeling decision making.
- XML Path Language (XPath): a query language for selecting nodes from an XML document.

In big data, graphs are used to model entities and their relationships, which are distributed across all nodes.

Graph databases

A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data.

Graph databases

- Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of.
- Properties are relevant pieces of information that relate to nodes.
- Edges are the lines that connect nodes to nodes.
- Most of the important information is stored in the edges.
- Meaningful patterns emerge when one examines the connections and interconnections of nodes, properties, and edges.
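The node, property, and edge model above can be sketched with plain Python structures. This is only an illustration under stated assumptions: the node ids, relationship names (KNOWS, WORKS_AT), and helper functions are hypothetical and are not the API of any graph database named in these slides.

```python
# Hypothetical in-memory property graph; all names and data are illustrative.
# Nodes carry properties as a dictionary.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "acme": {"label": "Business", "name": "Acme"},
}

# Each edge is (source, relationship, target, edge properties).
edges = [
    ("alice", "KNOWS", "bob", {"since": 2010}),
    ("alice", "WORKS_AT", "acme", {"role": "engineer"}),
    ("bob", "WORKS_AT", "acme", {"role": "analyst"}),
]


def neighbors(node_id, relationship):
    """Return the targets reachable from node_id over one edge of the given type."""
    return [dst for src, rel, dst, _ in edges if src == node_id and rel == relationship]


def coworkers(node_id):
    """Find other nodes that share a WORKS_AT target with node_id."""
    employers = set(neighbors(node_id, "WORKS_AT"))
    return sorted(
        src
        for src, rel, dst, _ in edges
        if rel == "WORKS_AT" and dst in employers and src != node_id
    )
```

Note that the coworker relationship is never stored explicitly; it emerges from following edges, which is the sense in which the important information lives in the connections.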

Graph databases

- Compared with relational databases, graph databases are often faster for associative data sets.
- They map more directly to the structure of object-oriented applications.
- Because they depend less on a rigid schema, they are more suitable for managing ad hoc and changing data with evolving schemas.
- Graph databases are a powerful tool for graph-like queries.

Graph queries

- Reachability queries
- Shortest path queries
- Pattern queries
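The first two query types can be illustrated with a breadth-first search over an adjacency list. This is a minimal sketch on a small hypothetical directed, unweighted graph; real graph databases use their own query languages and indexes, and pattern queries are not covered here.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "f": ["a"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return a shortest path as a list of nodes, or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def reachable(graph, start, goal):
    """Reachability query: is there any directed path from start to goal?"""
    return shortest_path(graph, start, goal) is not None
```

For example, `shortest_path(graph, "a", "e")` follows a, b, d, e, while `reachable(graph, "a", "f")` is false because no edge leads into f.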

Resource Description Framework (RDF)

- RDF is a W3C standard format for Linked Data.
- It is similar to classic conceptual modeling approaches such as entity-relationship and class diagrams.
- In RDF, data are expressed as subject-predicate-object expressions (triples).
- For example, the statement "The shirt has the color blue" is expressed in RDF as a triple: a subject denoting the shirt, a predicate denoting "has", and an object denoting the color blue.
- A collection of RDF statements represents a labeled, directed multi-graph.

Example

There is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is em@w3.org, and whose title is Dr.

<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#fullName> "Eric Miller" .
<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#mailbox> <mailto:em@w3.org> .
<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#personalTitle> "Dr." .
<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/10/swap/pim/contact#Person> .
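The four statements about Eric Miller can be held as plain (subject, predicate, object) tuples and queried by pattern. The sketch below is an assumption-laden illustration: the `match` helper and the constant names are hypothetical, and a real RDF store would use SPARQL and distinguish IRIs from literals rather than treating everything as strings.

```python
# Namespace prefixes taken from the example; constant names are illustrative.
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
ME = "http://www.w3.org/People/EM/contact#me"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (ME, CONTACT + "fullName", "Eric Miller"),
    (ME, CONTACT + "mailbox", "mailto:em@w3.org"),
    (ME, CONTACT + "personalTitle", "Dr."),
    (ME, RDF_TYPE, CONTACT + "Person"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Who is of type Person, and what is the full name of that resource?
people = [s for s, _, _ in match(triples, predicate=RDF_TYPE, obj=CONTACT + "Person")]
names = [o for _, _, o in match(triples, subject=people[0], predicate=CONTACT + "fullName")]
```

Because every statement has the same three-part shape, each triple can be read as one labeled edge from subject to object, which is why a set of triples forms a directed multi-graph.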

Column Databases

In a column database, a column is a tuple (a key-value pair) consisting of three elements:
- Unique name: used to reference the column.
- Value: the content of the column.
- Timestamp: the system timestamp used to determine the valid content.

Differences to a relational database

Example

{
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
}
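The role of the timestamp can be sketched as follows. The `Column` type and `resolve` function are hypothetical names, and the rule that the highest timestamp wins is an assumption about how "valid content" is determined; the slides do not spell out the resolution rule.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """A column as a (name, value, timestamp) tuple."""
    name: str
    value: str
    timestamp: int


def resolve(versions):
    """Pick the valid content among versions of one column.

    Assumption: the version with the highest timestamp wins.
    """
    return max(versions, key=lambda column: column.timestamp)


# The row from the example, keyed by column name.
row = {
    "street": Column("street", "1234 x street", 123456789),
    "city": Column("city", "san francisco", 123456789),
    "zip": Column("zip", "94107", 123456789),
}

# A later write to the same column (hypothetical value and timestamp).
newer_zip = Column("zip", "94110", 123456999)
row["zip"] = resolve([row["zip"], newer_zip])
```

After the update, the row holds the newer zip value while the other columns keep their original content and timestamps.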

Document Databases

- A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information.
- In contrast to relational databases and their notion of relations (or tables), these systems are designed around an abstract notion of a document.
- Documents inside a document-oriented database are not required to have the same sections, slots, parts, or keys.

Example 1

{
    FirstName: "Bob",
    Address: "5 Oak St.",
    Hobby: "sailing"
}

Example 2

{
    FirstName: "Jonathan",
    Address: "15 Wanamassa Point Road",
    Children: [
        {Name: "Michael", Age: 10},
        {Name: "Jennifer", Age: 8},
        {Name: "Samantha", Age: 5},
        {Name: "Elena", Age: 2}
    ]
}

Document Databases

- Documents are addressed in the database via a unique key that represents that document.
- Documents can be retrieved by their key or by their content.
- Documents are organized through:
  - Collections
  - Tags
  - Non-visible metadata
  - Directory hierarchies
  - Buckets
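Addressing by key and retrieval by content can be sketched with a dictionary of documents. The keys ("person:1", "person:2") and helper functions are hypothetical; the two documents are the ones from Example 1 and Example 2, and they deliberately do not share all fields.

```python
# A hypothetical collection: unique key -> document.
documents = {
    "person:1": {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    "person:2": {
        "FirstName": "Jonathan",
        "Address": "15 Wanamassa Point Road",
        "Children": [
            {"Name": "Michael", "Age": 10},
            {"Name": "Jennifer", "Age": 8},
            {"Name": "Samantha", "Age": 5},
            {"Name": "Elena", "Age": 2},
        ],
    },
}


def get_by_key(key):
    """Address a document directly by its unique key (None if absent)."""
    return documents.get(key)


def find_by_content(field, value):
    """Return keys of documents whose field equals value; missing fields never match."""
    return sorted(k for k, doc in documents.items() if doc.get(field) == value)


def find_with_field(field):
    """Return keys of documents that contain the field at all."""
    return sorted(k for k, doc in documents.items() if field in doc)
```

A content query such as `find_with_field("Children")` simply skips documents that lack the field, which is how such a store can tolerate documents with different keys.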

XML Databases

- XML databases belong to the category of document-oriented databases.
- An XML database is a data persistence software system that allows data to be stored in XML format.
- Two major classes of XML database exist:
  - XML-enabled: the input and output are XML, but the data is stored in relational tables.
  - Native XML: in addition to the input and output, the internal structure of the database is also XML.

XML-enabled Databases

XML-enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:
- XML is stored in a CLOB (character large object).
- XML is cut into a series of tables based on a schema.
- XML is stored in a native XML type as defined by the ISO.

Typically, an XML-enabled database is best suited to cases where the majority of the data is non-XML.
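The first two storage approaches can be contrasted in a small sketch using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. Several things here are assumptions made for illustration: the table and column names are hypothetical, the sample document is invented, and SQLite's TEXT type stands in for a CLOB. The native XML type approach is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input document.
xml_text = "<person><name>Bob</name><hobby>sailing</hobby></person>"

conn = sqlite3.connect(":memory:")

# Approach 1: store the whole document as one large text value (CLOB-style).
conn.execute("CREATE TABLE person_clob (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO person_clob (doc) VALUES (?)", (xml_text,))

# Approach 2: cut the document into relational columns based on a known schema.
root = ET.fromstring(xml_text)
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, hobby TEXT)")
conn.execute(
    "INSERT INTO person (name, hobby) VALUES (?, ?)",
    (root.findtext("name"), root.findtext("hobby")),
)

# The CLOB row returns the original XML; the shredded row is queryable with plain SQL.
stored_doc = conn.execute("SELECT doc FROM person_clob").fetchone()[0]
shredded = conn.execute("SELECT name, hobby FROM person WHERE hobby = 'sailing'").fetchone()
```

The trade-off is visible even at this scale: the CLOB keeps the document intact but opaque to SQL, while the shredded form supports ordinary relational queries at the cost of fixing a schema up front.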

Native XML databases

A native XML database:
- Defines a (logical) model for an XML document, and stores and retrieves documents according to that model.
- Has an XML document as its fundamental unit of (logical) storage.
- Need not have any particular underlying physical storage model.

Additionally, many XML databases provide a logical model for grouping documents, called collections.

Query Languages in XML Databases

All XML databases now support at least one form of querying syntax:
- XPath
- XSLT
- XQuery
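As a small illustration of XPath-style node selection, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document is hypothetical, and XSLT and XQuery are not shown because the standard library does not implement them.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document used only for illustration.
library = ET.fromstring(
    "<library>"
    "<book year='1861'><title>Great Expectations</title></book>"
    "<book year='1813'><title>Pride and Prejudice</title></book>"
    "<book year='1847'><title>Wuthering Heights</title></book>"
    "</library>"
)

# Path expression: select every title element under a book element.
titles = [t.text for t in library.findall("./book/title")]

# Predicate on an attribute: select the title of the book with year 1813.
title_1813 = library.find("./book[@year='1813']/title").text
```

Both expressions select nodes by their position in the document tree, which is the core idea the three query languages share.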

Key-Value Stores

- Key-value stores use the associative array as their fundamental data model.
- In this model, data is represented as a collection of key-value pairs.
- The key-value model is one of the simplest non-trivial data models.

Example:

{
    "Great Expectations": "John",
    "Pride and Prejudice": "Alice",
    "Wuthering Heights": "Alice"
}
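The associative-array model maps directly onto a Python dictionary. The `KeyValueStore` class below is a hypothetical in-memory sketch of the usual put, get, and delete operations; it is not the interface of any product listed earlier.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert the pair, overwriting any existing value for the key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look up a value by its exact key; return default if absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key if present; return True if something was removed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("Great Expectations", "John")
store.put("Pride and Prejudice", "Alice")
store.put("Wuthering Heights", "Alice")
```

Lookups go through the key only. Answering "which keys map to Alice?" would require scanning every pair, which is the price of the model's simplicity.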

Object Database

- An object database is a database management system in which information is represented in the form of objects, as used in object-oriented programming.
- Most object databases also offer some kind of query language (such as OQL), allowing objects to be found using a declarative programming approach.
- Access to data can be faster because joins are often not needed.
- Many object databases offer support for versioning.
- They are especially suitable for applications with complex data.

NoSQL databases on the cloud

NoSQL databases can be run on cloud platforms such as Amazon Web Services. There are three common deployment models for NoSQL on the cloud, including:
- Virtual machine image: cloud platforms allow users to rent virtual machine instances for a limited time and run a NoSQL database on these virtual machines.
- Database as a service: some cloud platforms offer familiar NoSQL database products, such as MongoDB, Redis, and Cassandra, as a service, without the user having to launch a virtual machine instance for the database.