A Selection Method of Database System in Bigdata Environment: A Case Study From Smart Education Service in Korea
|
|
|
- Cleopatra Montgomery
- 10 years ago
- Views:
Transcription
1 Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN A Selection Method of Database System in Bigdata Environment: A Case Study From Smart Education Service in Korea Jong Sung Hwang 1, Sangwon Lee 2, Yeonwoo Lee 1, and Sungbum Park 1 * 1 National Information & Society Agency, Big Data Strategy Center [email protected], [email protected], [email protected] 2 Wonkwang University Division of Information and Electronic Commerce (Institute of Convergence and Creativity) [email protected] *corresponding author Abstract Relational databases have been widely used in organizations for decades. Although relational databases have significant advantages for data preservation and concurrency controlling as the standard model in databases, there are disadvantages as well, such as object relationship inconsistency and heavy dependence on strict database schema. Recently, NoSQL became the alternative for big data utilization in that it can be operated without a specific schema while works well in cluster environment in the 21st century. However, its utilization still has pros and cons. In this backdrop, we do not only enumerate the advantages and disadvantages of NoSQL on relational databases, but also show the comparison between the NoSQL databases. More importantly, this research does not necessarily recommend the specific single type database to build an information system. Instead, using the polyglot persistence concept, we suggest the database management system selection criteria. Then we present an optimal combination of the database system in order to demonstrate our suggestion from the example of the smart education service in Korea. Keywords: Database Selection, Big Data, Polyglot Persistence. 1 Introduction Currently, firms create a large amount of data in their business processes and manage them using a database. However, in the 21 st century, it is impossible for traditional relational databases (RDBs) which have been widely used for decades to store and
2 Jongsung Hwang et al. 10 analyze such a large amount of data due to the cost and performance issues. Accordingly, organizations that manage large amounts of unstructured data are turning to non-relational database now frequently called Nosql database [4]. In this respect, we define the key elements of NoSQL database and its critical factors required for the application of the big data area. Then, after introducing various types of big data DBs, we propose the database selection criteria by inclusively considering the merits of existing database. In fact, Nosql became an alternative for big data utilization not only because it can be operated without specific schema, but also it works well in cluster environment in the 21st century. However, even though relational databases have disadvantages such as object relationship inconsistency and heavy dependence on DB schema, there are advantages as well, such as their significant ability for data preservation and concurrency controlling as a de facto standard model in databases. Against this backdrop, in this paper, we do not necessarily recommend the use of single type database in order to build a big data based information system. Especially, using the polyglot persistence concept as a basis, we exemplify the optimal DB management system (DBMS) and present its combination from specific areas of the information system such as smart education system. The example presented in this paper enables organizations to use data storage suitable to their business circumstances. 2 Changes in Database Model In this paper, we investigate the various database management models, and find technical changes which cannot be resolved properly with the relational database model used for decades. 2.1 Issues on Relational Database Databases provide more outstanding flexibility than file systems when storing a large volume of data using relational databases. We can find and collect the necessary information in many application programs. As another important merit of relational database, enterprise applications have many users. It generates errors and warnings when end users operate same function in the same application concurrently and simultaneously. Relational databases help to manage these problems using concurrency control functions. Even though there s complexity in the concurrency control, it gained acceptance with the efficiency in the transaction mechanism. Finally, in an ecological system where many applications exist and collaborate mutually, relational databases successfully store many applications in the single database using the integrated DB sharing method. In this wise, relational database systems have generally been accepted by the corporations if the data didn t contain many relationships requiring joins of large tables [5]. However, relational databases have significant disadvantages: inconsistency of object relationships between the relational model and the data structure within single user interface. Another defect is the fact that relational database cannot efficiently manage
3 11 A Selection Method of Database System in a large volume of unstructured data which should be supported in clusters, while it usually deals with structured data within the server. 2.2 Advent of Cloud Computing and Internet of Things (IoT) Systems In the past, data were usually defined in advance, processed and then stored in the single database. Current IT technology is evolving into the diffusion of the cloud computing and internet of things system (Fig 1), which commercializes low-cost computing devices by interconnecting them. On the contrary, massive structured/unstructured data sources are generated from various devices and are stored in the clusters. They are social media data, streaming data, Open Graph (OG) data, and internet of things (IOT) devices data [1]. Terminal Therefore, it requires that the data model that support complex and various structures. Recently there has been interest in data stores that do not use a relational database and its manipulation language SQL exclusively. It is so called Nosql movement. Examples are Google s BigTable and Facebook s Terminal Cassandra [5]. Terminal Device Device Device Device Terminal Device Device Device Device Terminal Terminal Terminal Fig. 1 Cloud Computing and IoT Environment 2.3 Nosql Era The Nosql database provides a mechanism that employs less constrained consistency models than relational database to collect and retrieve a large amount of data. Nosql is often highly optimized key-value stores. Nosql databases are used in the various fields of traditional and emerging industries using bigdata based real-time web
4 Jongsung Hwang et al. 12 applications. The purpose of Nosql is simple and easy retrieval and appending additional data processing operations. This feature supports such technical motivations as simplicity of design, horizontal scalability, and finer control over availability. Fig 2 illustates aforementioned Nosql s advantageous features on relational databases in the bigdata utilization environment. Relational DB Node 1 Node 1 NoSQL Client A Client B Node 2 Node 3 Read Entity A Read Write Entity B Client A Client B Node 2 Node 3 Entity A Write Entity B Fig. 2 Relational DB vs. NoSQL DB Relational database horizontally distributes data load (see fig 2). This is where NoSQL solutions differ greatly. They do not distribute a logical entity across multiple tables, it s always stored in one place. They do not enforce referential integrity between these logical entities. This is what allows them to automatically distribute data across a large number of database nodes and also write them independently. If a certain user were to write entity B to a database cluster with 3 nodes, chances are he/she can evenly spread the writes across all of them. A distributed RDBMS cases, Client 1 will either not see any changes until all three nodes acknowledged a two phase commit or will be blocked until that happened. In addition to that synchronization the RDBMS also needs to read data from other nodes in order to ensure referential integrity, all that happens during the transaction and blocks Client B. NoSQL solutions do no such thing for the most part. The fact that such a solution can scale horizontally also means that it can leverage its distributed nature for high availability. This is very important in the cloud, where every single node might fail at any moment. So it is true that NoSQL solutions lack some of the features that define an RDBMS solution due to the reason of scalability. That does however not mean that they are primitive datastores only unstructured and simple. Detail explanation will be followed by the example of Document, Column Family and Graph databases which are the three representative type of the Nosql database [6].
5 13 A Selection Method of Database System in 3 Bigdata Characteristics and Databases 3.1 Diverse Database Models of Big Data Era Column-Family Stores Database Cassandra, HBase, Hypertable, and Amazon Simple DB are notable database product in column-family stores. We define column-family data stores and their structure with reference to Cassandra. Column-families are tuples that consist of key-value pairs, where the key is mapped to a value that is a set of columns (see fig 3). The values are classified into several column-families, and each column-family becomes a data map. Column-Family Stores preserve data that meet key values instead of rows [2]. Each row has many columns related to its key (Row key A in Fig 3a). Column-Family Store guarantees consistency, transaction, availability, various queries, and extensibility of the column-family data. The query functions are performed using the Cassandra Query Language (CQL) with basic query services and indexing techniques. Row Row Key A Column 1 Column 2 Column N Row Row Key B Column 1 Column 2 Column N Fig. 3a Cassandra Data Model Row Event id: fcfdjdslf34324 Name of Application Name of SubFunction User id of Application Fig. 3b Event Logging Store Example using Column-Family Stores Fig. 3 An example of Column-Family Stores Database The best uses for Column-Family Stores are event logging, content management systems, blogging-platform services, and visitor countering services. In managing big data, using column-families might allow the storage of blog entries with tags, categories, links, and trackbacks in different columns (see fig 3b). However, Column- Family Stores are inefficient in terms of complex query and its frequent modifications.
6 Jongsung Hwang et al. 14 Therefore, if measuring distance or calculating on the shortest route are as important as the data itself, using column-family store is not a good solution Document Database Organizations create and manage significant amounts of document in the administration of their business. It is impossible for general relational databases to manage and analyze such a large amount of document data directly. Among the commercialized Document DBs such as CouchDB, Terrastore, and OrientDB, MongoDB is the representative schema-free document DB written in C++ and developed in an open-source. Duplicated Set Primary Data Center Data Center (Secondary or Backup) Application Master Node Mongo A Priority=20 Slave Node Mongo B Priority=20 Mongo C Priority=10 Fig. 4a Priority Assignment Duplicated Set Primary Data Center Data Center (Secondary or Backup) Master Node Mongo A Priority=20 Application Slave Node Mongo B Priority=20 Mongo D Priority=20 Mongo C Priority=10 Fig. 4b Node Insertion Fig. 4 An Example of Document Database MongoDB closed the gap between the fast and highly scalable column-family-stores and the feature-rich traditional relational database management systems. Document databases are considered to be the next logical step of simple column-family-stores with slightly more complex and meaningful data structures. Document database encapsulates a key/value-pairs in documents. On the other hand, there is no strict schema to which documents have to conform, and this eliminates the need for schema migration efforts. Documents in the form of XML, JSON, BSON, and so on are
7 15 A Selection Method of Database System in major concepts of Document databases. When inserting a new attribute, there is no need to define the attribute and change the existing document. MongoDB guarantees consistency, atonic transaction, availability, various queries, and extensibility of document data. Fig 4a shows an example of assigning priority to each node. Fig 4b shows an example of inserting a new node MongoDB into the existing duplicated set. The best uses for Document DBs are content management systems, blogging platforms, Web analysis, real-time analysis, and applications for electronic commerce. However, Document DBs are not a solution for web site with complex transactions or queries having dynamically changing calculation structure. The document data model - Document DBs - facilitate website construction because the data model supports unstructured data natively and does not require costly and time-consuming migrations when the application requirements change while Column-Family stores are not adequate for prototyping or technical reviewing premised on the website refactoring Graph Database Bigdata analysis requires expressing various relationships between entities in graphical form. Graph databases use graph structures with nodes, edges, and properties to represent and store data. Nodes are extremely similar in nature to objects, with which object-oriented programmers are familiar. Based on graphic theory, every element contains a direct pointer to its adjacent element. In addition, no index lookups are necessary (fig 5). Compared to relational databases, graph databases are faster for associative data sets and map more directly to the structure of object-oriented applications. Because Graph databases do not typically require expensive join operations, they can scale to large data sets. In the meantime, relational databases are typically faster at performing the same operation on large numbers of data elements, but Graph databases are more suitable for managing ad hoc and changing data with te evolving schema because they depend less on rigid schemas. Similar to Column-Family Stores and Document databases, Graph databases guarantee consistency, transaction, availability, various queries, and extensibility of column-family data. The examples shown in Fig 6 are relationships with attributes and node partitions, respectively. Fig. 5 Graph Database Structure Example
8 Jongsung Hwang et al. 16 Fig. 6 Attribute Relationship Example The best uses for Graph databases are linked data, routing, dispatch, location-based services, and recommendation services. However, Graph databases do not support an optimal solution for entity updates, especially when modifying entities or attributes. Consequently, Graph databases are a powerful tool for graph-like queries, for example, computing the shortest path between two nodes in a graph. Other graph-like queries can be performed with Graph databases in a natural way, such as graph diameter computations or community detection. 3.2 Polyglot Persistency and Bigdata DB selection criteria This research introduced three representative type of Nosql database and their features. Table 1 is a list of three Nosql data models and the commercial products. The list is not inclusive of the entire database, but is the list of the Nosql databases commonly encountered. The list also includes the appropriate use cases, according to aforementioned database selection criteria such as scalability, data size, and data structure Column-Family complexity. Stores Scalability Data Size Complexity Data Document Amazon SimpleDB, Cassandra, 1: Types of Data Model Database HDB, CouchDB, OrientDB, Hyper RavenDB, MongoDB, Table Terrastore Middle High Middle High Middle Low Model Database Examples Graph Database Block Infinity OrientDB Graph, HyperGraphDB, Neo4J, Low Low High
9 17 A Selection Method of Database System in To recap previous chapter, the common characteristics of Nosql are: (1) It has no schema, and Nosql systems allow the use of SQL-like query languages. (2) Nosql works well in clusters. (3) Nosql is open source and is developed for the Web environment of the 21st century. However, we do not think that the traditional relational databases will disappear, but they will continue to be used in many private and public ICT projects. The familiarity, reliability, versatility, and technical support are all compelling factors of relational databases that the majority of projects cannot be overlooked. What is different from the past is the fact that we consider relational databases an option with regard to data storage. Such perspective is referred to as the polyglot persistency [7]. Instead of selecting RDBs simply because everyone else is, one should select data storage considering the nature and characteristic of the data. Through the following cast study, we introduce a database selection criteria in bigdata environment using polyglot persistency. 4. Case Study: Database development for Smart Education using Bigdata 4.1 Smart Education Reform and Educational Bigdata in Korea Through the diffusion of computerization in the mid 1980s, digitized educational information started to disseminate with Computer Aid Instruction (CAI) education in Korea. Later, distribution of the Internet in the 1990s led to the expansion of Webbased education. Since then, educational data format has been defined complying the Korea Data Education Metadata (KEM) belong to the Korean Industrial Standards (KS). Educational data have been classified into resource types, such as practice, simulation, questionnaire, diagram, figure, graph, index, slide, tables, statements, test, experiment, problem statements, self-diagnosis, lectures, and more, then saved and managed by relational database [3]. However, existing complex data definitions and unified storing method are too complex and inefficient considering schema-free data from multiple resources and multiple types of databases available in the bigdata environment. With these considerations, it is required to re-classify and selecting database for educational data. Educational data can largely be classified into unstructured or semi-structured teaching-learning data that is generated from teacher-student interactions, and non-teaching-learning data, such as structured educational administration data that can be saved into fixed fields. Structured data refer to the information related to the school s current state of affairs, teacher and student information, personnel-payroll information, school administration information, health and meal information, and academic achievements. Semi-structured data refers to student/teacher evaluation data/schemas, E-learning content, e-books, teacher s guides, and e-textbooks, etc. Unstructured data refer to s, blogs, social networking services (SNS), edutainment, digital textbooks, and image/video/audio data, etc.
10 Jongsung Hwang et al. 18 Since 1996, semi-structured teaching-learning data have been shared and distributed through the website by the Korean government. In addition, Education Broadcasting Station (EBS) started televising educational video clips out of unstructured data. And it established and operates the Education Data Resource Bank (EDRB, to store them. However, such services are systems only designed for search and collection, which is different from more student-specific, personalized instruction service through the appropriate mixed application of the current relational database system and Nosql technology. 4.2 Bigdata Service Model for Smart Education The Korean government is currently conducting a project to provide optional personalized learning, to amplify each student s learning potential, and to provide equal education opportunities for everyone through the Bigdata based Smart Education services. The Smart Education Strategic Promotion Plan was established aiming for 2015, and it is being used to motivate reforms to the education system. In order that the smart education service is incorporated, education data should be used to implement optimal strategies for education policy, education systems, education methods, etc. The Smart Education Strategic Promotion Plan is largely classified and promoted in four different areas: (1) Expansion of self-directed time: classes and study hours that were limited to specific times become available anytime, as required. (2) Expansion of adaptive educational capacity: utilize digital textbooks, enable online lectures/evaluation, and conduct personalized education. (3) Expansion of resource-enriched educational content: increase student creativity and problemsolving skills by providing resource-enriched learning materials through the development of digital textbooks and the active use of educational content for public purpose. (4) Expansion of technology-embedded space: promote outside-theclassroom learning at home or on the global stage by creating school infrastructures based on the cloud, and conduct online classes. Structured & Semi-structured data Learning Video Courses Learning Contents Q&A Learning Evaluation Profile Unstructured data Data Mining Statistics Session Friend Data Recommen Shopping dation Cart Data Education Service Open Martket Shopping Cart Data Session Data Education Open Market Contents Profile Price Information Online Evaluation Open Market Friend Recommendation Fig. 7 Learning Support through Personalized Education Portfolio
11 19 A Selection Method of Database System in The business model for smart education system operates based on bigdata, and performs the role of promoting the e-learning content industry through composing personal training portfolios. Various educational services, such as Session Data, Education, Open Market content, Profile, Price Information, Online Evaluation, and Open Market Friend Recommendation services are provided. Students and parents are provided with an accurate guide through the use of the educational content. Also, industry development is promoted through the use of the course data, learning content data and learning evaluation data. This model utilizes the bigdata-based personal education log, provides personalized education consulting to every student, improves equal educational opportunities and the quality of public education, and opens learning data to promote the development of educational programs necessary for students. In addition, the model provides tailored education courses, personal preferences and an appropriate learning portfolio for educational interests of individual student. 4.3 Polyglot Persistency Theory-based DB Construct Many organizations tend to use the sngle database management system to store business transactions, session management data, and other storage needs such as reporting, BI, data warehousing, or logging information (Fig 8) using with the computer programing perspective. Complex applications combine different types of problems; therefore, choosing the correct language for each job might be more productive than attempting to fit all aspects into a single language. Similarly with the database management perspective, when working on an esmart education service, using an appropriate data store for the open market content that is highly available and support scallability is important, but the same data store cannot assist in finding products bought by customers friends because this is an entirely different question. esmart-education Platform E-learning Content Shopping Cart Session Data Education Open Market Contents Profile Price Information Online Evaluation Open Market Friend Recommendation RDBMS Fig. 8 Use of RDBMS for Entire Application and Storage The relational database management system solution is advantageous when reinforcing existing relationships, but is disadvantageous when discovering new
12 Jongsung Hwang et al. 20 relationships or finding different tables within the same object. For instance, using column-family stores is logical because open market e-learning content shopping carts are usually accessed through user IDs, and once confirmed and paid by customers, can be saved in the relational database system. Similarly, the session data keyed by the session ID should be stored and managed by column-family stores. If products need to be recommended to customers when they place products into their shopping carts, for example, your friends also bought these products or your friends bought these accessories for this product, then introducing a graph data store becomes relevant (Fig 8). The application does not need to use a single data store for all its needs because different databases are built for different purposes, and not all problems can be elegantly solved by a single database. Even using specialized relational databases for different purposes, such as data warehousing appliances or analytics appliances, within the same application can be viewed as polyglot persistence. Shopping Cart Data Session Data Column-Family Store E-Smart-Education Platform Education Open Market Contents Profile Price Information Document Store RDBMS Online Evaluation Open Market Friend Recommendation Graph Store Fig. 9 Polyglot Persistence based database development for Smart Education Service 4 Conclusion Recently, Nosql became the alternative database management system for bigdata. However, Nosql and relational database has their own pros and cons respectively. In this backdrop, we suggested the relational database selection criteria according to the specific area of information system and presented its combination example for esmart education service cases in Kora. This case study was conducted prior to the full scale implementation of smart education service. Thus, this study provides a great opportunity for estimating the type and size of the bigdata related to educational activities, and to propose a desirable database system framework for smart education services.
13 21 A Selection Method of Database System in The introduction of NoSQL data storage technologies will force organization database administrators to think about how to use the new storage. Organizations are used to having uniform relational database management system environments that an organization first uses. Using polyglot persistence, database administrators will have to become more poly-skilled in order to learn how some of these Nosql technologies work, how to monitor these systems, create backups for them, and retrieve and input data into these systems. Once an organization decides to use Nosql, issues such as licensing, support, tools, upgrades, drivers, auditing, and security might arise. Many Nosql technologies are open source and have an active community of supporters; in addition, there are companies that provide commercial support. In traditional relational database management systems, data can be accessed using query tools. With NoSQL databases, there are query tools as well. However, the idea is for the Nosql database management system only managing data while doesn t take care of security. With this approach, the responsibility for the security lies with the application. Now is the time that the flexibility to compose data from various perspectives suitable to the cloud computing environment that is represented by distributed parallel processing is required. In other words, in the current bigdata era, data models must demonstrate flexibly in various ways in order to establish data models in accordance with the complex changing environment of computing systems. References [1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers, Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey & Company, [2] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Sixth Symposium on Operating System Design and Implementation, pp , [3] Korea Industrial Standards Commission, Information Technology Elementary and Middle School Education Information Metadata KSX7001, [4] Leavitt, N., Will NoSQL databases live up to their promise?, Computer, Vol.43, No.2, , 2010 [5] Vicknair, C., M. Macias, et al., A comparison of a graph database and a relational database: a data provenance perspective, Proceedings of the 48th annual Southeast regional conference, ACM., 2010 [6] Michael Kopp, About:Performance - Application performance, scalability and architecture NoSQL or RDBMS? Are we asking the right questions?, [7] Sadalage, P. J. and Fowler, M., 2012, NoSQL distilled - A brief guide to the emerging world of polyglot persistence, Pearson Education, Inc.
NoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
Big Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Introduction
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
NoSQL Databases. Polyglot Persistence
The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
Cloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
Databases 2 (VU) (707.030)
Databases 2 (VU) (707.030) Introduction to NoSQL Denis Helic KMI, TU Graz Oct 14, 2013 Denis Helic (KMI, TU Graz) NoSQL Oct 14, 2013 1 / 37 Outline 1 NoSQL Motivation 2 NoSQL Systems 3 NoSQL Examples 4
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15
MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015
Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor
Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace
Introduction to Polyglot Persistence Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace FOSSCOMM 2016 Background - 14 years in databases and system engineering - NoSQL DBA @ ObjectRocket
Infrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
Slave. Master. Research Scholar, Bharathiar University
Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually
How To Improve Performance In A Database
Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed
NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
Can the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services
NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services Pouria Amirian, Adam Winstanley, Anahid Basiri Department of Computer Science,
Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)
Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Structured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
NoSQL Systems for Big Data Management
NoSQL Systems for Big Data Management Venkat N Gudivada East Carolina University Greenville, North Carolina USA Venkat Gudivada NoSQL Systems for Big Data Management 1/28 Outline 1 An Overview of NoSQL
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Understanding NoSQL Technologies on Windows Azure
David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
, pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
Introduction to NOSQL
Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo
The emergence of big data technology and analytics
ABSTRACT The emergence of big data technology and analytics Bernice Purcell Holy Family University The Internet has made new sources of vast amount of data available to business executives. Big data is
Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
How To Write A Database Program
SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
NoSQL Database Options
NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has
Application of NoSQL Database in Web Crawling
Application of NoSQL Database in Web Crawling College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China doi:10.4156/jdcta.vol5.issue6.31 Abstract
Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715
Big Data Facebook Wall Data using Graph API Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715 Outline Data Source Processing tools for processing our data Big Data Processing System: Mongodb
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Domain driven design, NoSQL and multi-model databases
Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
CISC 432/CMPE 432/CISC 832 Advanced Database Systems
CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 [email protected] Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Understanding NoSQL on Microsoft Azure
David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management
NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management A B M Moniruzzaman Department of Computer Science and Engineering, Daffodil International
Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk
Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria
Oracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases
NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without
NoSQL Database Systems and their Security Challenges
NoSQL Database Systems and their Security Challenges Morteza Amini [email protected] Data & Network Security Lab (DNSL) Department of Computer Engineering Sharif University of Technology September 25 2
REAL-TIME BIG DATA ANALYTICS
www.leanxcale.com [email protected] REAL-TIME BIG DATA ANALYTICS Blending Transactional and Analytical Processing Delivers Real-Time Big Data Analytics 2 ULTRA-SCALABLE FULL ACID FULL SQL DATABASE LeanXcale
Microsoft Azure Data Technologies: An Overview
David Chappell Microsoft Azure Data Technologies: An Overview Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Blobs... 3 Running a DBMS in a Virtual Machine... 4 SQL Database...
Open Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
Applications for Big Data Analytics
Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:
How to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
Introduction to Apache Cassandra
Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
NoSQL Evaluation. A Use Case Oriented Survey
2011 International Conference on Cloud and Service Computing NoSQL Evaluation A Use Case Oriented Survey Robin Hecht Chair of Applied Computer Science IV University ofbayreuth Bayreuth, Germany robin.hecht@uni
The Quest for Extreme Scalability
The Quest for Extreme Scalability In times of a growing audience, very successful internet applications have all been facing the same database issue: while web servers can be multiplied without too many
Comparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350
CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #13: NoSQL and MapReduce Announcements HW4 is out You have to use the PGSQL server START EARLY!! We can not help if everyone
WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS
WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies
Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment
Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment Subject NoSQL Databases - MongoDB Submission Date 20.11.2013 Due Date 26.12.2013 Programming Environment
BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING
BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University
GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe 2016 1
GRAPH DATABASE SYSTEMS h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe 2016 1 Use Case: Route Finding Source: Neo Technology, Inc. h_da Prof. Dr. Uta Störl Big Data Technologies:
CloudDB: A Data Store for all Sizes in the Cloud
CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli
Leveraging MySQL Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli MySQL is a popular, open-source Relational Database Management System (RDBMS) designed to run on almost
Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05
Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra [email protected] Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
MongoDB Developer and Administrator Certification Course Agenda
MongoDB Developer and Administrator Certification Course Agenda Lesson 1: NoSQL Database Introduction What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL Types of NoSQL
Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
How graph databases started the multi-model revolution
How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the
X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
