Integration and Virtualization of Relational SQL and NoSQL Systems including MySQL and MongoDB
Ramon Lawrence
Department of Computer Science, University of British Columbia, Kelowna, BC, Canada, V1V 2Z3

Abstract
NoSQL databases are growing in popularity for Big Data applications in web analytics and supporting large web sites due to their high availability and scalability. Since each NoSQL system has its own API and does not typically support standards such as SQL and JDBC, integrating these systems with other enterprise and reporting software requires extra effort. In this work, we present a generic, standards-based architecture that allows NoSQL systems, with specific focus on MongoDB, to be queried using SQL and to seamlessly interact with any software supporting JDBC. A virtualization system is built on top of the NoSQL sources that translates SQL queries into the source-specific APIs. The virtualization architecture allows users to query and join data from both NoSQL and relational SQL systems in a single SQL query. Experimental results demonstrate that the virtualization layer adds minimal overhead in translating SQL to NoSQL APIs, and that the virtualization system can efficiently perform joins across sources.

Keywords: NoSQL, Big Data, MongoDB, integration, virtualization

I. INTRODUCTION
There is a revolution going on in database systems. For years, database systems have been primarily based on the relational model, queried using standardized SQL, and accessed using common interfaces such as ODBC and JDBC. Although each relational system vendor has minor variations in implemented features, SQL dialect, and system interfaces, relational database systems are relatively interchangeable due to their common acceptance of standards.
Recently, numerous NoSQL ("Not Only SQL") systems have been released and widely adopted in many domains. NoSQL systems have been developed to support applications not well served by relational systems, often involving Big Data processing. NoSQL systems can be categorized as key-value stores, document stores, and graph databases. Importantly, there are no common standard APIs for accessing the different NoSQL systems and no standard query language such as SQL. Typically, users must interact with these systems at a programming level with custom APIs. This reduces portability and requires system-specific code.
Although most NoSQL systems do not support SQL, there is no fundamental reason why they could not; the NoSQL label is a misnomer. The value of these systems has nothing to do with the absence of SQL support, but rather with the different architectural design decisions they make to achieve scalability and performance. SQL is valuable as a standard that provides portability and expressiveness and has a massive installed base of trained users. As NoSQL systems evolve, there has been recognition of the value of SQL, and several commercial systems are hybrids combining a SQL relational processor with a Big Data (MapReduce) query processor. However, many of the most popular NoSQL databases, such as MongoDB, do not have SQL interfaces.
There has been prior research on standardizing APIs and using SQL with NoSQL systems. A common programming API for querying Redis, MongoDB, and HBase was described in [1]. A SQL engine was integrated on top of HBase in [2], using the Apache Derby query engine to process joins and other operators not supported by HBase. In [3], a SQL interface (SimpleSQL) over Amazon SimpleDB was defined. Partiqle [4] provides a generic SQL interface over key-value stores.
The contribution of this work is a generalizable SQL query interface for both relational and NoSQL systems called Unity. Unity allows SQL queries to be automatically translated and executed using the underlying API of the data sources (NoSQL or relational). Unity is an integration and virtualization system as it allows SQL queries that span multiple sources and uses its internal query engine to perform joins across sources. The key features of Unity are:
- A SQL query processor and optimizer, including support for push-down filters and cross-source hash joins.
- A customizable SQL dialect translator that uses mappings to compensate for different SQL variants in relational systems.
- A SQL to NoSQL translator that maps relational operators in a query plan to NoSQL programming APIs. Relational operators not supported by the NoSQL system are performed by the Unity virtualization engine.
- SQL function translation that allows functions to be mapped between different system implementations (e.g. different order of parameters, different names).
- Data virtualization allowing queries and joins across both relational and NoSQL sources.
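To make the intended access pattern concrete, the sketch below shows the style of usage such a virtualization layer enables: one SQL statement, issued over a standard JDBC connection, that joins a MongoDB collection with a MySQL table. The driver connection URL, schema names, and query are illustrative placeholders, not the actual Unity interface.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class VirtualizedQueryExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical virtualization driver URL; JDBC 4+ drivers self-register.
        String url = "jdbc:virtualized://localhost/demo";
        String sql = "SELECT SUM(total) AS totalamount " +
                     "FROM mongo.customer C INNER JOIN mysql.orders O ON C.cid = O.cid " +
                     "WHERE C.cname = 'Fred'";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println("totalamount = " + rs.getDouble("totalamount"));
            }
        }
    }
}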
The organization of this paper is as follows. Section 2 provides background on relational and NoSQL systems. Motivational use cases are provided in Section 3. The Unity architecture, including details on the query processor/optimizer and the implementation of the SQL translator, is in Section 4. Section 5 presents experimental results, and the paper closes with future work and conclusions.

II. BACKGROUND
Relational systems have been the dominant form of data storage and manipulation for the last 30 years. The relational model [5] is both theoretically grounded and efficient to implement. In combination with query standards such as SQL and interface standards such as ODBC [6] and JDBC [7], relational systems are applicable to a wide variety of data problems. Relational vendors primarily differentiate products based on cost versus performance and system administration features. Applications written for one database system are generally portable to other systems, although translating stored procedures and system-specific features is more challenging.
The dominance of relational systems has been challenged in the past by object-oriented databases [8] and XML [9]. Object-oriented databases brought advantages such as inheritance and type hierarchies and promised a better solution to the impedance mismatch problem, in which programmers must convert data between relational tables and attributes and program variables and objects. However, relational vendors modified their systems with object-oriented features and extended SQL to capture most of the advantages of object-oriented systems while preserving the high performance and strengths of the relational model and SQL. Similarly, XML has a different model for data representation and exchange, and relational vendors extended their systems to allow columns storing XML data and querying with XML languages such as XQuery [10].
NoSQL systems have been proposed to tackle applications and problem domains poorly served by relational databases. These domains are primarily Big Data domains involving web data, such as supporting millions of interactive users or performing analytics on terabytes of data such as web logs and click streams. The data in these domains is semistructured, variable, and massive, and creating a suitable relational model may be difficult. There are several different types of NoSQL systems [11], including key-value stores (e.g. HBase), document stores (e.g. MongoDB, CouchDB), MapReduce systems (e.g. Pig, Hive), and graph databases (e.g. Neo4J). Key-value stores provide the simplest interface, allowing values to be stored and retrieved using a hash interface. Document stores allow a structured document to be attached to a key; for MongoDB, the representation format is BSON (binary-encoded JSON). MapReduce systems, typically based on Hadoop, allow for large-scale processing of massive data sets on a cluster. A variety of systems have been built on top of Hadoop, including Pig and Hive, to make it easier to construct queries rather than writing code. For instance, Hive supports HQL, a variant of SQL. Developers have also extended MapReduce systems with SQL support, including HAWQ, HadoopDB/Hadapt, and Phoenix, a SQL interface for HBase. These systems often combine relational massively parallel processing (MPP) technology with Hadoop to speed up query processing. NoSQL systems are often open source and are designed to handle larger data volumes at better performance than relational systems, although there is debate about their relative performance [12].
Performance comparisons of NoSQL and SQL systems were done in [13]. The other major advantage, especially for systems like MongoDB, is better support for programmers. MongoDB allows a JSON object to be stored, which can easily be converted into JavaScript objects in code. This simplicity and flexibility make using MongoDB easier for many programmers without the need to understand SQL.
There has been some prior research in standardization for a variety of NoSQL systems, including the SOS Platform [1], which defined a common API for Redis, MongoDB, and HBase. A standard programming API makes it easier to switch between NoSQL systems and improves programmer productivity, but it does not address compatibility with existing query and reporting software that assumes SQL-based access. In [3], a relational layer supporting SQL queries and joins is added on top of Amazon SimpleDB; SQL query conversion was done using regular expressions and rules rather than a query parser and optimizer. In [14], a framework for representing NoSQL data in a relational model is presented, as well as a system to map SQL queries to NoSQL systems. The prototype system used PostgreSQL and MongoDB and allowed joining relational data with data in MongoDB using the SQL/MED wrapper [15]. In [2], a SQL engine based on Apache Derby is integrated on top of HBase; the system supports indexes and joins in the relational engine even though they are not supported by HBase. There has also been work mapping XQuery to query NoSQL stores [16]. The Partiqle system [4], part of the CloudDB project [17], supports transactional SQL querying over key-value stores.
NoSQL and relational systems will likely co-exist for some time, and it is valuable to query them simultaneously. Considerable prior work went into the integration of relational systems using schema matching techniques [18], [19] and integration using views and mediators [20]. The same concepts can be applied to NoSQL systems, which appear as restricted relational sources to a mediator and integration system. Our prior work on relational integration [21] has been extended to support the querying, integration, and virtualization of both relational and NoSQL systems.
III. USE CASES AND MOTIVATION
There are three primary reasons for supporting SQL querying of NoSQL systems:
- SQL is a declarative language that allows descriptive queries while hiding implementation and query execution details.
- SQL is a standardized language, allowing portability between systems and leveraging a massive existing knowledge base of database developers.
- Supporting SQL allows a NoSQL system to seamlessly interact with other enterprise systems that use SQL and JDBC/ODBC without requiring changes.

The key benefit of supporting SQL is system portability. NoSQL developers accessing a database must use a custom API and query language. Although queries on key-value stores are simple, database systems are part of a larger enterprise ecosystem, and standards facilitate intercommunication. Numerous applications such as query and reporting software access databases and rely on standards such as SQL and ODBC/JDBC. These application vendors are forced to write custom interfaces for their software, which results in slower adoption of NoSQL technologies, as vendors will only support systems that reach a certain adoption scale.
There are also many benefits to a virtualization system with a relational query processor that can execute the portions of SQL queries that are not executable by a NoSQL system:
- It supports operations not executable by the source system, such as joins and advanced filtering.
- It isolates users and programmers from writing complicated data manipulation code to implement operators not supported by the system.
- It allows data movement and querying between NoSQL and relational systems.

IV. ARCHITECTURE
The architecture of the Unity system is shown in Figure 1. Unity consists of a SQL query parser that converts a SQL query into a parse tree and validates the query. A query translator converts the parse tree into a relational operator tree consisting of selection, projection, grouping, and join operators. A query optimizer determines join ordering and the portions of the query plan to execute on each data source. The execution system interacts with the data sources to submit queries and retrieve results and then performs any additional operations. Each of these key components is discussed in the following sections.

A. SQL Query Parser
The SQL query parser, implemented using JavaCC, supports standard SQL-92 syntax for SELECT, INSERT, UPDATE, and DELETE statements, including inner and outer joins, delimited identifiers, subqueries, and GROUP BY, ORDER BY, and HAVING. The parser validates SQL syntax and throws exceptions for improper syntax. The output of the parsing step is a parse tree representing the SQL query.

Fig. 1. Unity Architecture (a SQL query is parsed into a parse tree, translated into an operator tree, optimized into an execution plan, and executed by the virtualization execution engine, which connects to NoSQL sources such as MongoDB through their native APIs and to SQL sources such as MySQL)

B. Query Translator
The query translator takes a parse tree and converts it into a relational operator tree. During this conversion, it also validates field and table names against the relational schema for the source. For NoSQL systems that do not have a schema, the translator does not validate identifiers and passes them directly to the executor; invalid identifiers are detected during query execution. Note that when a NoSQL source is accessed in a query together with a relational source, the NoSQL source will have a schema defined for it.
C. Schema Generator
For NoSQL systems such as MongoDB that do not have a schema, the system optionally generates one to describe the data. For MongoDB, this is done by sampling the data in each collection and creating the most general relational schema that covers the sampled data. For instance, if the first item in a collection has fields (a, b), the N-th item has fields (a, d, e), and the last item has fields (b, d, f), then the relational schema for the collection is (a, b, d, e, f). A nested attribute such as c contained in attribute a is referenced as a.c. The number of items sampled from a collection is configurable. Once generated, a schema can optionally be stored in a MongoDB collection so that it does not have to be regenerated each time.
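The following is a minimal sketch of the sampling approach described above, written against the current MongoDB Java driver; the paper does not show its implementation, and the database name, collection name, and sample size here are illustrative. It takes the union of field names over a sample of documents and flattens nested documents using dot notation.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Set;
import java.util.TreeSet;

public class SchemaSampler {
    // Infer a flat relational schema for a collection by sampling documents
    // and taking the union of their (flattened) field names.
    public static Set<String> inferSchema(MongoCollection<Document> coll, int sampleSize) {
        Set<String> fields = new TreeSet<>();
        for (Document doc : coll.find().limit(sampleSize)) {
            collectFields("", doc, fields);
        }
        return fields;  // e.g. [address.city, address.street, cid, cname]
    }

    // Nested attributes are flattened using dot notation (attribute c in a becomes a.c).
    private static void collectFields(String prefix, Document doc, Set<String> out) {
        for (String key : doc.keySet()) {
            Object value = doc.get(key);
            String name = prefix.isEmpty() ? key : prefix + "." + key;
            if (value instanceof Document) {
                collectFields(name, (Document) value, out);
            } else {
                out.add(name);
            }
        }
    }

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> customers =
                client.getDatabase("shop").getCollection("customer");
            System.out.println(inferSchema(customers, 1000));
        }
    }
}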
D. Query Optimizer
The query optimizer's goal is to perform as much of the query as possible on the individual data sources, following the general rule of moving the computation to the data to achieve better query performance. For a query involving a single relational source, typically the entire query plan can be executed on the relational system. For NoSQL sources, a SQL query may not be executable in its entirety. For instance, key-value stores can only perform simple insertions and row selections by key. Document stores can perform queries with filters and some ordering. Subqueries, joins, and grouping/aggregation typically cannot be done by NoSQL systems. For queries involving multiple data sources, the query optimizer must separate out the operators that can be done by each source and push down as much of the computation as possible. It then plans the optimal ordering for executing the joins that combine the data retrieved from the sources. Several techniques are used in the optimization:
- Push-down filters - A selection operator is pushed down for execution on a source whenever possible.
- Join ordering - Join ordering is determined using a cost-based optimizer. The optimizer assigns higher costs when a join is executed across systems rather than within a data source.
- Push-down, staged joins - A staged join across two systems retrieves the rows from one join input and uses them to modify the query sent to the other source.

Push-down, staged joins are especially powerful. Typically, if a join involves two sources, the data from each source is extracted in parallel using separate threads and then joined in the virtualization engine using a hash join. In most cases, this maximizes performance and minimizes data movement. When the number of rows returned from one source is much smaller than from the other, a staged join is more efficient. The staged join is similar to a semi-join: the virtualization engine requests and receives rows from the data source that produces the smaller result, then modifies the query sent to the other source to include a filter on the join key. The second source performs the join when it executes its query and returns only rows that will appear in the join result. The virtualization engine then merges any attributes from the first source query back into the returned rows. The other form of push-down join ships the entire data set from one source to the other source and has it execute the join; for relational systems, a temporary table is created. This option is selected when the data being joined is very large and the virtualization engine may be slower than leveraging the parallel computing resources of a source. During optimization, the optimizer tries to maximize the size of the query plan sent to each source. An operator cannot be sent to a source if the source does not support the operator or some function used by the operator; such operators are executed by the virtualization system.
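As an illustration of the staged join strategy, the sketch below hand-codes what the optimizer does automatically for the customer/orders example used later in this section: the smaller MongoDB result (customers named Fred) is fetched first, and its join keys are folded into an IN filter on the SQL sent to MySQL. The connection strings, database names, and manual aggregation are illustrative assumptions, not Unity code.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StagedJoinExample {
    public static void main(String[] args) throws SQLException {
        // Stage 1: smaller side, with the customer filter pushed down to MongoDB.
        List<Integer> cids = new ArrayList<>();
        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> customer =
                mongo.getDatabase("shop").getCollection("customer");
            for (Document d : customer.find(new Document("cname", "Fred"))
                                      .projection(new Document("cid", 1))) {
                cids.add(d.getInteger("cid"));
            }
        }
        if (cids.isEmpty()) return;  // empty join result

        // Stage 2: rewrite the MySQL query with an IN filter on the join key.
        String placeholders = String.join(",", Collections.nCopies(cids.size(), "?"));
        String sql = "SELECT cid, total FROM orders WHERE odate > ? AND cid IN ("
                   + placeholders + ")";
        double totalAmount = 0;
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/shop", "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDate(1, java.sql.Date.valueOf("2013-01-01"));
            for (int i = 0; i < cids.size(); i++) {
                ps.setInt(i + 2, cids.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                // The final aggregate (SUM) runs in the virtualization layer.
                while (rs.next()) {
                    totalAmount += rs.getDouble("total");
                }
            }
        }
        System.out.println("totalamount = " + totalAmount);
    }
}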
E. Execution Engine
The virtualization engine has implementations of all relational operators, including projection, selection, join (nested-loop, sort-merge, hash), sorting, and grouping. These operator implementations are used in query plans that span multiple sources or contain operators that cannot be executed by the underlying sources. The optimizer groups subtrees in the query plan for each source; the root of a subtree can be converted into a SQL query and executed on the source. For relational systems, a SQL query is built (with the proper dialect) and submitted to the source using JDBC, and results are retrieved through a ResultSet. For NoSQL systems, a wrapper interface is constructed around the source's custom API. This wrapper API accepts a SQL query and uses the virtualization engine to parse and convert it.

F. MongoDB SQL/JDBC Driver
For MongoDB, we constructed a driver that accepts SQL queries. It uses the SQL parser to produce a parse tree and relational operator tree. The converter then traverses the relational operator tree and maps each operator to a MongoDB API call, or throws an exception if the operator cannot be executed.

G. Function and Dialect Translator
The virtualization engine has a database of mappings between SQL dialects. Key syntax components of SQL and common functions are identified. For each data source, a mapping is created for each function and SQL dialect feature explaining how it is supported by that database. With no mapping, the syntax is assumed to be unsupported. For example, tables and fields can be aliased using AS. Unity accepts queries that use AS explicitly or implicitly (as it is optional). Consider a field rename such as myfield AS newname. The renaming is translated and represented in the operator tree. When the operator is converted for a particular database, the translation database is queried for AS with the dialect id of that database. The dialect id allows looking up an entry describing how to convert AS for that data source; some databases always require AS while others do not. This feature is also used to support translation of functions.
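As a sketch of the translation target described in Section IV-F above, the fragment below shows the MongoDB Java driver calls that a simple selection/projection subtree, such as SELECT cid FROM customer WHERE cname = 'Fred', would be mapped to. The surrounding operator-tree traversal is omitted; only the driver calls shown (Filters.eq, find, projection) are actual MongoDB API, and the class and method names here are illustrative.

import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;

public class MongoPushdown {
    // SELECT cid FROM customer WHERE cname = 'Fred'
    //   -> db.customer.find({"cname":"Fred"}, {"cid":1})
    static FindIterable<Document> selectCidForFred(MongoCollection<Document> customer) {
        Bson filter = Filters.eq("cname", "Fred");   // WHERE cname = 'Fred'
        Bson projection = new Document("cid", 1);    // SELECT cid
        return customer.find(filter).projection(projection);
    }
}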
H. Example
Consider a customer and order database:
Customer(cid, cname, address.street, address.city)
Orders(oid, cid, odate, total)
Assume that customer information is stored in a MongoDB collection and a sample customer object is:
{"cid":1, "cname":"Fred", "address": {"street":"Main", "city":"New York"}}
The orders information is stored in a table in MySQL, and a sample row is (100, 1, , 35.45). This query returns the total of all orders for customer Fred since January 1st, 2013:
SELECT SUM(total) AS totalamount
FROM mongo.customer C INNER JOIN mysql.orders O ON C.cid = O.cid
WHERE C.cname = 'Fred' AND O.odate > '2013-01-01'
Tables are prefixed with the source name; for simplicity, we have used the names mysql and mongo. This query would be optimized into the execution plan in Figure 2.

Fig. 2. Query Execution Plan (the selections on customer (cname = 'Fred') and orders (odate filter) are pushed to their sources; the staged join on cid and the SUM(total) aggregate are performed in the virtualization system)

The filters are pushed down as each one applies to only a single source. The join and grouping are executed in the virtualization layer. The join is performed as a staged join because the number of results returned from Mongo is small, which allows the MySQL query to be modified to return fewer results. The queries executed by each source are:
Mongo: SELECT cid FROM customer WHERE cname = 'Fred'
Mongo JSON query format: db.customer.find({"cname":"Fred"}, {"cid":1})
MySQL: SELECT cid, total FROM orders WHERE odate > '2013-01-01' AND cid IN (1)
The virtualization system adapts to the properties of the sources. If both tables were in MySQL, the entire query would be executed in MySQL since it supports all operators. If both were stored as MongoDB collections, two source queries would be executed on MongoDB and the join would be performed at the virtualization layer.

V. EXPERIMENTAL RESULTS
Unity was evaluated for its performance and overhead. MongoDB was run on an Intel Xeon 2.4 GHz server with 24 GB of memory. The client PC used for benchmarking had a 2.8 GHz Intel processor and 8 GB of memory. The performance of the SQL to NoSQL translation was tested using an orders database of 15,000 records. Queries and updates were performed using both the MongoDB Java API directly and the Unity virtualization engine. The experiments performed 10,000 operations each for SELECT on key, INSERT, DELETE, and UPDATE. SELECT (scan) operations each randomly selected a starting record and scanned the next 100 records. The results in Figure 3 are the average of 3 runs.

Fig. 3. Query Execution Performance in seconds (Mongo vs. Unity, with percentage difference, for SELECT (key), SELECT (scan), INSERT, DELETE, and UPDATE)

For most operations, the Unity virtualization driver has less than 15% overhead. Some overhead is expected as Unity must translate SQL to the Mongo API while also supporting the more complex API. The only exception is INSERT, where the overhead is higher, primarily because the insertion time is so low that the fixed overhead is a larger factor. As operations take more time for the server to process, the overhead becomes a smaller percentage: since the overhead is a fixed cost independent of query size, the relative overhead decreases as query time increases, so larger data sets with more users would show proportionally better performance.
The performance of cross-database joins was also tested using two databases: MySQL and MongoDB. The join combines a customer table of 1,500 records with an orders table of 15,000 records. The time to read the base relations from Mongo is 0.9 seconds and from MySQL is 1.5 seconds. MySQL takes 2.7 seconds to produce the join result. Unity joins the data in 1.1 seconds when both tables come from Mongo, and in 1.7 seconds when one table comes from Mongo and the other from MySQL. The Unity virtualization engine thus has performance comparable to executing the test query entirely in MySQL. MongoDB cannot execute joins, so the virtualization engine provides an efficient way to join collections in MongoDB.
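For the default (non-staged) cross-source strategy measured above, both inputs are retrieved and joined in the virtualization engine with a hash join. The sketch below is a minimal in-memory version of that operator over generic rows; representing rows as column-name/value maps is an illustrative assumption, not Unity's internal format.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Equi-join two row sets on a key column: build a hash table on the smaller
    // input, then probe it with the larger input and merge matching rows.
    static List<Map<String, Object>> hashJoin(List<Map<String, Object>> build,
                                              List<Map<String, Object>> probe,
                                              String key) {
        Map<Object, List<Map<String, Object>>> table = new HashMap<>();
        for (Map<String, Object> row : build) {
            table.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
        }
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : probe) {
            for (Map<String, Object> match :
                     table.getOrDefault(row.get(key), Collections.emptyList())) {
                Map<String, Object> joined = new HashMap<>(match);
                joined.putAll(row);  // merge attributes from both sources
                result.add(joined);
            }
        }
        return result;
    }
}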
VI. CONCLUSION
Unity is an integration and virtualization system allowing SQL queries over both relational and NoSQL systems. The virtualization layer translates SQL queries to NoSQL APIs and automatically executes operations not supported by NoSQL systems. Unity allows NoSQL systems to seamlessly interact with relational databases and enterprise reporting applications. Experimental results show minimal overhead in the SQL translation process. The Unity system has been commercially released, and the translator for MongoDB is packaged as a JDBC driver.
Future work involves benchmarking the performance of other supported NoSQL systems such as Cassandra. We are also working on parallelizing the virtualization engine for a cluster environment.

REFERENCES
[1] P. Atzeni, F. Bugiotti, and L. Rossi, "Uniform Access to Non-relational Database Systems: The SOS Platform," in CAiSE, 2012.
[2] R. Vilaça, F. Cruz, J. Pereira, and R. Oliveira, "An Effective Scalable SQL Engine for NoSQL Databases," in Distributed Applications and Interoperable Systems. Springer, 2013.
[3] A. Calil and R. dos Santos Mello, "SimpleSQL: A Relational Layer for SimpleDB," in ADBIS, 2012.
[4] J. Tatemura, O. Po, W.-P. Hsiung, and H. Hacigümüs, "Partiqle: An Elastic SQL Engine over Key-Value Stores," in SIGMOD Conference, 2012.
[5] E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Commun. ACM, vol. 13, no. 6, pp. 377-387, 1970.
[6] "ODBC," in Encyclopedia of Database Systems, 2009.
[7] "JDBC," in Encyclopedia of Database Systems, 2009.
[8] W. Kim, Introduction to Object-Oriented Databases. MIT Press, Cambridge, MA, 1990.
[9] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, "Extensible Markup Language (XML)," World Wide Web Journal, vol. 2, no. 4, 1997.
[10] S. Boag, D. Chamberlin, M. F. Fernández, D. Florescu, J. Robie, J. Siméon, and M. Stefanescu, "XQuery 1.0: An XML Query Language," W3C working draft.
[11] R. Cattell, "Scalable SQL and NoSQL Data Stores," SIGMOD Record, vol. 39, no. 4, pp. 12-27, 2010.
[12] M. Stonebraker, "SQL Databases v. NoSQL Databases," Commun. ACM, vol. 53, no. 4, pp. 10-11, 2010.
[13] A. Floratou, N. Teletia, D. J. DeWitt, J. M. Patel, and D. Zhang, "Can the Elephants Handle the NoSQL Onslaught?" PVLDB, vol. 5, no. 12, 2012.
[14] J. Roijackers and G. H. L. Fletcher, "On Bridging Relational and Document-Centric Data Stores," in BNCOD, 2013.
[15] J. Melton, J. E. Michels, V. Josifovski, K. Kulkarni, and P. Schwarz, "SQL/MED: A Status Report," SIGMOD Rec., vol. 31, no. 3, Sep. 2002.
[16] H. Valer, C. Sauer, and T. Härder, "XQuery Processing over NoSQL Stores," in Grundlagen von Datenbanken, 2013.
[17] H. Hacigümüs, J. Tatemura, W.-P. Hsiung, H. J. Moon, O. Po, A. Sawires, Y. Chi, and H. Jafarpour, "CloudDB: One Size Fits All Revived," in SERVICES, 2010.
[18] P. A. Bernstein, J. Madhavan, and E. Rahm, "Generic Schema Matching, Ten Years Later," PVLDB, vol. 4, no. 11, 2011.
[19] E. Rahm and P. A. Bernstein, "A Survey of Approaches to Automatic Schema Matching," VLDB J., vol. 10, no. 4, 2001.
[20] A. Doan, A. Y. Halevy, and Z. G. Ives, Principles of Data Integration. Morgan Kaufmann, 2012.
[21] T. Mason and R. Lawrence, "Dynamic Database Integration in a JDBC Driver," in ICEIS, 2005.