Report Data Management in the Cloud: Limitations and Opportunities


Article by Daniel J. Abadi [1]
Report by Lukas Probst
January 4, 2013

In this report I summarize Daniel J. Abadi's article [1] and then present some open questions, points of criticism and recent research results. The report is structured as follows: Sections 1-3 contain the content of the article itself plus a few additional comments of mine. (Apart from some restructuring, sections 1-3 mainly repeat Daniel J. Abadi's work [1].) Section 4 lists some important questions which the article left open and for which there were no solutions at the time the article was published. Section 5 concentrates on one point of criticism: an untrusted host is not always a problem for OLTP applications, and thus, contrary to the article's conclusion, an OLTP application could be deployed in the cloud. Finally, section 6 presents some research results which were published in the meantime: the NoDB approach [2] and Google's ready-to-use cloud products [3, 4].

1 Introduction

Daniel J. Abadi defines cloud computing as "a general shift of computer processing, storage, and software delivery away from the desktop and local servers, across the network, and into next generation data centers hosted by large infrastructure companies" [1]. For many companies (especially start-ups), the pay-as-you-go computing model that a cloud provides is a perfect match. Therefore Abadi's article [1] explores whether database management applications can also be deployed in the cloud.

2 Data Management in the Cloud

2.1 Cloud Characteristics

The goal of section 2 is to decide which data management applications can be deployed in the cloud. For this purpose Abadi first presents the three most important characteristics of a cloud computing environment.

2.1.1 "Compute power is elastic, but only if workload is parallelizable"

As already mentioned in the introduction, the major benefit of cloud computing is its elasticity. With the pay-as-you-go model a company can prevent both under-utilization of existing capacity (see figure 1(a)) and lost revenue due to insufficient capacity (see figure 1(b)) by always allocating as much capacity as needed (see figure 1(c)). But this behavior is only achievable if the workload is parallelizable, because allocating more capacity does not mean getting a better server but getting more nodes (e.g., Amazon EC2 instances). If the workload cannot be distributed among the nodes, new nodes will not be beneficial. For example, as figure 2 shows, parallel reads are easy to implement, while parallel writes are a hard task. In general, shared-nothing architectures are the easiest to parallelize.
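The dependence of elasticity on parallelizability can be made concrete with Amdahl's law, a standard model that is not part of Abadi's article or of this report. The following minimal sketch assumes a workload whose parallelizable fraction is known; the function name `speedup` and the example fractions are chosen purely for illustration.

```python
def speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: overall speedup when only part of the work scales with nodes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical workloads: a read-mostly one and a write-heavy one.
    for fraction in (0.99, 0.50):
        for nodes in (1, 10, 100):
            print(f"parallel fraction {fraction:.2f}, {nodes:3d} nodes: "
                  f"speedup {speedup(fraction, nodes):.1f}x")
```

Under this model a workload that is only half parallelizable can never run more than twice as fast, no matter how many nodes are rented.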

Figure 1: Static capacity vs. the pay-as-you-go cloud computing model. Panels: (a) Under-utilization, (b) Lost revenue, (c) Pay-as-you-go cloud computing model. Source: Uni Basel, Department of Mathematics and Computer Science, cs341 Distributed Information Systems (Fall Semester 2012) lecture slides, Chapter 7: Cloud Computing & NoSQL, slides 28-30, fileadmin/lectures/hs2012/cs341/slides/07-cs341-hs12-cloud_computing-nosql.pdf

Figure 2: Illustrations of parallel reads and writes. Panels: (a) Parallel read, (b) Parallel write.

2.1.2 "Data is stored at an untrusted host"

If a company stores sensitive data in the cloud, it cannot exclude the possibility that the hosting company accesses the data without permission and, for example, steals or sells sensitive data (e.g., credit card numbers), even if this scenario sounds very unlikely. Moreover, since the data has to be stored physically in some country, it is governed by the laws of that country. For example, Abadi mentions in his article [1] that "the USA PATRIOT Act allows the US government to demand access to the data stored on any computer". For some companies these two points can be a problem.

2.1.3 "Data is replicated, often across large geographic distances"

Since cloud providers often own data centers all over the world, they can provide a very high degree of fault tolerance by automatically replicating the data across large geographic distances. For example, data that is stored at the same time in the USA, Europe and Australia is very safe with respect to availability and durability.

2.2 Data Management Applications in the Cloud: OLTP vs. OLAP

After defining the most important cloud characteristics, the article checks whether the two kinds of data management applications, transactional data management (OLTP) and analytical data management (OLAP), can be deployed in the cloud.

2.2.1 Transactional Data Management (OLTP)

Typical applications using transactional data management (OLTP) need ACID guarantees and include many write operations. Since the requested data is typically distributed over several sites, transactions cannot be limited to accessing data on only one site; implementing a transactional data management system with a shared-nothing architecture would therefore necessitate "complex distributed locking and commit protocols".

For this reason, none of the four big players (Oracle, IBM DB2, Microsoft SQL Server and Sybase) has a shared-nothing transactional database. Furthermore, it is hard to maintain ACID guarantees in the cloud. The CAP theorem states that one can only choose two out of three properties: consistency, availability and tolerance to partitions. Because partitions cannot be excluded, one always needs partition tolerance. Hence one typically decides to give up consistency (the C in ACID) to gain good availability. Moreover, OLTP databases typically contain all the data, i.e., also sensitive information such as credit card numbers. Hence, Abadi argues that storing transactional data on an untrusted host is an enormous risk which is typically unacceptable. Due to these observations, Abadi concludes that OLTP applications are not well-suited for cloud deployment.

2.2.2 Analytical Data Management (OLAP)

Abadi argues in his article [1] that the shared-nothing architecture is a good match for OLAP systems: it scales best, and scalability is very important because of the huge amount of data. Furthermore, the fact that data analysis workloads tend to be read-only with only "infrequent writes" leads to two additional advantages: firstly, data analysis workloads are easy to parallelize across nodes (see section 2.1.3), and secondly, there is no need for "complex distributed locking and commit protocols". Moreover, since small inconsistencies are not problematic for analytical queries (e.g., computing the average customer age), the consistency tradeoff (CAP theorem) is no problem for OLAP applications. Finally, there are multiple possibilities to handle sensitive data for analysis on an untrusted host. Abadi proposes in his article [1] that the sensitive data can be left out, anonymized or encrypted. Furthermore, he suggests the possibility of storing only aggregated data (e.g., averages, sums, ...). Thus untrusted hosts can be used for storing analytical data. Because of these facts, Abadi concludes that OLAP applications, in contrast to OLTP applications, are well-suited for cloud deployment.

3 Data Analysis in the Cloud

The rest of the article concentrates on how to perform data analysis (OLAP) in the cloud. Abadi focuses in his article [1] on two classes of software solutions: "MapReduce-like software" and "commercially available shared-nothing parallel databases".

3.1 Cloud DBMS Wish List

Before taking a closer look at MapReduce and shared-nothing parallel DBs, the article [1] lists some properties that a good solution should provide:

1. Efficiency: If one only pays for what one uses, the price increases linearly with the resources used. Hence one wants to use the most efficient OLAP software solution, because more efficient software is cheaper to use.
2. Fault Tolerance: Fault tolerance for read-only (OLAP) queries means that a query does not have to be restarted if a single node involved in the query fails. The problem is that in a cloud, where the many involved nodes (e.g., Amazon EC2 instances) have a high failure rate (consumer-grade hardware), the probability that some node fails during a long query is very high. Thus the system must be able to handle single failures without restarting the whole query.
3. Ability to run in a heterogeneous environment: Because of occasional hardware failures (e.g., a failing core), cloud computing nodes are unfortunately not as homogeneous as they should be. If the work is distributed equally to all nodes, the time to complete the query will be equal to the time the slowest node needs to complete its task. Because of this, a system should be able to handle heterogeneous environments.
4. Ability to interface with business intelligence products: Since business analysts are typically not computer scientists, there are many so-called "business intelligence products" which help them to generate queries and visualize results. If the database software wants to support these tools, it has to accept SQL queries over ODBC or JDBC connections.

5. Ability to operate on encrypted data: As already mentioned, one possibility to solve the untrusted host problem is to store only encrypted data. Abadi [1] argues that giving the cloud application the ability to decrypt the data would destroy the protection, and that transferring the data out of the cloud for decryption would be too bandwidth-intensive. Hence the system should be able to operate directly on encrypted data.

Figure 3: MapReduce's ability to handle faults and slow nodes. Panels: (a) Fault tolerance, (b) Heterogeneous environment.

3.2 MapReduce vs. Shared-Nothing Parallel DBs

After presenting the desired properties, Abadi checks in his article [1] how well the two available solutions satisfy these properties: "MapReduce-like software" and "commercially available shared-nothing parallel databases". Although some people say that comparing MapReduce to database systems is an "apples-to-oranges" comparison, I agree with Abadi's position that it is warranted, because in my opinion it is justifiable to compare how two approaches solve the same problem.

3.2.1 Efficiency

Analytical queries run much slower in MapReduce than in alternative systems like shared-nothing parallel DBs. Abadi argues in his article [1] that the reason is that MapReduce was designed for working on unstructured data, for which its "brute force scan strategy" is a good idea (e.g., creating web indexes). But in analytical data stores, where the data is structured, shared-nothing parallel databases with their typical helper structures like indexes or dimensions outperform MapReduce. Some people say that it is a feature that MapReduce does not have such helper structures, because they take time to create when data is loaded, but usually the long-term benefit outweighs these creation costs.
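The claim that the long-term benefit of helper structures outweighs their creation cost can be illustrated with a simple break-even calculation. This is a sketch in assumed, abstract cost units; the function `break_even_queries` and all numbers are hypothetical and do not come from Abadi's article.

```python
import math
from typing import Optional


def break_even_queries(build_cost: int, scan_cost: int, indexed_cost: int) -> Optional[int]:
    """Smallest number of queries after which building an index has paid off.

    Paying off means: build_cost + q * indexed_cost < q * scan_cost.
    Returns None if an indexed query is not cheaper than a full scan.
    """
    saving_per_query = scan_cost - indexed_cost
    if saving_per_query <= 0:
        return None
    return math.floor(build_cost / saving_per_query) + 1


if __name__ == "__main__":
    # Hypothetical costs: building the index costs as much as 10 full scans.
    print(break_even_queries(build_cost=100, scan_cost=10, indexed_cost=1))
```

With these assumed numbers the index pays off from the twelfth query on, which is why a data store that is queried repeatedly benefits from helper structures, while a one-off scan does not.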
I do not agree with Abadi's opinion that MapReduce's performance is a matter of debate, because what we wanted to check is only how MapReduce performs for analytical queries on large data stores, and for this application MapReduce is very inefficient.

3.2.2 Fault Tolerance

While MapReduce is designed to be fault tolerant, most parallel database systems are not. MapReduce can handle a single node failure by simply reassigning the data split (task) to a new worker node (see figure 3(a)). In contrast, shared-nothing parallel DBs are designed to run on special hardware, where failures are uncommon. Consequently they are not fault tolerant and restart a query if a single node fails.

3.2.3 Ability to run in a heterogeneous environment

MapReduce can also handle heterogeneous environments with some slow nodes, using nearly the same mechanism: it simply reassigns the split assigned to a slow worker to a second worker node once most worker nodes have finished their tasks (see figure 3(b)). Conversely, shared-nothing parallel DBs cannot handle heterogeneous nodes, because, as already mentioned, they are designed to run on special hardware. As a result, a single slow node can have a huge impact on the total query execution time.
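The effect of a single slow node, and of reassigning its split, can be sketched with a deliberately simplified model. The sketch assumes one task per node, a single backup task that starts when all other tasks have finished, and a known run time for that backup; the function names and numbers are hypothetical and do not describe how any particular MapReduce implementation schedules tasks.

```python
def completion_time(task_times):
    """Query time without backup tasks: the slowest task dominates."""
    return max(task_times)


def completion_time_with_backup(task_times, backup_time):
    """Query time if the slowest task gets one backup copy on a free node.

    The backup starts when all other tasks have finished, and the query
    completes as soon as either the original or the backup copy is done.
    """
    ordered = sorted(task_times)
    slowest = ordered[-1]
    others_done = ordered[-2] if len(ordered) > 1 else 0.0
    return min(slowest, others_done + backup_time)


if __name__ == "__main__":
    # Hypothetical run times in seconds; the last node is a straggler.
    times = [10, 11, 12, 60]
    print(completion_time(times))
    print(completion_time_with_backup(times, backup_time=10))
```

In this example the straggler alone stretches the query to 60 seconds, whereas the backup copy brings it down to 22 seconds.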

Property                            MapReduce   Shared-nothing parallel DB
1. Efficiency                       no          yes
2. Fault Tolerance                  yes         no
3. Heterogeneous environment        yes         no
4. Business intelligence products   no          yes
5. Encrypted data                   no          no

Table 1: Overview of which properties are fulfilled by the two software solutions

3.2.4 Ability to interface with business intelligence products

While in shared-nothing parallel DBs the ability to interface with business intelligence products comes for free, MapReduce is not SQL compatible, and therefore it is not easy to use existing business intelligence products with MapReduce systems.

3.2.5 Ability to operate on encrypted data

Neither of the two software solutions has a native ability to operate directly on encrypted data. In MapReduce the only possibility is to provide user-defined code. Similarly, if operations more advanced than moving or copying encrypted data are to be performed in shared-nothing parallel DBs, user-defined functions are required.

3.3 Conclusion: A call for a hybrid solution

As table 1 shows, neither MapReduce nor shared-nothing parallel DBs fulfill all properties. But except for the ability to operate on encrypted data, each property is fulfilled by one of the two solutions. Hence Abadi proposes that a hybrid would be the perfect solution. Abadi presents some recent work in this direction in his article [1], but regrettably this work only focuses on language and interface issues: using SQL in MapReduce and using MapReduce functions in parallel databases. Finally, Abadi presents in his article [1] two research questions and his ideas on how to solve them. The first question is how to combine MapReduce's ability to work directly with the data and the performance gain from helper data structures. His idea is an incremental algorithm which makes progress in creating helper data structures each time the data is accessed. The second problem is that fault tolerance requires saving intermediate results, and this costs performance. So the question is how to balance fault tolerance and efficiency. Abadi's idea is to build a system which autonomously adjusts its level of fault tolerance based on the observed failure rate.

4 Open Questions

Daniel J. Abadi's article [1] concentrates on these two questions: What can we do in the cloud? What solutions do we want for that? Although the article discusses and answers these two questions in great detail, there are still some open questions which have to be answered before one can deploy OLAP applications in the cloud. For example: How can we use the cloud today for data warehousing? Are there any useful products we can use today? How can we implement the hybrid solution? In this section, I take a second look at the three proposed software solutions (MapReduce, shared-nothing parallel DBs and the hybrid solution) and present some still unsolved problems.

Figure 4: MapReduce in the cloud

4.1 MapReduce

Let us assume that we decided to run MapReduce-like software in the cloud to support the OLAP applications. Figure 4 illustrates this scenario and shows that in this case we are faced with two questions. Firstly, we need many worker nodes to compute the map and the reduce steps and another node to collect the results. The question is what kind of server instances (or other cloud products) should be used as nodes to get the best performance. My suggestion would be to use an Amazon EC2 instance for each node, but the article presents no evaluation which could show whether this would be a good or a bad decision. And even if this first problem is solved, there is still the problem of where to store the data in the cloud. There are many different cloud products for storing data (e.g., Amazon S3), but the article does not recommend which one to use.

4.2 Shared-nothing parallel databases

In the second example scenario (illustrated in figure 5), we assume that our company currently owns many data warehouses and now wants to use only one giant shared-nothing parallel data warehouse in a cloud. So the first question is whether there is any existing shared-nothing parallel data warehouse product in any cloud that we can use. If this is not the case, we cannot solve the task unless our company wants to implement the product on its own. But even if we assume that there is such a product, we still have to solve the problem of how to integrate the data from the local data warehouses into the new cloud product. In addition to the typical schema integration problems, we are faced with another problem: since data warehouses typically store several petabytes of data, it is a big problem to transfer the data from the local data warehouses to the cloud.
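To get a feeling for the size of this transfer problem, the following back-of-the-envelope sketch estimates how long an upload would take. The data volume and the bandwidths are assumed example values, not figures from the article, and protocol overhead is ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(size_bytes: int, bandwidth_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link with the given sustained bandwidth."""
    if bandwidth_bits_per_second <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_bytes * 8 / bandwidth_bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    one_petabyte = 10**15  # bytes, decimal definition
    # Hypothetical sustained upload bandwidths.
    for label, bandwidth in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
        days = transfer_days(one_petabyte, bandwidth)
        print(f"1 PB over {label}: about {days:.0f} days")
```

Under these assumptions a single petabyte over a sustained 1 Gbit/s link takes roughly three months, and several petabytes take correspondingly longer.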
Because the integration task would take too long with a common internet connection, there has to be another solution, which the article does not provide.

4.3 Hybrid solution

As a conclusion of his article [1], Daniel J. Abadi proposes a hybrid as the perfect solution for data analysis in the cloud. Furthermore, he presents some ideas on how to solve the remaining research questions. Although the idea of having an incremental algorithm and an autonomously self-adjusting system sounds quite nice, Abadi does not mention whether any sophisticated concepts have been implemented or at least presented yet.

Figure 5: Shared-nothing parallel databases in the cloud and integration of existing data warehouses

5 Critique: "Untrusted hosts" are usable for OLTP

Daniel J. Abadi argues in his article [1] that it is an enormous risk to store transactional data on an untrusted host, because OLTP data includes sensitive data. In the discussion after my workshop presentation we came to the conclusion that this is not entirely true. The article lists two possibilities for unauthorized data access in the cloud. The first is that the cloud provider itself steals or sells the data. As Abadi already mentions in his article [1], this is very unlikely, because in this case the cloud provider would eventually lose all its business customers. As a second risk, Abadi argues that the USA PATRIOT Act gives the US government the right to access data on all computers located in the US and therefore also the right to access data stored in a cloud provider's data center in the US. In our discussion we concluded that, firstly, it is not the intention of the US government to spy on companies and, secondly, Abadi's point is simply not true. The USA PATRIOT Act only says that internet providers have to disclose their data, i.e., the US government has the possibility to monitor and access data while the data is traveling through the internet. Hence, if the sensitive data (e.g., credit card numbers) is only stored in the cloud and not transferred through the internet, no company has to fear that its sensitive data will be accessed without its knowledge. Since, in addition, writing a shared-nothing OLTP system is only hard but not impossible, and other workshop topics presented solutions for the ACID problem, in my opinion it should be possible to deploy OLTP applications in the cloud.

6 Latest research results

Since Daniel J. Abadi's article [1] was published in 2009, some new research results have been presented in the meantime. In this section I present the NoDB approach [2] as well as Google's ready-to-use cloud solutions [3, 4].
6.1 NoDB

The authors of the NoDB article [2] argue that the major problem for applications which have to handle giant amounts of data (e.g., social networks) is that in state-of-the-art OLAP database systems this giant amount of data has to be fully loaded and initialized before any data can be accessed (see also section 3.2.1). To handle this, the NoDB approach [2] introduces the idea of "adaptive data loads", which is very similar to the incremental algorithm idea by Daniel J. Abadi [1]. The article [2] furthermore shows that the NoDB implementation (PostgresRaw) can compete with traditional DBMSs like PostgreSQL, i.e., its TPC-H performance was equivalent or faster.

6.2 Google's Cloud Solutions

Google presented ready-to-use solutions for both OLTP and OLAP applications in its own cloud. Google BigQuery [3] is Google's OLAP solution. In Google BigQuery a customer can run analytical select queries in a short time but cannot run any update or delete queries. If a customer wants to have an OLTP database in the cloud, he can use Google Cloud SQL [4] instead. (This paragraph is only a short summary. For a more detailed comparison take a look at com/bigquery/docs/overview)
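The idea of loading data adaptively, which section 6.1 attributes to NoDB [2] and which resembles Abadi's incremental algorithm [1], can be illustrated with a toy example. The class below is a hypothetical sketch and not the PostgresRaw implementation: it assumes comma-separated numeric rows held in memory and parses a column only when a query first touches it.

```python
class LazyRawTable:
    """Toy table over raw CSV lines that parses a column on first access only."""

    def __init__(self, raw_lines, column_names):
        self.raw_lines = list(raw_lines)
        self.column_names = list(column_names)
        self.parsed_columns = {}  # column name -> cached list of parsed values
        self.parse_count = 0      # how many columns have been parsed so far

    def column(self, name):
        """Return the parsed values of a column, parsing them if necessary."""
        if name not in self.parsed_columns:
            index = self.column_names.index(name)
            self.parsed_columns[name] = [
                float(line.split(",")[index]) for line in self.raw_lines
            ]
            self.parse_count += 1
        return self.parsed_columns[name]

    def average(self, name):
        values = self.column(name)
        return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical raw data: customer id and customer age.
    table = LazyRawTable(["1,30", "2,40"], ["id", "age"])
    print(table.average("age"))  # first query parses the "age" column
    print(table.average("age"))  # second query reuses the cached values
    print(table.parse_count)
```

No load phase is needed before the first query, and columns that are never queried (here `id`) are never parsed.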

References

[1] D. J. Abadi, Data Management in the Cloud: Limitations and Opportunities, IEEE Data(base) Engineering Bulletin, vol. 32, pp. 3-12, 2009.
[2] I. Alagiannis, R. Borovica, M. Branco, S. Idreos, and A. Ailamaki, NoDB: efficient query execution on raw data files.
[3] Google BigQuery. December
[4] Google Cloud SQL. December


More information

I N T E R S Y S T E M S W H I T E P A P E R F O R F I N A N C I A L SERVICES EXECUTIVES. Deploying an elastic Data Fabric with caché

I N T E R S Y S T E M S W H I T E P A P E R F O R F I N A N C I A L SERVICES EXECUTIVES. Deploying an elastic Data Fabric with caché I N T E R S Y S T E M S W H I T E P A P E R F O R F I N A N C I A L SERVICES EXECUTIVES Deploying an elastic Data Fabric with caché Deploying an elastic Data Fabric with caché Executive Summary For twenty

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional

More information

bigdata Managing Scale in Ontological Systems

bigdata Managing Scale in Ontological Systems Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Introducing Oracle Exalytics In-Memory Machine

Introducing Oracle Exalytics In-Memory Machine Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle

More information

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Step by Step: Big Data Technology Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Data Sources IT Infrastructure Analytics 2 B y 2015, 20% of Global 1000 organizations

More information
