INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS)

LIMITATIONS OF DATAWAREHOUSE PLATFORMS AND ASSESSMENT OF HADOOP AS AN ALTERNATIVE

KULDEEP DESHPANDE 1, Dr. BHIMAPPA DESAI 2
1 (Ellicium Solutions, Pune, Maharashtra)
2 (Capgemini Consulting, Pune, Maharashtra)

International Journal of Information Technology & Management Information System (IJITMIS), ISSN 0976, Volume 5, Issue 2, May - August (2014), IAEME

ABSTRACT

The volume and complexity of data collected in datawarehouse systems are growing rapidly. This poses challenges to traditional datawarehouse platforms. At the same time, the Hadoop ecosystem has opened new avenues for implementing datawarehouse systems on Hadoop and overcoming these challenges. In this paper we survey previous studies about the limitations of traditional datawarehouse platforms and discuss the opportunities offered by Hadoop for datawarehouse implementation. This paper can give direction to future research on datawarehouse implementation on the Hadoop platform.

Keywords: Datawarehouse, Hadoop, Hive, Analytical, ETL

I. INTRODUCTION

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive [5]. Data collected from web logs and social media has become an important component of analytical systems. At the same time, these data sources have added complexity to datawarehouses. Since 2005, a richer set of analytical database management systems has been introduced [2]. However, the rate of growth in data volume and complexity is challenging these analytical database systems as well. Companies such as Yahoo and Facebook have been using Hadoop to process large datasets [5]. More recently, there has been increased interest in evaluating Hadoop as a datawarehouse platform. In this paper we study the challenges facing currently available datawarehouse platforms, the opportunities opened up by Hadoop, and the areas that need research.

This work is organized as follows. In section 2 we conduct a meta-analysis of two surveys about issues with current datawarehouse platforms and summarize the top issues encountered by industry practitioners. Section 3 gives an overview of Hadoop and of approaches for building a datawarehouse on Hadoop. In section 4, we discuss various ways in which Hadoop can be used for building a datawarehouse, including data archival, data staging and ETL processing. Hadoop is a new technology and has many limitations that need to be overcome to make it a complete datawarehouse platform; these shortcomings are discussed in section 5. Finally, in section 6, we discuss areas of research needed to make Hadoop a full-fledged datawarehouse platform.

II. LIMITATIONS OF DATAWAREHOUSE PLATFORMS

A datawarehouse platform is the most important component of a datawarehouse / analytical system. It is defined as a collection of hardware servers, an operating system, a database management system (DBMS) and data storage [1]. The categories of datawarehouse platforms are as follows:

1. Traditional RDBMS databases
2. Column oriented databases
3. In memory databases
4. Software only appliances
5. Software and hardware appliances
6. Cloud based datawarehouse systems
7. Hadoop based datawarehouse platforms

To define the limitations of traditional datawarehouse platforms and the challenges posed by growth in data volume and variety, we refer to two previous studies. In [1], The Datawarehousing Institute (TDWI) surveyed 417 business and technical executives. The objective of this survey was to help respondents understand the options available for datawarehouse platforms. From the survey responses it is clear that poor query response, lack of support for advanced analytics and inadequate load speed are critical challenges faced by the industry with existing datawarehouse platforms.

In [2], a survey on the usage of analytical platforms was conducted. This study surveyed 223 respondents about their satisfaction with traditional RDBMS as a datawarehouse platform and their plans for migration from RDBMS to analytical platforms. It found that 75% of respondents had migrated to analytical platforms or were in the process of migrating. The survey also asked respondents which issues led them to migrate away from RDBMS.

The following table shows the weighted average response of these two studies.

TABLE 1: Meta-Analysis of Datawarehouse platform surveys

Survey 1 (2009) problem | % of respondents | Survey 2 (2010) problem | % of respondents | Weighted average response
Poor query response | 45.0% | Query performance / response times | 60.7% | 50.5%
Can't support advanced analytics | 40.0% | Need for complex analysis | 61.3% | 47.4%
Inadequate data load speed | 39.0% | Load times | 31.5% | 36.4%
Cost of scaling up is too expensive | 33.0% | Hardware growth / cost | 22.0% | 29.2%
Poorly suited to real-time or on demand workloads | 29.0% | Need for on demand capacity | 49.4% | 36.1%
Can't support large concurrent user count | 20.0% | Growth in number of concurrent users | 38.1% | 26.3%
Inadequate high availability | 19.0% | Availability and fault tolerance | 19.0% | 19.0%
Other | 4.0% | Other | 6.0% | 4.7%

From the above analysis we conclude that the following are the main challenges with existing datawarehouse platforms.

2.1 Poor query performance / response

Poor query response has the highest weighted average response in Table 1 (50.5%). It is the most cited challenge in Survey 1 and a close second in Survey 2. SQL is the most common language used for analysis, so poor query response reflects slow execution of SQL. Over the last 10 years, various approaches such as 64 bit computing, increased memory, MPP systems and columnar databases have been implemented to address this challenge, yet poor query response remains the number one challenge for datawarehouses.

2.2 No support for advanced analytics

Lack of advanced analytics capabilities is cited as an important challenge for datawarehouse platforms. However, there is debate over the exact definition of the term. From various studies it can be concluded that support for predictive algorithms, statistical analysis and geographic visualization can be grouped under advanced analytics. Traditional RDBMS based datawarehouse platforms generally do not support these advanced analytic functions using SQL. The majority of organizations perform this kind of advanced analytics outside the RDBMS using hand coded platforms or tools [2].

2.3 Slow data load speed

Traditional datawarehouses are batch oriented and are loaded weekly, daily or multiple times a day. However, the trend is to integrate the datawarehouse with transactional and operational applications such as fraud detection [1]. This is making traditional daily load oriented datawarehouse applications obsolete. The majority of large corporations have datawarehouse load processes that run throughout the night and take hours. With increasing data volumes and complexity, data load times will keep increasing, and this challenge needs a solution.
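The weighted averages in Table 1 can be reproduced with a short calculation. The sketch below assumes that each survey is weighted by its number of respondents (417 for [1] and 223 for [2]). The paper does not state the weighting explicitly, but this assumption matches every value in the table. The function and variable names are illustrative only.

```python
# Respondent counts reported for the two surveys ([1] and [2]).
SURVEY_1_RESPONDENTS = 417
SURVEY_2_RESPONDENTS = 223

# (Survey 1 problem label, Survey 1 %, Survey 2 %) as listed in Table 1.
TABLE_1 = [
    ("Poor query response", 45.0, 60.7),
    ("Can't support advanced analytics", 40.0, 61.3),
    ("Inadequate data load speed", 39.0, 31.5),
    ("Cost of scaling up is too expensive", 33.0, 22.0),
    ("Poorly suited to real-time or on demand workloads", 29.0, 49.4),
    ("Can't support large concurrent user count", 20.0, 38.1),
    ("Inadequate high availability", 19.0, 19.0),
    ("Other", 4.0, 6.0),
]


def weighted_average(pct_1, pct_2,
                     n_1=SURVEY_1_RESPONDENTS, n_2=SURVEY_2_RESPONDENTS):
    """Average two percentages, weighting each by its survey's respondent count.

    Assumption: respondent counts are the weights (not stated in the paper).
    """
    return (pct_1 * n_1 + pct_2 * n_2) / (n_1 + n_2)


def table_1_weighted():
    """Return (label, weighted average rounded to one decimal) for each row."""
    return [(label, round(weighted_average(p1, p2), 1))
            for label, p1, p2 in TABLE_1]


if __name__ == "__main__":
    for label, value in table_1_weighted():
        print(f"{label}: {value}%")
```

Weighting by respondent count gives the larger survey [1] roughly twice the influence of [2], which is why the combined ranking follows Survey 1 more closely.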

2.4 High hardware cost

Increased data volume, data complexity and number of users create a need to add hardware (disk space or processors) to the datawarehouse infrastructure. Additional hardware also requires additional cooling, space, power and management effort [2]. According to survey [1], 57% of respondents said that their organizations had reduced the datawarehousing budget because of the recession. Thus a lower cost of hardware and support per additional unit of volume, number of users and complexity of analysis is an important requirement that datawarehouse platforms have to satisfy.

2.5 No support for on demand workload

With increased dependency on data driven decisions, the need for ad hoc, one-time, on demand analysis is increasing. Often such analysis requires analyzing very large volumes of data, combining datawarehouse data with external data, and using archived data. All of this requires the datawarehouse to scale up and make more memory and capacity available to a specific analysis. With traditional RDBMS based datawarehouse platforms, scaling up on demand requires adding costly hardware and memory, and involves long lead times to obtain the hardware. The inability to scale up on demand with minimal cost and ramp-up time is a major challenge for existing datawarehouse platforms.

III. HADOOP AS AN ALTERNATIVE DATAWAREHOUSE PLATFORM

In this section we discuss the fundamentals of Hadoop and early usage of Hadoop as a datawarehouse platform.

What is Hadoop?

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets [14]. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. HDFS applications need a write-once-read-many access model for files: a file once created, written, and closed need not be changed. A computation requested by an application is much more efficient if it is executed near the data it operates on, so HDFS provides interfaces for applications to move themselves closer to where the data is located [14]. MapReduce is a programming model on top of HDFS for processing and generating large data sets [13].

MapReduce for Datawarehouse

MapReduce was developed as an abstraction of the map and reduce primitives present in many functional languages [10]. In [8], a datawarehouse framework developed at Turn Inc., named Cheetah, is discussed. This framework uses MapReduce to process the data and so benefits from the massively parallel execution and scalability of MapReduce. The ability of MapReduce to process non-relational / unstructured data can also add value to the datawarehouse. Thus problems such as text tokenization, indexing and search, data mining and machine learning, and handling of a high number of hops can be handled easily in MapReduce. The paper proposes a specialized data model for a Hadoop based datawarehouse. In this design, a virtual schema is created in which fact and dimension tables are pre-joined. This schema hides table joins from users and simplifies the query language. Since implementing the JOIN operator is complex in MapReduce, this data model also simplifies the MapReduce program. The paper demonstrated that the Cheetah framework can process data at 1 GB/sec.

Hive

One of the first usages of Hadoop for datawarehousing was reported by the Facebook team. Hive is an open source datawarehousing solution built on top of Hadoop. It was built in 2007 by the Facebook team and open sourced in 2008 [5]. MapReduce requires extensive programming and is not well suited to end users who want to query and analyze data. The purpose of Hive was to bring the concepts of tables, columns and SQL to Hadoop while retaining the flexibility of Hadoop [5]. In 2010, the Facebook Hive implementation had tens of thousands of tables, contained 700 TB of data [5] and supported 200 users. The Hive query language (HiveQL) is a subset of SQL, which makes it easy for SQL oriented users to analyze data in Hadoop. Advanced analysis in the form of MapReduce programs can be plugged into HiveQL. This enables complex analysis such as text mining and pattern matching to be done in MapReduce, with the results explored in SQL through Hive. Hive compiles SQL into MapReduce batch jobs. Thus Hive was designed for complex, batch oriented jobs and not for low latency queries [13]. Hive works well for complex datawarehouse analysis scenarios but does not perform well for dashboards and real time analysis.

Other examples of Hadoop based DW implementation

In the previous sections we discussed usage of Hadoop for data storage in datawarehouse applications where data volumes are very large or data is unstructured. A few other studies have reported usage of Hadoop for other specific use cases.

Sensors have been widely applied in various fields, and the time series data they generate creates a large demand for data storage and analysis. Most of the time series datawarehouse solutions currently available use RDBMS systems. Usage of HBase, a NoSQL database on Hadoop, for a time series datawarehouse is reported in [9]. This experimental study conducted a stress test on 400 million time series records and concluded that HBase has good read performance for time series data.

Usage of Hadoop for a high performance datawarehouse and OLAP has been demonstrated in [6]. In this experiment, a Hadoop cluster consisting of 18 nodes with 36 cores was constructed. The experiment used MapReduce for cube construction and provided an XMLA API so that standard BI tools can access the cube data.

The studies described so far cover usage of Hadoop for very large scale datawarehouses. However, midsized organizations may also benefit from Hadoop by reducing data storage costs for the datawarehouse. In [13], an evaluation of Hadoop for small and medium sized datawarehouse systems was performed. This study compared MySQL, Hadoop + MapReduce and Hive for data sizes ranging from 200 MB to 10 GB. It found that up to a data size of 1 GB, MySQL performs better than Hadoop and Hive. Between 1 and 2 GB, Hadoop + MapReduce outperforms Hive and MySQL. Beyond 2 GB, Hive outperforms Hadoop + MapReduce and MySQL. The work done in this study needs to be extended further. Firstly, the study did not run extensive analytic queries to confirm the conclusion. Secondly, low latency queries were not tested. Despite these issues, this study opens new opportunities for small and medium industries to leverage Hadoop for a low cost datawarehouse.
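The map and reduce primitives discussed in this section can be illustrated with a minimal, single-process Python sketch. It is not Hadoop code and does not reproduce the Cheetah or Hive implementations. The record fields (`region`, `amount`) and all function names are hypothetical. The input records are assumed to be already pre-joined, in the spirit of the virtual schema described for [8], so the aggregation needs no JOIN.

```python
from collections import defaultdict

# Hypothetical pre-joined records: each fact row already carries the
# dimension attribute ("region") it would otherwise be joined to.
RECORDS = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 30.0},
]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    yield record["region"], record["amount"]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

The job corresponds to a grouped aggregation (total `amount` per `region`), the kind of SQL query that, as described above, Hive compiles into MapReduce batch jobs. In a real cluster the map and reduce calls would run in parallel on different nodes, which this sketch does not attempt to model.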

IV. DIFFERENT WAYS IN WHICH HADOOP CAN BE USED FOR DATAWAREHOUSE

In the previous section we discussed how Hadoop can be used as a data storage platform for a datawarehouse. There are other ways of using Hadoop in a datawarehouse implementation.

A lot of data processing happens in the data staging area to prepare the data to be loaded into the datawarehouse [7]. The staging area is often many times the size of the datawarehouse. Usage of Hadoop for the staging area has been reported in the literature, taking advantage of its low cost, linear scalability, facility with file-based data, and ability to manage unstructured data [7]. With this approach, datawarehouse servers are used only for loading cleaned data and for end user access to data.

An enormous amount of data is gathered over time in the datawarehouse system. Not all of this data may be required for reporting and analytics. However, legal requirements oblige organizations to retain data beyond its useful life. In many cases source data also has to be retained as is, along with the cleaned datawarehouse data. Retaining older datawarehouse data and raw source data for long periods can be expensive [7] and can burden the datawarehouse server. As with datawarehouse staging, data archival can be migrated to Hadoop.

Migration of ETL processing to Hadoop can reduce both cost and processing times. Cheap storage, massive scalability, and the ability to handle complex logic and manage unstructured data are the reasons why Hadoop can be an ideal ETL platform. It is recommended to identify the top 20% of ETL workloads for migration to Hadoop to achieve maximum benefit [4]. The following ETLs qualify for migration to Hadoop [4]:

- Relatively high elapsed processing times
- Very complex scripts such as change data capture, joins and cursors
- File processing and semi structured data
- ETLs causing high impact on resource utilization
- Unstable or error prone code

Thus, apart from serving as a datawarehouse storage platform, Hadoop can also be used for data archival, data staging and processing ETL programs.

V. LIMITATIONS OF HADOOP TO BE A DATAWAREHOUSE PLATFORM

Despite the advantages of Hadoop as a datawarehouse platform stated in the previous section, it has certain limitations. Most of these limitations are features that Hadoop lacks compared to a mature RDBMS. In this section we discuss the limitations of Hadoop as a datawarehouse platform.

1. Low latency data access and queries
MapReduce is a batch oriented programming paradigm. Hence MapReduce is not well suited to real time and fast queries. However, newer query engines such as Impala and Apache Drill provide faster query processing over data stored in Hadoop [16]. Future usage of Hadoop as a datawarehouse platform will depend to a great extent on the maturity of these query engines and on how quickly they acquire RDBMS functionality.

2. Inserts and updates
Hadoop does not support ACID compliant insert and update queries. Even Hive does not currently support insert or update queries [5]. This makes it difficult to use Hadoop for dimension tables in a datawarehouse, since slowly changing dimensions require updates.

3. Granular security
Row level or field level security of the kind found in an RDBMS is absent in Hadoop [7]. Only basic checks such as file permission checks are present [7].

4. SQL based analytics
In any mature datawarehouse solution, end users write complex SQL for data analysis. Hadoop based databases such as Hive and Impala have limited support for ANSI standard SQL [7]. Hive does not support correlated subqueries, which are commonly used in traditional warehouse queries. However, Hadoop based databases such as IBM BigSQL and GreenPlum HAWQ aim to support ANSI SQL. The ability of Hadoop based query engines to match ANSI SQL capabilities will speed up adoption of Hadoop as a datawarehouse platform.

VI. DIRECTIONS FOR FUTURE RESEARCH

As discussed in the previous section, certain areas need further research to make Hadoop a viable alternative platform for datawarehouse implementations. In this direction, the following research is necessary:

1. Adoption of Hadoop for datawarehouse implementations will depend to a great extent on the maturity of SQL engines on Hadoop and their compliance with ANSI SQL. Further research is required to identify which ANSI SQL features are lacking in Hive (or Impala) and would be needed to make these SQL engines mature datawarehouse platforms.
2. The suitability of Hadoop for managing large datasets is well established. The suitability of Hadoop for mid and small sized datasets is discussed in [13]. This aspect needs to be explored further, and detailed guidelines need to be developed for analyzing the suitability of Hadoop for small datasets. This will make Hadoop a more feasible alternative to traditional DW platforms for smaller organizations.
3. The application of traditional dimensional or ER modeling techniques to datawarehouses on Hadoop needs to be studied. If these approaches are found unsuitable, an alternative methodology needs to be developed for modeling a datawarehouse on Hadoop.
4. Various studies have proposed usage of Hadoop for ETL transformations such as lookups and joins. However, detailed benchmarks showing when these transformations benefit from a Hadoop based implementation need to be developed.
5. A set of comprehensive guidelines / a framework needs to be developed for evaluating whether a datawarehouse will benefit from Hadoop.

VII. CONCLUSION

In this paper we analyzed the shortcomings of traditional datawarehouse platforms. We analyzed two surveys and conducted a meta-analysis to report problems with current datawarehouse platforms. A survey of various experiments on Hadoop based datawarehouses was also reported.

The main objective of this paper is to analyze the possibility of using Hadoop as a datawarehouse platform and to identify the areas of research that will make Hadoop a strong alternative to traditional datawarehouse platforms. As reported in section 6, further research is required to make SQL engines on Hadoop more mature. There is also a need for a comprehensive framework to judge which datawarehouses will benefit from implementation on the Hadoop platform.

VIII. REFERENCES

[1] Philip Russom, Next generation Datawarehouse platforms, The Datawarehousing Institute, 2009
[2] Merv Adrian and Colin White, Analytic Platforms: Beyond the Traditional Data Warehouse, BeyeNETWORK Custom Research Report, 2010
[3] Philip Russom, Analytic Databases for Big Data, The Datawarehousing Institute, 2012
[4] Offload your Datawarehouse with Hadoop, Syncsort publication, 2014
[5] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu and Raghotham Murthy, Hive - A Petabyte Scale Data Warehouse Using Hadoop, IEEE
[6] Jinguo You, Jianqing Xi, Chuan Zhang, Gengqi Guo, HDW: A High Performance Large Scale Data Warehouse, Computer and Computational Sciences, IMSCCS '08
[7] Philip Russom, Evolving Data Warehouse Architectures, The Datawarehousing Institute, 2014
[8] Songting Chen, Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce, 36th International Conference on Very Large Data Bases, 2010
[9] Wen-Yuan Ku, Tien-Yin Chou, Lan-Kun Chung, The Cloud-Based Sensor Data Warehouse, International Symposium on Grids & Clouds
[10] T.K. Das and Arati Mohapatro, A Study on Big Data Integration with Data Warehouse, International Journal of Computer Trends and Technology (IJCTT), volume 9, number 4, Mar
[11] Charles Loboz, Slawek Smyl, Suman Nath, DataGarage: Warehousing Massive Performance Data on Commodity Servers, 36th International Conference on Very Large Data Bases
[12] Sanjeev Khatiwada, Architectural Issues in Real-time Business Intelligence, 2012
[13] Marissa Rae Hollingsworth, Hadoop and Hive as Scalable Alternatives to RDBMS: A Case Study, Boise State University
[14] Dhruba Borthakur, The Hadoop Distributed File System: Architecture and Design, Apache foundation, 2007
[15] Donald Feinberg, DBMS Infrastructure for the Modern Data Warehouse, Business Intelligence Summit
[16]
[17] Kuldeep Deshpande and Dr. Bhimappa Desai, A Critical Study of Requirement Gathering and Testing Techniques for Datawarehousing, International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 5, Issue 1, 2014


More information

Cloud Storage Solution for WSN Based on Internet Innovation Union

Cloud Storage Solution for WSN Based on Internet Innovation Union Cloud Storage Solution for WSN Based on Internet Innovation Union Tongrang Fan 1, Xuan Zhang 1, Feng Gao 1 1 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang,

More information

Play with Big Data on the Shoulders of Open Source

Play with Big Data on the Shoulders of Open Source OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19

More information

Il mondo dei DB Cambia : Tecnologie e opportunita`

Il mondo dei DB Cambia : Tecnologie e opportunita` Il mondo dei DB Cambia : Tecnologie e opportunita` Giorgio Raico Pre-Sales Consultant Hewlett-Packard Italiana 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Scalable Cloud Computing Solutions for Next Generation Sequencing Data Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Navigating the Big Data infrastructure layer Helena Schwenk

Navigating the Big Data infrastructure layer Helena Schwenk mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining

More information

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

A Study on Big Data Integration with Data Warehouse

A Study on Big Data Integration with Data Warehouse A Study on Big Data Integration with Data Warehouse T.K.Das 1 and Arati Mohapatro 2 1 (School of Information Technology & Engineering, VIT University, Vellore,India) 2 (Department of Computer Science,

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK OVERVIEW ON BIG DATA SYSTEMATIC TOOLS MR. SACHIN D. CHAVHAN 1, PROF. S. A. BHURA

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

A Survey on Big Data Analytical Tools

A Survey on Big Data Analytical Tools A Survey on Big Data Analytical Tools Mr. Mahesh G Huddar Sr. Lecturer Dept. of Computer Science and Engineering Hirasugar Institute of Technology, Nidasoshi, Karnataka, India Manjula M Ramannavar Asst.

More information

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale

More information

Data Migration from Grid to Cloud Computing

Data Migration from Grid to Cloud Computing Appl. Math. Inf. Sci. 7, No. 1, 399-406 (2013) 399 Applied Mathematics & Information Sciences An International Journal Data Migration from Grid to Cloud Computing Wei Chen 1, Kuo-Cheng Yin 1, Don-Lin Yang

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Innovative technology for big data analytics

Innovative technology for big data analytics Technical white paper Innovative technology for big data analytics The HP Vertica Analytics Platform database provides price/performance, scalability, availability, and ease of administration Table of

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Parallel Data Warehouse

Parallel Data Warehouse MICROSOFT S ANALYTICS SOLUTIONS WITH PARALLEL DATA WAREHOUSE Parallel Data Warehouse Stefan Cronjaeger Microsoft May 2013 AGENDA PDW overview Columnstore and Big Data Business Intellignece Project Ability

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

Trustworthiness of Big Data

Trustworthiness of Big Data Trustworthiness of Big Data International Journal of Computer Applications (0975 8887) Akhil Mittal Technical Test Lead Infosys Limited ABSTRACT Big data refers to large datasets that are challenging to

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Big Data and Your Data Warehouse Philip Russom

Big Data and Your Data Warehouse Philip Russom Big Data and Your Data Warehouse Philip Russom TDWI Research Director for Data Management April 5, 2012 Sponsor Speakers Philip Russom Research Director, Data Management, TDWI Peter Jeffcock Director,

More information

Evolving Data Warehouse Architectures

Evolving Data Warehouse Architectures Evolving Data Warehouse Architectures In the Age of Big Data Philip Russom April 15, 2014 TDWI would like to thank the following companies for sponsoring the 2014 TDWI Best Practices research report: Evolving

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Big Data Defined Introducing DataStack 3.0

Big Data Defined Introducing DataStack 3.0 Big Data Big Data Defined Introducing DataStack 3.0 Inside: Executive Summary... 1 Introduction... 2 Emergence of DataStack 3.0... 3 DataStack 1.0 to 2.0... 4 DataStack 2.0 Refined for Large Data & Analytics...

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information

SQL Server 2012 Parallel Data Warehouse. Solution Brief

SQL Server 2012 Parallel Data Warehouse. Solution Brief SQL Server 2012 Parallel Data Warehouse Solution Brief Published February 22, 2013 Contents Introduction... 1 Microsoft Platform: Windows Server and SQL Server... 2 SQL Server 2012 Parallel Data Warehouse...

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Big Data Can Drive the Business and IT to Evolve and Adapt

Big Data Can Drive the Business and IT to Evolve and Adapt Big Data Can Drive the Business and IT to Evolve and Adapt Ralph Kimball Associates 2013 Ralph Kimball Brussels 2013 Big Data Itself is Being Monetized Executives see the short path from data insights

More information

Microsoft Analytics Platform System. Solution Brief

Microsoft Analytics Platform System. Solution Brief Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

More information

www.intelligentbusiness.biz mferguson@intelligentbusiness.biz Twitter: @mikeferguson1

www.intelligentbusiness.biz mferguson@intelligentbusiness.biz Twitter: @mikeferguson1 Welcome to Today s Web Seminar! March 15, 2011 12:00PM ET Sponsored by: Hosted by: Eric Kavanagh is the host of DM Radio and Information Management's Webcasts. He is a veteran journalist and consultant

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

CIO Guide How to Use Hadoop with Your SAP Software Landscape

CIO Guide How to Use Hadoop with Your SAP Software Landscape SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM David Chappell SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM A PERSPECTIVE FOR SYSTEMS INTEGRATORS Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Business

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

INDUS / AXIOMINE. Adopting Hadoop In the Enterprise Typical Enterprise Use Cases

INDUS / AXIOMINE. Adopting Hadoop In the Enterprise Typical Enterprise Use Cases INDUS / AXIOMINE Adopting Hadoop In the Enterprise Typical Enterprise Use Cases. Contents Executive Overview... 2 Introduction... 2 Traditional Data Processing Pipeline... 3 ETL is prevalent Large Scale

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information