Testing Big data is one of the biggest

Size: px
Start display at page:

Download "Testing Big data is one of the biggest"

Transcription

1 Infosys Labs Briefings VOL 11 NO Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing a structured testing technique Testing Big data is one of the biggest challenges faced by organizations because of lack of knowledge on what to test and how much data to test. Organizations have been facing challenges in defining the test strategies for structured and unstructured data validation, setting up an optimal test environment, working with non-relational databases and performing non-functional testing. These challenges are causing in poor quality of data in production and delayed implementation and increase in cost. Robust testing approach need to be defined for validating structured and unstructured data and start testing early to identify possible defects early in the implementation life cycle and to reduce the overall cost and time to market. Different testing types like functional and non-functional testing are required along with strong test data and test environment management to ensure that the data from varied sources is processed error free and is of good quality to perform analysis. Functional testing activities like validation of map reduce process, structured and unstructured data validation, data storage validation are important to ensure that the data is correct and is of good quality. Apart from functional validations other nonfunctional testing like performance and failover testing plays a key role to ensure the whole process is scalable and is happening within specified SLA. Big data implementation deals with writing complex Pig, Hive programs and running these jobs using Hadoop map reduce framework on huge volumes of data across different nodes. Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers. Hadoop uses Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Hadoop utilizes its own distributed file system, HDFS, which makes data available to multiple computing nodes. Figure 1 shows the step by step process on how Big data is processed using Hadoop ecosystem. First step loading source data into 65

2 1 Loading Source 2 Perform Map data files into HDFS Reduce operations 3 Extract the output results from HDFS Figure 1: Big Data Testing Focus Areas Source: Infosys Research HDFS involves in extracting the data from different source systems and loading into HDFS. Data is extracted using crawl jobs for web data, tools like sqoop for transactional data and then loaded into HDFS by splitting into multiple files. Once this step is completed second step perform map reduce operations involves in processing the input files and applying map and reduce operations to get a desired output. Last setup extract the output results from HDFS involves in extracting the data output generated out of second step and loading into downstream systems which can be enterprise data warehouse for generating analytical reports or any of the transactional systems for further processing BIG DATA TESTING APPROACH As we are dealing with huge data and executing on multiple nodes there are high chances of having bad data and data quality issues at each stage of the process. Data functional testing is performed to identify these data issues because of coding errors or node configuration errors. Testing should be performed at each of the three phases of Big data processing to ensure that data is getting processed without any errors. Functional Testing includes (i) validation of pre-hadoop processing; (ii), validation of Hadoop Map Reduce process data output; and (iii) validation of data extract, and load into EDW. Apart from these functional validations non-functional testing including performance testing and failover testing needs to be performed. Figure 2 shows a typical Big data architecture diagram and highlights the areas where testing should be focused. Validation of Pre-Hadoop Processing Data from various sources like weblogs, social network sites, call logs, transactional data etc., is extracted based on the requirements and loaded into HDFS before processing it further. Issues: Some of the issues which we face during this phase of the data moving from source 66

3 Big Data Analytics 1 Reporting using BI Tools 25% 25% 25% 25% Bar graph hadoop Pre-Hadoop process validation Web Logs 4 5 ReportsTesting Pig HBase (NoSQL DB) Big Data Testing Focus Areas Enterprise Data Warehouse HIVE Map Reduce (Job Execution) HDFS (Hadoop Distributed File System) Streaming Data Processed Data Data Load using Sqoop Social Data 3 2 ETL Process validation ETL Process Map-Reduce process validation Transactional Data (RDBMS) Non-FunctionalT esting (Performance, Fail over testing) 4 Figure 2: Big Data architecture Source: Infosys Research systems to Hadoop are incorrect data captured from source systems, incorrect storage of data, incomplete or incorrect replication. Validations: Some high level scenarios that need to be validated during this phase include: 1. Comparing input data file against source systems data to ensure the data is extracted correctly 2. Validating the data requirements and ensuring the right data is extracted, 3. Validating that the files are loaded into HDFS correctly, and Validation of Hadoop Map Reduce Process Once the data is loaded into HDFC Hadoop map-reduce process is run to process the data coming from different sources. Issues: Some issues that we face during this phase of the data processing are coding issues in map-reduce jobs, jobs working correctly when run in standalone node, but working incorrectly when run on multiple nodes, incorrect aggregations, node configurations, and incorrect output format. Validations: Some high level scenarios that need to be validated during this phase include: 4. Validating the input files are split, moved and replicated in different data nodes. 1. Validating that data processing is completed and output file is generated 67

4 2. Validating the business logic on standalone node and then validating after running against multiple nodes 3. Validating the data load in target system 4. Validating the aggregation of data 3. Validating the map reduce process to verify that key value pairs are generated correctly 4. Validating the aggregation and consolidation of data after reduce process 5. Validating the data integrity in the target system. Validation of Reports Analytical reports are generated using reporting tools by fetching the data from EDW or running queries on Hive. 5. Validating the output data against the source files and ensuring the data processing is completed correctly 6. Validating the output data file format and ensuring that the format is per the requirement. Issues: Some of the issues faced while generating reports are report definition not set as per the requirement, report data issues, layout and format issues. Validations: Some high level validations performed during this phase include: Validation of Data Extract, and Load into EDW Once map-reduce process is completed and data output files are generated, this processed data is moved to enterprise data warehouse or any other transactional systems depending on the requirement. Issues: Some issues that we face during this phase include incorrectly applied transformation rules, incorrect load of HDFS files into EDW and incomplete data extract from Hadoop HDFS. Validations: Some high level scenarios that need to be validated during this phase include: 1. Validating that transformation rules are applied correctly 2. Validating that there is no data corruption by comparing target table data against HDFS files data Reports Validation: Reports are tested after ETL/transformation workflows are executed for all the sources systems and the data is loaded into the DW tables. The metadata layer of the reporting tool provides an intuitive business view of data available for report authoring. Checks are performed by writing queries to verify whether the views are getting the exact data needed for the generation of the reports. Cube Testing: Cubes are testing to verify that dimension hierarchies with pre-aggregated values are calculated correctly and displayed in the report. Dashboard Testing: Dashboard testing consists of testing of individual web parts and reports placed in a dashboard. Testing would involve ensuring all objects are rendered properly and the resources on the webpage are current and latest. The data fetched from various web parts is validated against the databases. 68

5 VOLUME, VARIETY AND VELOCITY: HOW TO TEST? In the earlier sections we have seen step by step details on what need to be tested at each phase of the Big data processing. During these phases of Big data processing the three dimensions or characteristics of Big data i.e. volume, variety and velocity are validated to ensure there are no data quality defects and no performance issues. Volume: The amount of data created both inside corporations and outside the corporations via the web, mobile devices, IT infrastructure, and other sources is increasing exponentially each year [3]. Huge volume of data flows from multiple systems which need to be processed and analyzed. When it comes to validation it is a big challenge to ensure that whole data setup processed is correct. Manually validating the whole data is a tedious task. We should use compare scripts to validate the data. As data is stored in HDFS is in file format scripts can be written to compare two files and extract the differences using compare tools [4]. Even if we use compare tools it will take a lot of time to do 100% data comparison. To reduce the time for execution we can either run all the comparison scripts in parallel on multiple nodes just like how data is processed using Hadoop mapreduce process or sample the data ensuring maximum scenarios are covered. Figure 3 shows the approach on how voluminous amount of data is compared. Data is converted into expected result format and then compared using compare tools with actual data. This is a faster approach but involves initial scripting time. This approach will reduce further regression testing cycle time. When we don t have time to validate complete data, sampling can be done for validation. Variety: The variety of data types is increasing, namely unstructured text-based data and semistructured data like social media data, locationbased data, and log-file data. Structured Data is data which is in defined format which is coming from different RDBMS tables or from structured files. The data that is of transactional nature can be handled in files or tables for validation purpose. Map Reduce Jobs Testing Scripts to validate data in HDFS Unstructured (data) testing Structured data 1 SD Test 2 SD1 Test1 Unstructured to Structured Unstructured (data) testing Custom scripts to convert unstructured data to structured data Structured data testing Map Reduce Jobs run in test environment to generate the output Raw Data to Expected Results format Structured data testing Scripts to convert data to expected results data Expected Results Output Data Files Tool to compare the files Actual Results File by File Comparison Discrepancy Report Expected Results Figure 3: Approach for High Volume Data Validation Source: Infosys Research 69

6 Semi-structured data does not have any defined format but structure can be derived based on the multiple patterns of the data. Example of data is extracted by crawling through different websites for analysis purposes. For validation data need to be first transformed into structured format using custom built scripts. First the pattern need to be identified and then copy books or pattern outline need to be prepared, later this copy book need to be used in scripts to convert the incoming data into a structured format and then validations performed using compare tools. Unstructured data is the data that does not have any format and is stored in documents or web content, etc. Testing unstructured data is very complex and is time consuming. Automation can be achieved to some extent by converting the unstructured data into structured data using scripting like PIG scripting as showing in Figure 3. But the overall coverage using automation will be very less because of unexpected behavior of data; input data can be in any form and changes every time new test is performed. We need to deploy a business scenario validation strategy for unstructured data. In this strategy we need to identify different scenarios that can occur in our day to day unstructured data analysis and test data need to be setup based on test scenarios and executed. Velocity: The speed at which new data is being created and the need for real-time analytics to derive business value from it -- is increasing thanks to digitization of transactions, mobile computing and the sheer number of internet and mobile device users. Data speed needs to be considered when implementing any Big data appliance to overcome performance problems. Performance testing plays an important role to identify any performance bottleneck in the system and the system can handle high velocity streaming data. NON-FUNCTIONAL TESTING In the earlier sections we have seen how functional testing is performed at each phase of Big data processing, these tests are performed to identify functional coding issues, requirements issues. Performance testing and failover testing need to be performed to identify performance bottlenecks and to validate the non-functional requirements. Performance Testing: Any Big data project involves in processing huge volumes of structured and unstructured data and is processed across multiple nodes to complete the job in less time. At times because of bad architecture and poorly designed code, performance is degraded. If the performance is not meeting the SLA, the purpose of setting up Hadoop and other Big data technologies is lost. Hence, performance testing plays a key role in any Big data project due to huge volume of data and complex architecture. Some of the areas where performance issues can occur are imbalance in input splits, redundant shuffle and sorts, moving most of the aggregation computations to reduce process which can be done at map process. [5]. These performance issues can be eliminated by carefully designing the system architecture and doing performance test to identify the bottlenecks. Performance testing is conducted by setting up huge volume of data and an infrastructure similar to production. Utilities like Hadoop performance monitoring tool can be used to capture the performance metrics and identify the issues. Performance metrics like 70

7 job completion time, throughput, and system level metrics like memory utilization etc. are captured as part of performance testing. Failover Testing: Hadoop architecture consists of a name node and hundreds of data notes hosted on several server machines and each of them are connected. There are chances of node failure and some of the HDFS components become non-functional. Some of the failures can be name node failure, data node failure and network failure. HDFS architecture is designed to detect these failures and automatically recover to proceed with the processing. Failover testing is an important focus area in Big data implementations with the objective of validating the recovery process and to ensure the data processing happens seamlessly when switched to other data nodes. Some validations that need to be performed during failover testing are validating that checkpoints of edit logs and FsImage of name node are happening at a defined intervals, recovery of edit logs and FsImage files of name node, no data corruption because of the name node failure, data recovery when data node fails and validating that replication is initiated when one of data node fails or data become corrupted. Recovery Time Objective (RTO) and Recovery Point Objective (RPO) metrics are captured during failover testing. TEST ENVIRONMENT SETUP As Big data involves handling huge volume and processing across multiple nodes, setting up a test environment is the biggest challenge. Setting up the environment on cloud will give us the flexibility to setup and maintain it during test execution. Hosting the environment on the cloud will also help in optimizing the infrastructure and faster time to market. Key steps involved in setting up environment on cloud are [6]: A. Big data Test infrastructure requirement assessment 1. Assess the Big data processing requirements 2. Evaluate the number of data nodes required in QA environment 3. Understand the data privacy requirements to evaluate private or public cloud 4. Evaluate the software inventory required to be setup on cloud environment (Hadoop, File system to be used, No SQL DBs, etc). B. Big data Test infrastructure design 1. Document the high level cloud test infrastructure design (Disk space, RAM required for each node, etc.) 2. Identify the cloud infrastructure service provider 3. Document the SLAs, communication plan, maintenance plan, environment refresh plan 4. Document the data security plan 5. Document high level test strategy, testing release cycles, testing types, volume of data processed by Hadoop, third party tools required. 71

8 C. Big data Test Infrastructure Implementation and Maintenance Create a cloud instance of Big data test environment Install Hadoop, HDFS, MapReduce and other software as per the infrastructure design Perform a smoke test on the environment by processing a sample map reduce, Pig/Hive jobs functional and non-functional requirements. Applying right test strategies and following best practices will improve the testing quality which will help in identifying the defects early and reduce overall cost of the implementation. It is required that organizations invest in building skillset both in development and testing. Big data testing will be a specialized stream and testing team should be built with diverse skillset including coding, white-box testing skills and data analysis skills for them to perform a better job in identifying quality issues in data. Deploy the code to perform testing. BEST PRACTICES Data Quality: It is very important to establish the data quality requirements for different forms of data like traditional data sources, data from social media, data from sensors, etc. If the data quality is ascertained, the transformation logic alone can be tested, by executing tests against all possible data sets. Data Sampling: Data sampling gains significance in Big data implementation and it becomes the testers job to identify suitable sampling techniques that includes all critical business scenarios and the right test data set. Automation: Automate the test suites as much as possible. The Big data regression test suite will be used multiple times as the database will be periodically updated. Hence an automated regression test suite should be built to use it after reach release. This will save a lot of time during Big data validations. CONCLUSION Data quality challenges can be encountered by deploying a structured testing approach for both REFERENCES 1. Big data overview, Wikipedia.org at 2. White, T. (2010), Hadoop- The Definitive Guide 2nd Edition, O Reilly Media. 3. Kelly, J. (2012), Big data: Hadoop, Business Analytics and Beyond, A Big data Manifesto from the Wikibon Community. Available at wikibon.org/wiki/v/big_data:_ Hadoop,_Business_Analytics_and_ Beyond, Mar Informatica Enterprise Data Integration (1998), Data verification using File and Table compare utility for HDFS and Hive tool. Available at https://community. informatica.com/solutions/ Bhandarkar M. (2009), Practical Problem Solving with Hadoop, USENIX 09 annual technical conference, June Available at event/usenix09/training/tutonefile.html. 6. Naganathan, V. (2012), Increase Business Value with Cloud-based QA Environments, Available at infosys.com/it-services/independentvalidation-testing-services/pages/ cloud-based-qa-environments.aspx. 72

9 Author s Profile s MAHESH GUDIPATI is a Project Manager with the FSI business unit of Infosys. He can be reached at SHANTHI RAO is a Group Project Manager with the FSI business unit of Infosys. She can be contacted at NAJU D MOHAN is a Delivery Manager with the RCL business unit of Infosys. She can be contacted at NAVEEN KUMAR GAJJA is a Technical Architect with the FSI business unit of Infosys. He can be contacted at For information on obtaining additional copies, reprinting or translating articles, and all other correspondence, please contact: Infosys Limited, 2013 Infosys acknowledges the proprietary rights of the trademarks and product names of the other companies mentioned in this issue of Infosys Labs Briefings. The information provided in this document is intended for the sole use of the recipient and for educational purposes only. Infosys makes no express or implied warranties relating to the information contained in this document or to any derived results obtained by the recipient from the use of the information in the document. Infosys further does not guarantee the sequence, timeliness, accuracy or completeness of the information and will not be liable in any way to the recipient for any delays, inaccuracies, errors in, or omissions of, any of the information or in the transmission thereof, or for any damages arising there from. Opinions and forecasts constitute our judgment at the time of release and are subject to change without notice. This document does not contain information provided to us in confidence by our clients.

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Trustworthiness of Big Data

Trustworthiness of Big Data Trustworthiness of Big Data International Journal of Computer Applications (0975 8887) Akhil Mittal Technical Test Lead Infosys Limited ABSTRACT Big data refers to large datasets that are challenging to

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Deploying Hadoop with Manager

Deploying Hadoop with Manager Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Client Overview. Engagement Situation. Key Requirements

Client Overview. Engagement Situation. Key Requirements Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Cloud Integration and the Big Data Journey - Common Use-Case Patterns

Cloud Integration and the Big Data Journey - Common Use-Case Patterns Cloud Integration and the Big Data Journey - Common Use-Case Patterns A White Paper August, 2014 Corporate Technologies Business Intelligence Group OVERVIEW The advent of cloud and hybrid architectures

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP ESG Data Systems Architecture Big Data & Analytics as a Service Components Unstructured Data / Sparse Data of Value

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Data Domain Profiling and Data Masking for Hadoop

Data Domain Profiling and Data Masking for Hadoop Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

How Cisco IT Built Big Data Platform to Transform Data Management

How Cisco IT Built Big Data Platform to Transform Data Management Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data: The Datafication Of Everything Thoughts Devices Processes Thoughts Things Processes Run the Business Organize data to do something

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information

Oracle s Big Data solutions. Roger Wullschleger.

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

BIG DATA IN BUSINESS ENVIRONMENT

BIG DATA IN BUSINESS ENVIRONMENT Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania olga.banica@upit.ro 2 Faculty

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Architecting for the Internet of Things & Big Data

Architecting for the Internet of Things & Big Data Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

CA Big Data Management: It s here, but what can it do for your business?

CA Big Data Management: It s here, but what can it do for your business? CA Big Data Management: It s here, but what can it do for your business? Mike Harer CA Technologies August 7, 2014 Session Number: 16256 Insert Custom Session QR if Desired. Test link: www.share.org Big

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics P.Saravana kumar 1, M.Athigopal 2, S.Vetrivel 3 Assistant Professor, Dept

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Data Challenges in Telecommunications Networks and a Big Data Solution

Data Challenges in Telecommunications Networks and a Big Data Solution Data Challenges in Telecommunications Networks and a Big Data Solution Abstract The telecom networks generate multitudes and large sets of data related to networks, applications, users, network operations

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Big Data for Investment Research Management

Big Data for Investment Research Management IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

CIO Guide How to Use Hadoop with Your SAP Software Landscape

CIO Guide How to Use Hadoop with Your SAP Software Landscape SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs

More information

Data Virtualization A Potential Antidote for Big Data Growing Pains

Data Virtualization A Potential Antidote for Big Data Growing Pains perspective Data Virtualization A Potential Antidote for Big Data Growing Pains Atul Shrivastava Abstract Enterprises are already facing challenges around data consolidation, heterogeneity, quality, and

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources

More information

Powerful Management of Financial Big Data

Powerful Management of Financial Big Data Powerful Management of Financial Big Data TickSmith s solutions are the first to apply the processing power, speed, and capacity of cutting-edge Big Data technology to financial data. We combine open source

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Performance Testing of Big Data Applications

Performance Testing of Big Data Applications Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies mbatterywala@impetus.co.in Shirish Bhale: Director of Engineering

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information