W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

Size: px
Start display at page:

Download "W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract"

Transcription

1 W H I T E P A P E R Building your Big Data analytics strategy: Block-by-Block! Abstract In this white paper, Impetus discusses how you can handle Big Data problems. It talks about how analytics on Big Data is changing the way in which technology is evolving, as well as bringing in new challenges. It discusses best practices for addressing Big Data analytics concerns and how strategies can be created to cope up with these challenges. The white paper guides you on choosing the right strategy with the optimal technology stack to address Big Data analytics problems. Impetus Technologies, Inc. February

2 Table of Contents Introduction... 2 Building a Big Data strategy... 3 Big Data and the three Vs... 3 Volume... 3 Variety... 3 Velocity... 4 The Big Data Analytics lifecycle... 4 Key concerns in Big Data Analytics lifecycle... 4 Selecting the right Big Data Analytics strategy... 6 The Hadoop eco-system... 8 Putting it all together... 9 Conclusion Introduction With the number of software, Internet, and mobile users growing exponentially, there is a huge demand on software infrastructure to deal with the voluminous data being created, and that too, at high speed. Ever increasing Internet access bandwidth is allowing huge data sets to flow into the WWW. Just ten years ago, Gigabytes and Petabytes were terms meant only for academics and R&D experts. Now Gigabytes are eaten up by even small hand held devices. All of this has resulted in a unique problem. Not only do we have to deal with this very large data, we have to do it quickly, and do it in a manner that we gain business insights from it. 2

3 Building a Big Data strategy With almost all software strategies, the first step in building a Big Data plan is to gather the requirements correctly, so that the business problems can be thoroughly understood. This means that we identify and define what needs to be done and lay down the expectations from the solution. The next step is to assess and select the right strategy. This involves finding the right patterns and best practices to architect, design, and implement the relevant solution. Another important step is to determine the right tool-sets and/or technology stacks that will fulfill all the defined requirements as well as support the best practices. Finally, it is important to implement the chosen strategy and resolve the business problems at hand in a cost-effective manner. Big Data and the three Vs One of the major problems faced by data architects and stakeholders is ascertaining whether their problem is actually a Big Data issue. Unfortunately, there is no magical volume limit or an algorithm to help you decide when a data problem evolves into a big data problem. The usual trend is to define Big Data in terms of data volumes or sizes, Also, whenever regular RDBMS breakdown with excessive data, big data solutions appear as the right choices. However, a better way of classifying Big Data is by understanding the concept of the 3Vs model Variety, Volume, and Velocity of data. Volume Volume is simply the data size that we are capturing and is measured in bytes of data. While earlier, the Gigabyte was supposed to be Big, today, terms like Terabytes, Petabytes, and Exabytes are heard in the context of Big Data. Variety Variety means the different kinds of data that we are trying to capture. A simple example of variety can be a social web site capturing data from its own site, as well as drawing inputs from Twitter or Facebook using Google analytics and internally using data from other third party products. This will result in a number of data formats that may vary from text to audio to video to databases to log files to web services call, and so on. 3

4 Velocity The last V stands for velocity of data which means the speed at which the data is being captured. Again, using the earlier example, a feed from Twitter might be 10s of tweets being fired for a user, while some keyword-driven feeds can reach viral status, with thousands of tweets getting fired simultaneously. Therefore, when we are classifying a data problem as a Big Data problem, we need to consider all the three factors in the 3V model that is velocity, volume and variety. A simple volume problem might not be a Big Data problem at all, but even a marginally large data problem might get converted into a Big Data problem if velocity and variety are important parts of that issue. The Big Data Analytics lifecycle Every software product has a life cycle and same holds true for Big Data. This life cycle starts with the Creation of data. It can be created in multiple ways and have several formats. The second step after creation is Ingestion, where the data may undergo complex transformation, filtering or enrichment before it becomes suitable for the third stage, which is Analytics. Analytics may also call for some processing of data before it can be understood, to derive valuable insights from it. Visualization is the final aspect of understanding data Analytics, and hence is an important step. Key concerns in Big Data Analytics lifecycle There are underlying concerns and problems that that need to be address in every stage of the Big Data Lifecycle. The data is usually created as part of external systems like RDBMSs or server logs or audio/video streams or third party data sources. 4

5 Since the data volumes are huge, there is need to address concerns such as how to store the data, how to optimize and compress it in the data creation stage. We also need to monitor the data creation phase and take important decisions such as using Cloud-like elasticity, as well as data back up and disaster recovery strategies. The Ingestion phase has its own challenges, where various transformations and integrations play a major role. The data warehousing industry has been traditionally using ETL (extract, transformation, and load) techniques which help overcome most of the challenges related to the Ingestion phase. Here, some of the key decisions involve finding the right tools and technologies. The same holds good for the Analysis phase too, and suitable tools and technology decisions need to be taken. These decisions may involve addressing the classical build versus buy question, and assessing how existing investments can be re-used. Moreover, the data may have hidden trends and traits that are immensely useful. Statistical data mining, machine learning and NLP (natural language processing) are becoming essential parts of the Analytical phase today. The last and very critical phase is the packaging or presentation of Analytics. Here too, tools and technologies play a pivotal role and standardization is one of the key requirements. With so many new modes of data delivery available, the visualization for various channels also needs to be considered seriously. For example, it is important to understand whether a graphical view would be a better depiction or a classical tabular report, a better representation of the given problem. Similarly, mobile and other handheld devices may require a different representation. Addressing the 3Vs with the Big Data Analytics lifecycle Having understood the 3Vs as well as about how the foundation of the Big Data Analytics lifecycle has to be laid, it is important to see how these can be combined together to define and create a Big Data strategy. Organizations can begin by creating a matrix where it can capture answers to straightforward questions related to volume, variety and velocity of data against each phase of the Big Data life cycle. These questions can be as simple as how much, what type and at what rate. Once the relevant columns of the matrix are filled up, the matrix can be used as a foundation to create a strategy that can address Big Data problems. 5

6 Selecting the right Big Data Analytics strategy Impetus Technologies has been working with Big Data problems for many years now and has fashioned a master strategy that can address almost all the major Big Data issues and problems. Impetus has been using this strategy successfully for many of its customers, providing them with a failure-proof solution to their pertinent Big Data challenges. The fact is that an ideal Big Data Analytics solution needs to be able to scale easily to support the large data, which can be in Terabytes or Petabytes. The system should also be distributed across geographically unaware processors. It should be able to respond quickly to highly complex queries as well as support a wide variety of data types, including images and arbitrary data structures. The ideal analytical solution should be able to provide data scientists with all the necessary tools, using which they can explain the significance of data in a manner that is easily understood by others. An analytical solution should have the ability to incorporate machine learning, providing recommendations, and executing analytics on real-time incoming data such as logs, as well as providing domain specific canned reports. 6

7 It should also be able to handle data from heterogeneous sources, whether structured or unstructured, while providing a high rate for loading and analysis, as well as the ability to handle software or hardware failures. A Big Data Analytics strategy therefore involves creating a platform or a solution that covers all aspects of the Big Data lifecycle as well as manages the 3Vs variety, volume and velocity of data. The ideal solution for the strategy can be a platform that allows different kinds of data to be ingested. One of the ways of implementing such a solution is to utilize the Service Oriented Architecture (SOA) in the form of an extensible connector based mechanism. This connector mechanism can then allow new connectors to be added or modified, thereby making it possible to cater to new kinds of data sources efficiently and in a fool proof way. Another requirement, which is gradually gaining importance, is real-time analytics. The ideal solution should also facilitate complex, Real-time processing and transformation before the data is used for complex analytics. Complex event processing and rule engine integration is a related requirement and can be used to solve a variety of real world problems. Hence, the ideal solution should also provide CEP (Complex Event Processing) support. The analytical phase should enable easy data modeling and transformation, helping data scientists to derive the maximum value. Therefore, the solutions need to have user-friendly interfaces for data modeling as well as offer easy-touse configurable workflow management interfaces. And of course, the interaction with the existing visualization tools completes the entire life cycle. The solution must therefore allow easy integration with visualization tools, which will enable analytical data to be understood easily and also provide deeper insights into sparse or complex data sets. In order to create the ideal Big Data Analytics strategy and achieve the most optimum results, users will need to handpick the tools and technologies. They must also create a framework that uses a leading open source solution Apache Hadoop for solving Big Data problems. 7

8 The Hadoop eco-system Hadoop has certainly come a long way from its humble origin. It was initially introduced as a simple file system in the Apache Nutch project, a massive web crawler which needed a file system to store large volumes of data across the Internet. There are several tools and components that are an integral part of the Hadoop ecosystem. These tools and components are aligned with the Big Data Lifecycle and are serving different purposes for Creation, Ingestion, Analytics and Visualization. Sqoop, Flume or Chukwa allow users to procure the data to be ingested and place it in a Hadoop-based data warehouse. The Ingestion and Analytical phases may utilize Hive, PIG or programmatic processing, or workflow systems like Oozie for data transformation and enrichment. Apache Mahout can be used for a wide range of machine learning and data mining algorithms including clustering, classification, collaborative filtering and frequent pattern mining. These will also cover the advanced data analytics requirements in the Analytics phase. 8

9 Currently, Hadoop is the leader in the Open source Big Data technology world However, there are many other products and initiatives, both commercial and Open Source, that are foraying this space. There have been attempts and even some successes in running Hadoop or similar distributed processing technologies faster and also adding real-time processing support. MapR, DataRush, Hstreaming, HPCC, Platform computing, Datastax etc. are the examples of faster technologies that can serve as alternatives to Apache Hadoop. The major database and dataware house vendors like Oracle, IBM, Microsoft, HP and EMC have also jumped on to the Big Data bandwagon and come up with their own customized solutions which are usually categorized as MPP (Massively parallel processing) databases. NoSQL is another important Big Data Technology. While some call NoSQL No to SQL, Impetus prefers terming it Not only SQL, due the fact that slowly but surely, the gap between regular RDBMSs and the NoSQL world is getting reduced. There are other options, such as graphical databases like Neo4j, which can help users address Big Data issues emerging as part of exploding social media data. There are also faster versions of SQL databases such as VoltDB, which bring together the capabilities of RDBMS ACID with the power of Big Data. Hardware or appliance-based solutions also offer alternative solutions for Big Data problems. Putting it all together Now that we have the strategy, tools and technologies in place, it is all a matter for putting them together. Essentially, this is about using Hadoop as the Big Data Analytics solution. As explained earlier, Hadoop is an excellent Big Data technology that is slowly becoming the de-facto leader with the Open Source Big Data domain. There are multiple ways in which the power of Hadoop can be used or combined in the Analytical and Visualization phases of the Big Data Lifecycle. Impetus has been using Hadoop for cleaning/transforming the data into a structured form, and then loading the same into the RDBMS databases. Here, Hadoop capabilities are being harnessed to handle Ingestion and some part of the Analytical phase. On the other hand, some analytical processing is handled at the RDBMS level as the data sinks. It is now possible to use any existing visualization technique or tool from the rich world of RDBMS 9

10 visualization products. The Visualization phase can therefore be handled by existing toolsets. Hadoop can efficiently access the data between the RDBMS data sources and Hadoop systems through DBInputFormat and DBOutputFormat interfaces. Once the unstructured data is processed, it can be pushed to an RDBMS database, which can subsequently act as a data source for any BI solution. This approach provides the end-user with the flexibility of parallel processing with Hadoop and an SQL interface at the summarized data level. It is good when the summarized data is not big enough to pose a challenge for the RDBMS database being used. This solution is not as expensive as some of the other options. This approach is also suitable for the high touch queries where the user wants to perform real- time, ad hoc analytics as most of the RDBMS databases come with a comprehensive set of performance enhancement techniques. However, when the summarized data is very large, this approach might fail to deliver. Also, if batch analysis is the key requirement, then moving data to an RDBMS database could be a redundant activity. Take the instance of a scenario where the processed and summarized data, which in itself is very huge, is placed on the Hadoop system. Therefore, what can be done in a situation where there is need to use the summarized data for batch reporting without getting into the complications of moving the data out of the Hadoop system either to a MPP DW or an RDBMS? This can be done by using Hive as an interface for the data present on the Hadoop system. Hive provides a very promising interface for executing the SQLlike queries by converting them into MR (MapReduce) jobs. These MR Jobs are executed on the Hadoop clusters for the data that is itself present on Hadoop. This approach allows users to do batch and asynchronous analytics over the same data present in the Hadoop system. It is very cost-effective as it does not involve managing separate data sources, other than the existing Hadoop System. It also provides users with the flexibility to scale to any level with their summarized data. Today, several options are available in the market that allows the integration of Massively Parallel Processing Data warehouses (MPP DWs) with Hadoop. This is worth considering if you have a large amount of data even after applying summarization over it. 10

11 Using Hadoop for cleaning/transforming the data into a structured form allows users to load the data into any of the available options of MPP DWs. While the data is being uploaded, they can write User Defined Functions to perform database level analytics and then integrate the same with Business Intelligence (BI) solutions using ODBC/JDBC connectivity for end-user analytics and reporting. Also, using MPP Data Warehouses will allow users to deploy various performance enhancement techniques like index compression, materialized views, result set caching and I/O sharing. Alternatively, some of the MPP DWs may also provide users with a good framework that supports MR jobs executions within their own clusters at MPP levels providing them with second levels of parallel processing. This feature is really good for working with high touch queries and also provides an excellent framework for end user ad hoc analytics. However, the disadvantage of using this approach could be the cost involved. Most of the MPP DWs are expensive to acquire and some also require high-end servers for deployment, which could be expensive. Using this approach also calls for an expert team that has hands-on experience on MPP Data warehouse management and development. This could turn out to be a challenge in itself in today's rapidly changing technology space where Open Source technologies like Hadoop are getting widely accepted and adopted. 11

12 Conclusion In summary it can be said that an ideal Big Data strategy can lead users to create a platform or solution that covers all the aspects of the Big Data Lifecycle and manage these as well. Organizations are using the Hadoop ecosystem or a blend of alternate technologies, including FOSS and commercial technologies such as NoSQL DataRush HStreaming etc. to address Big Data problems today. There are three strategies involved in using Hadoop as the Big Data Analytics Solution. The first option is indirect analytics over Hadoop, which provides the flexibility of parallel processing of Hadoop and an SQL interface at the summarized data level. This solution is not very expensive when compared with other options. The second option is direct analytics over Hadoop, which allows you to perform batch and asynchronous analytics over the same data present over the Hadoop system. It is a very cost-effective approach as it does not involve any expense in managing the separate data sources. The third option is integrating MPP DWs with Hadoop when there is a large amount of data. This is an expensive option when compared with the two approaches discussed above. Impetus has successfully used the Hadoop ecosystem to create a comprehensive Big Data platform that provides the capabilities required to solve all concerns in the various stages of Big Data Lifecycle. About Impetus Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others. Impetus Technologies, Inc Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA Tel: Regional Development Centers - INDIA: New Delhi Bangalore Indore Hyderabad To know more visit: Disclaimers The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc. 12

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Microsoft SQL Server 2012 with Hadoop

Microsoft SQL Server 2012 with Hadoop Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

G-Cloud Big Data Suite Powered by Pivotal. December 2014. G-Cloud. service definitions

G-Cloud Big Data Suite Powered by Pivotal. December 2014. G-Cloud. service definitions G-Cloud Big Data Suite Powered by Pivotal December 2014 G-Cloud service definitions TABLE OF CONTENTS Service Overview... 3 Business Need... 6 Our Approach... 7 Service Management... 7 Vendor Accreditations/Awards...

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Building Your Big Data Team

Building Your Big Data Team Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM David Chappell SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM A PERSPECTIVE FOR SYSTEMS INTEGRATORS Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Business

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

WHITE PAPER. Four Key Pillars To A Big Data Management Solution

WHITE PAPER. Four Key Pillars To A Big Data Management Solution WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Big Data Analytics DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Tom Haughey InfoModel, LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755 3350 tom.haughey@infomodelusa.com

More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Microsoft Analytics Platform System. Solution Brief

Microsoft Analytics Platform System. Solution Brief Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Data Virtualization A Potential Antidote for Big Data Growing Pains

Data Virtualization A Potential Antidote for Big Data Growing Pains perspective Data Virtualization A Potential Antidote for Big Data Growing Pains Atul Shrivastava Abstract Enterprises are already facing challenges around data consolidation, heterogeneity, quality, and

More information

Navigating Big Data business analytics

Navigating Big Data business analytics mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved. Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their

More information

OnX Big Data Reference Architecture

OnX Big Data Reference Architecture OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

Introduction to Big Data and the Lambda Architecture

Introduction to Big Data and the Lambda Architecture Introduction to Big Data and the Lambda Architecture Marc Schöni Meinrad Weiss April 2014 BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 1 What

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS 9 8 TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS Assist. Prof. Latinka Todoranova Econ Lit C 810 Information technology is a highly dynamic field of research. As part of it, business intelligence

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

BIG DATA IS MESSY PARTNER WITH SCALABLE

BIG DATA IS MESSY PARTNER WITH SCALABLE BIG DATA IS MESSY PARTNER WITH SCALABLE SCALABLE SYSTEMS HADOOP SOLUTION WHAT IS BIG DATA? Each day human beings create 2.5 quintillion bytes of data. In the last two years alone over 90% of the data on

More information

QUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES

QUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES [ Consumer goods, Data Services ] TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES QUICK FACTS Objectives Develop a unified data architecture for capturing Sony Computer Entertainment America s (SCEA)

More information

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle Growth in Data Diversity and Usage 1.8 Zettabytes of Data in 2011, 20x Growth by 2020

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Big Data Defined Introducing DataStack 3.0

Big Data Defined Introducing DataStack 3.0 Big Data Big Data Defined Introducing DataStack 3.0 Inside: Executive Summary... 1 Introduction... 2 Emergence of DataStack 3.0... 3 DataStack 1.0 to 2.0... 4 DataStack 2.0 Refined for Large Data & Analytics...

More information

White Paper: Evaluating Big Data Analytical Capabilities For Government Use

White Paper: Evaluating Big Data Analytical Capabilities For Government Use CTOlabs.com White Paper: Evaluating Big Data Analytical Capabilities For Government Use March 2012 A White Paper providing context and guidance you can use Inside: The Big Data Tool Landscape Big Data

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

Informatica and the Vibe Virtual Data Machine

Informatica and the Vibe Virtual Data Machine White Paper Informatica and the Vibe Virtual Data Machine Preparing for the Integrated Information Age This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013 An Oracle White Paper October 2013 Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics Introduction: The value of analytics is so widely recognized today that all mid

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

CIO Guide How to Use Hadoop with Your SAP Software Landscape

CIO Guide How to Use Hadoop with Your SAP Software Landscape SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs

More information

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

WINDOWS AZURE DATA MANAGEMENT

WINDOWS AZURE DATA MANAGEMENT David Chappell October 2012 WINDOWS AZURE DATA MANAGEMENT CHOOSING THE RIGHT TECHNOLOGY Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents Windows Azure Data Management: A

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved. EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

DATAOPT SOLUTIONS. What Is Big Data?

DATAOPT SOLUTIONS. What Is Big Data? DATAOPT SOLUTIONS What Is Big Data? WHAT IS BIG DATA? It s more than just large amounts of data, though that s definitely one component. The more interesting dimension is about the types of data. So Big

More information

Integrate and Deliver Trusted Data and Enable Deep Insights

Integrate and Deliver Trusted Data and Enable Deep Insights SAP Technical Brief SAP s for Enterprise Information Management SAP Data Services Objectives Integrate and Deliver Trusted Data and Enable Deep Insights Provide a wide-ranging view of enterprise information

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

Big data for the Masses The Unique Challenge of Big Data Integration

Big data for the Masses The Unique Challenge of Big Data Integration Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...

More information