ACHIEVING BUSINESS VALUE WITH BIG DATA. By W H Inmon. copyright 2014 Forest Rim Technology, all rights reserved


First there were Hollerith punched cards. Then there were magnetic tape files. Then there was disk storage, followed by parallel disk storage. With each advance in storage technology came new capabilities (speed, capacity, direct access of data) and a drop in the unit cost of storage.

Now there is Big Data, where we can store even greater volumes of data at even lower cost per unit of storage. Big Data has opened the door to the collection of volumes and types of data that previously went unnoticed in the corporation. Now organizations can afford to capture and store data like never before.

But with the advent of Big Data comes a related issue: how do we get business value out of the data collected into Big Data? Fig 1 shows that ultimately business value needs to be derived from Big Data in order for Big Data to be adopted and used by organizations.

[Fig 1: Big Data -> Business value]

Big Data has several characteristics that mark it as different from other types of data. Some of those characteristics are:

- The unit cost of storage of Big Data is low (as compared to other forms of storage)
- Big Data is collected and stored in an unstructured fashion
- Big Data is accessed and managed in an indirect manner that does not lend itself to high performance transaction processing
- The volumes of data stored in Big Data are orders of magnitude greater than what was possible with other technologies.

There are undoubtedly other characteristics of Big Data that set it apart from earlier storage technologies.

So what are organizations, both technicians and business managers, focusing on today as Big Data becomes an accepted part of the technical landscape? Fig 2 shows some of the aspects of Big Data that people are focusing on.

[Fig 2: Hive, Hadoop, Pig, MapReduce, Cirro, HBase]

In Fig 2 are found such topics as Mongo, Cirro, Pig, Hadoop, Hive, HBase, MapReduce and other aspects and features of Big Data. These and other technological aspects are part of the technological landscape of Big Data, and each of these topics is of interest and deserving of study. But these are technologies, and as interesting as they are, they do not directly address the topic of deriving business value out of Big Data.

Another way to understand the phenomenon of Big Data and deriving business value is shown by Fig 3.

[Fig 3: Big Data -> Unstructured Data -> Business Value]

Fig 3 makes the point that in order to get business value out of Big Data you have to go through another, entirely different technological barrier, and that barrier is making sense of the unstructured data (raw text) that comes with Big Data. The technology of Big Data is one challenge to be conquered, but once you have met the Big Data challenge, you are automatically confronted with the challenge of making sense of unstructured data, because all data in Big Data is unstructured.

Languages that access Big Data (NoSQL) certainly have a place, but any access language, by itself, does not address the fundamental challenges of making sense of unstructured data. Instead, unstructured data needs to pass through a separate transformation process before the unstructured data found in Big Data can be used to achieve business value.

Stated differently, in order to achieve business value you have to address two very different technological challenges: the technology of Big Data and the transformations that come with unstructured data. In a way, Big Data merely introduces unstructured text. Once introduced, unstructured text must then be transformed in order to achieve business value.

TRANSFORMATIONAL CHALLENGES

Several important issues arise when trying to make sense of unstructured data, and they illustrate the challenges of transformation. Some of those many issues of transformation are:

- The need for context
- The need for interpretation
- The need for simple editing
- The need for certain forms of standardization
- The need for complex editing
- The need for acronym resolution, and so forth.

This list merely scratches the surface of the many and diverse transformations that need to be done to unstructured data in order to unlock its business value. For further explanation of many of these transforms refer to the web site. However, this list is a good starting point to explain why raw text cannot be used to achieve business value by itself.

CONTEXT

The first and most obvious shortcoming of raw text is that it needs context in order to be used to achieve business value. The raw text found in Big Data has either no context or very limited context.

Contrast the raw text found in Big Data with the classical records of data typically found in a dbms. In a classical dbms there are records. A record is made up of keys and attributes. The attributes found in the dbms provide context to the data contained in the dbms.

As a simple example, suppose that in Big Data we find the string of text "...Joe Foster...". Fig 4 shows the unstructured string of text that contains the name Joe Foster.

[Fig 4: ...Joe Foster...]

What is the meaning of Joe Foster in Big Data? Is Joe a customer? A policeman? A preacher? A race car driver? We just don't know much about Joe Foster when Joe is found in a raw unstructured string of data.

But suppose we have a dbms record, and in the dbms record data is recorded as:

Customer number
Customer name Joe Foster
Customer address 635 Adrian Lane, Tuscaloosa, Alabama
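To make the contrast concrete, here is a minimal Python sketch, not taken from the original paper. The sample raw strings, the `CustomerRecord` class, the helper functions and the customer number `C-0001` are all hypothetical (the source does not give a customer number). It only illustrates that a raw string supports an existence check, while a record with named attributes supports questions that depend on those attributes.

```python
from dataclasses import dataclass

# Raw, unstructured text as it might sit in Big Data (hypothetical sample strings).
raw_text = [
    "...met with Joe Foster on Tuesday about the delivery...",
    "...no answer at the front desk...",
]


def simple_query(texts, needle):
    """Existence check only: does the string occur anywhere in the raw text?"""
    return any(needle in t for t in texts)


@dataclass
class CustomerRecord:
    customer_number: str   # hypothetical value; the source does not give one
    customer_name: str
    customer_address: str


records = [
    CustomerRecord("C-0001", "Joe Foster", "635 Adrian Lane, Tuscaloosa, Alabama"),
]


def customers_in_city(recs, city):
    """A query that relies on attribute context: which customers live in a city?"""
    return [r.customer_name for r in recs if city in r.customer_address]


print(simple_query(raw_text, "Joe Foster"))
print(customers_in_city(records, "Tuscaloosa"))
```

The second function is only possible because the record says which value is a name and which is an address; nothing in the raw strings carries that information.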

Once we find the record for Joe Foster in a dbms, the attribute metadata tells us a lot about Joe Foster. In a dbms there is context about Joe that allows us to use Joe accurately and succinctly in making business decisions in the computer.

So one of the fundamental differences between unstructured strings of text found in Big Data and the records of text found in a dbms is the lack of context that exists in a string of unstructured data.

Now consider the types of query that can be done on a string of unstructured text. In a query against the unstructured string we can ask: does the string Joe Foster exist? Fig 5 shows we can ask if Joe Foster is found in Big Data.

[Fig 5: Simple query: can you find Joe Foster? ...Joe Foster... ...Joe Foster...]

The query found in Fig 5 is a simple query. We can look through huge amounts of data in Big Data and find out if Joe Foster is found there. But because the data is all unstructured text we can't find out much else. We can't find out if Joe is a policeman, a soldier, a customer or a newly born baby. Because there is no (or very little) context found in unstructured text, the types of query that we can do against unstructured strings of data are very limited. And where the type of query is limited, the business value is limited.

In order to do a more sophisticated query we need context to be able to understand the data that we operate on. The need for context in the process of query can be stated very simply, as seen in Fig 6.

[Fig 6: simple query: raw text; sophisticated query: raw text + context]

Indeed, trying to make business decisions on text that does not have context can be anything but positive. As a simple example of the importance of context, consider two men who are standing on a street corner when a young lady passes by. One man says to the other, "She's hot."

What is meant by the expression "She's hot"? Is the young lady attractive, and does one of the men wish that he could have a date with her? Does "she's hot" mean that the lady is attractive?

Or are the men on a street corner in Houston, Texas in July, where the temperature is 99 degrees and the humidity is 100%? The lady is sweating from every pore in her body and she is physically very hot.

Or has the young lady just been given a traffic ticket and she is mad? Does "she's hot" mean that the lady is irritated?

Or are there even more interpretations of what is meant by "she's hot"?

Unless we know the context of what is being said, we can make a very wrong and embarrassing business decision. Not understanding context leads to incorrect interpretation, and incorrect interpretation leads to wrong assumptions. You cannot possibly make good business decisions when you do not understand the interpretation and underlying assumptions of raw text.

Furthermore, the need to understand the context of information is hardly limited to the words "she's hot". ALL words, ALL raw text, are open to interpretation and need context in order to be understood properly. And Big Data is nothing but raw text.

The problem is that when raw text is found in the unstructured strings of Big Data, little or no context comes with it. The lack of context of raw text GREATLY limits the sophistication of the queries that can be done against unstructured data. Only the most basic of queries can be done as long as the data being queried has no context. And there is little business value in being able to make only unsophisticated queries. In order to achieve business value you need to be able to make sophisticated queries.

INTERPRETATION AND COMPONENT CONTEXTUALIZATION

As important as context is in making sophisticated queries of data, context is not the only element that needs to be considered. Another equally important element of making sophisticated queries is the ability to interpret raw text. Suppose that the following string of data is found in raw text:

f450-dnv-chi lybag cno po ag

This kind of string might be found in a log record. What kind of query can be done against this text? A simple query can be done to find if the string f450 is present. And indeed Big Data can conduct such a simple search. But what business value is there in finding the value f450? The answer is: not much.

In order to unlock the business value of the raw text, two activities need to be performed against the string. The first activity is that the raw string must be interpreted. Then, after it is interpreted, it needs to be component contextualized.

As an example of interpretation, the string

f450-dnv-chi lybag cno po ag

can be interpreted to mean

December 15, 2012, flight 450, Frontier Airlines, Denver to Chicago, domestic flight, lost yellow bag, claim number po, agent Ann Maeda.

Then, after interpretation of the raw text is done, component contextualization parses the string and creates a context identification that might look like:

date: December 15, 2012
airline: Frontier
flight number: 450
flight type: domestic
action item: lost item
item type: bag
bag description: yellow
flight source: Denver
flight destination: Chicago
claim number: po
agent number: 12356
agent name: Ann Maeda

Once the component contextualization is done and is described to the system holding the data, all sorts of analytical processing can be done against the raw string of data. Queries such as:

- How many bags are lost on domestic flights?
- How many incidents occur in the month of December?
- How many times has agent Ann Maeda had to track lost bags? Lost yellow bags?

- Of bags that are lost, how many are yellow?

And so forth.

Once interpretation and component contextualization are done, sophisticated queries can be done. But until the raw text is properly interpreted, no sophisticated analytical processing can occur. And as long as no sophisticated analysis can occur, only limited business value can be attained.

Once again, there is very little business value in making queries that are simple. In order to achieve business value, you need to be able to create sophisticated queries. And in order to create sophisticated queries, you need to transform the raw text.

SIMPLE STANDARDIZATION

Contextualization and interpretation are absolutely essential in order to take raw unstructured text (like that found in Big Data) and turn it into a form that can be analyzed in a sophisticated manner. But these are hardly the only activities that need to be done to the raw text found in unstructured data (i.e., Big Data) to transform it into a form that yields business value.

Another simple but important activity that must be done in the transformation of unstructured textual information is the standardization of certain types of data. As an example of the need for standardization, date values need to be standardized. Suppose that the following unstructured strings of data are found in Big Data:

March 13, 1999, 2012/27/01 and 12/31/2010

To the reader's eye it is obvious that these unstructured strings of text contain date values. But can the computer do comparison processing against these dates? Indeed, does the computer even understand that these are date values?

The problem is that these dates are in different formats. Trying to get the computer to do a comparison against these unstructured forms of date will yield unpredictable and useless results. To the computer these strings just look like raw text and are not particularly meaningful.

In order to be useful, the unstructured forms of date must be converted into a standardized format. In this case each of the three strings must be recognized as a date and mapped to a date value in one common format:

March 13, 1999 == date
2012/27/01 == date
12/31/2010 == date

Once the raw text is recognized as dates, once the dates have their values recognized, and once the dates are converted into a common format, the dates can be meaningfully compared to each other. But until a common standardized format is achieved, no meaningful comparison of dates can be done. And unless dates can be compared, only limited business value can be achieved from the raw unstructured text.
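The date standardization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the method used by any particular product: the three input formats are assumed to be known in advance, the ISO form `YYYY-MM-DD` is chosen here as the common format (the paper does not say which format it uses), and the names `KNOWN_FORMATS` and `standardize_date` are invented for the example.

```python
from datetime import datetime

# Formats assumed for the three example strings; a real feed would need its own list.
KNOWN_FORMATS = (
    "%B %d, %Y",   # March 13, 1999
    "%Y/%d/%m",    # 2012/27/01 (year/day/month)
    "%m/%d/%Y",    # 12/31/2010
)


def standardize_date(text):
    """Return a date object for text, trying each known format in turn."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"not a recognized date: {text!r}")


raw_dates = ["March 13, 1999", "2012/27/01", "12/31/2010"]
standardized = [standardize_date(d) for d in raw_dates]

# Once every value shares one type, comparison and sorting behave as expected.
for raw, std in zip(raw_dates, standardized):
    print(f"{raw!r:>18} -> {std.isoformat()}")
print("earliest:", min(standardized).isoformat())
```

Trying formats in a fixed order is a simplification: a string such as 01/02/2010 is genuinely ambiguous between month/day and day/month, so in practice the source of each string usually has to determine which format applies.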

Once again, until the raw text passes through a fundamental transformation, the queries that are done against it are not very sophisticated. And as long as queries are not sophisticated, there is limited business value. Until queries are sophisticated, business value cannot be achieved.

OTHER TRANSFORMS

Certainly contextualization, interpretation, component contextualization and standardization are absolutely necessary in order to turn raw text into a form that can be used for sophisticated analytical processing. And in order to achieve business value it is necessary to do sophisticated analysis.

But these transforms are hardly the only transforms that are needed in order to turn unstructured text into a form of data that can be analyzed in a sophisticated manner. Indeed, as important as these transforms are, they barely scratch the surface of what is needed in order to turn raw text into useful information from which business value can be derived. For a discussion of more transforms, refer to Textual ETL as described on the web site.

With a complete and mature set of transforms, raw text can be turned into a form of text that can yield business value, as depicted by Fig 7.

[Fig 7: Big Data -> textual transformation -> Business value]

AN ARCHITECTURAL RENDERING

From an architectural perspective, how then does the business derive value from Big Data? Fig 8 shows that raw text is placed into Big Data as the first step.

[Fig 8: raw text -> Big Data]

After raw text is placed into Big Data, one possibility is to use textual ETL to transform the raw text into a transformed state, which is then placed into a standard dbms. Fig 9 shows this possibility.
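As a rough illustration of this path, the sketch below interprets and component-contextualizes log strings like the flight example from earlier in the paper, loads the result into a dbms, and then runs the kind of queries the paper lists. Nearly everything here is an assumption made for the sake of a runnable example: the log layout (including the leading `yyyymmdd` date and the claim values such as `po0001`), the lookup tables, the second agent number, the `lost_item` table, and the function names. An in-memory SQLite database stands in for the standard dbms. It is not a description of how the Textual ETL product works.

```python
import re
import sqlite3
from datetime import datetime

# Hypothetical log layout (an assumption, not the paper's actual format):
#   <yyyymmdd> <airline letter><flight>-<from>-<to> l<color letter>bag cno <claim> ag<agent>
LOG_PATTERN = re.compile(
    r"(?P<date>\d{8}) (?P<airline>[a-z])(?P<flight>\d+)-(?P<src>[a-z]{3})-(?P<dst>[a-z]{3}) "
    r"l(?P<color>[a-z])bag cno (?P<claim>\S+) ag(?P<agent>\d+)"
)

# Illustrative lookup tables used for interpretation.
AIRLINES = {"f": "Frontier"}
AIRPORTS = {"dnv": "Denver", "chi": "Chicago"}   # treated as the domestic airports
COLORS = {"y": "yellow", "b": "black"}
AGENTS = {"12356": "Ann Maeda"}


def contextualize(line):
    """Interpret one raw log string and return it as named fields."""
    m = LOG_PATTERN.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognized log string: {line!r}")
    domestic = m["src"] in AIRPORTS and m["dst"] in AIRPORTS
    return {
        "incident_date": datetime.strptime(m["date"], "%Y%m%d").date().isoformat(),
        "airline": AIRLINES.get(m["airline"], "unknown"),
        "flight_number": int(m["flight"]),
        "flight_type": "domestic" if domestic else "international",
        "item_type": "bag",
        "item_color": COLORS.get(m["color"], "unknown"),
        "source": AIRPORTS.get(m["src"], m["src"]),
        "destination": AIRPORTS.get(m["dst"], m["dst"]),
        "claim_number": m["claim"],
        "agent_number": m["agent"],
        "agent_name": AGENTS.get(m["agent"], "unknown"),
    }


def load(lines):
    """Transform raw strings and place the result in an in-memory dbms."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lost_item (incident_date TEXT, airline TEXT, flight_number INTEGER,"
        " flight_type TEXT, item_type TEXT, item_color TEXT, source TEXT,"
        " destination TEXT, claim_number TEXT, agent_number TEXT, agent_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO lost_item VALUES (:incident_date, :airline, :flight_number,"
        " :flight_type, :item_type, :item_color, :source, :destination,"
        " :claim_number, :agent_number, :agent_name)",
        [contextualize(line) for line in lines],
    )
    return conn


def count(conn, where, params=()):
    """Count rows matching a WHERE clause written by the example itself."""
    sql = f"SELECT COUNT(*) FROM lost_item WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical raw log strings.
raw_log = [
    "20121215 f450-dnv-chi lybag cno po0001 ag12356",
    "20121220 f212-chi-dnv lbbag cno po0002 ag12356",
    "20130105 f450-dnv-chi lybag cno po0003 ag20001",
]
conn = load(raw_log)

print("bags lost on domestic flights:",
      count(conn, "flight_type = 'domestic' AND item_type = 'bag'"))
print("incidents in December:", count(conn, "substr(incident_date, 6, 2) = '12'"))
print("lost bags tracked by Ann Maeda:", count(conn, "agent_name = ?", ("Ann Maeda",)))
print("lost bags that are yellow:", count(conn, "item_color = 'yellow'"))
```

None of these counts could be obtained by searching the raw strings for a substring; each depends on a field that exists only after the transformation.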

[Fig 9: raw text, Big Data, Textual ETL, Standard dbms]

The raw text is transformed into a form that is able to support sophisticated queries and analytical processing. The transformation is done by textual ETL.

However, the output from textual ETL does not have to go into a standard dbms. An alternative is to send the output (i.e., the transformed text) back into Big Data. By sending the output of textual ETL back into Big Data, whole new query possibilities are opened up. Now it is possible to use NoSQL for sophisticated query of data.

Fig 10 shows the possibilities for different query types when textual ETL is used and where the output from textual ETL is placed in either a standard dbms or Big Data.

[Fig 10: The Inmon/Krishnan Big Data Architecture, copyright Forest Rim Technology, 2012, all rights reserved. Labels: raw text, Big Data, simple query, Textual ETL, Standard dbms, transformed text, sophisticated query, Business value]

In Fig 10, simple queries can be made on raw text found in Big Data, or sophisticated queries can be made from a standard dbms or from transformed text found in Big Data that has arrived via textual ETL. Once the queries are made against transformed text, those queries can be very sophisticated. And once sophisticated queries can be run against the data that originates from Big Data, business value is achievable.

INTEGRATING BIG DATA AND CORPORATE DATA

Another way to look at the value of textual ETL is to consider the question: how is it possible to combine the data found in Big Data with the existing organizational environment? With current technology there are analytical tools for Big Data and there are analytical tools for the existing corporate environment. For all practical purposes, though, the two environments are about as different as night and day. With textual ETL, however, there is a way to integrate and incorporate data.
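One way to picture this integration is a join between rows that came out of a textual transformation and rows that already live in a corporate system. The sketch below is only an illustration: the `customer` and `text_mention` tables, their contents, the customer number `C-0001`, and the function name are hypothetical, and matching on a person's name is a simplification that a real system would replace with a more reliable key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Corporate systems side: a classical customer table (customer number is hypothetical).
conn.execute(
    "CREATE TABLE customer (customer_number TEXT, customer_name TEXT, customer_address TEXT)"
)
conn.execute(
    "INSERT INTO customer VALUES (?, ?, ?)",
    ("C-0001", "Joe Foster", "635 Adrian Lane, Tuscaloosa, Alabama"),
)

# Big Data side: rows assumed to have been produced from raw text by a textual transformation.
conn.execute("CREATE TABLE text_mention (person_name TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO text_mention VALUES (?, ?)",
    [("Joe Foster", "lost bag"), ("Joe Foster", "refund"), ("Ann Maeda", "lost bag")],
)


def mentions_by_customer(connection):
    """Join transformed text with corporate data: what are known customers mentioned with?"""
    sql = (
        "SELECT c.customer_number, c.customer_name, m.subject "
        "FROM customer AS c JOIN text_mention AS m ON m.person_name = c.customer_name "
        "ORDER BY m.subject"
    )
    return connection.execute(sql).fetchall()


for row in mentions_by_customer(conn):
    print(row)
```

The join works only because the text side has already been given structure; the name Ann Maeda drops out because the corporate table does not know her as a customer.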

[Fig 11: Big Data, Textual ETL, Corporate systems]

In Fig 11 it is seen that with textual ETL, data can be exchanged and integrated freely between the Big Data environment and the corporate systems environment.

Bill Inmon, the father of data warehousing, works for Forest Rim Technology, a company dedicated to the technology for the management and usage of unstructured text. Forest Rim Technology is located in Castle Rock, CO. Forest Rim has a web site at forestrimtech.com.


More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

TRANSACTION DATA ENRICHMENT AS THE FIRST STEP ON THE BIG DATA JOURNEY

TRANSACTION DATA ENRICHMENT AS THE FIRST STEP ON THE BIG DATA JOURNEY TRANSACTION DATA ENRICHMENT AS THE FIRST STEP ON THE BIG DATA JOURNEY A key part of its industry-leading platform for digital financial services, the new Yodlee TransactionDataEnrichment solution enables

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.30 Spring 2016 Very Rough Draft Subject to Change Professor Norman White Background: Most courses spend their time on the concepts and techniques of analyzing

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12 Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language

More information

A Study on Big Data Integration with Data Warehouse

A Study on Big Data Integration with Data Warehouse A Study on Big Data Integration with Data Warehouse T.K.Das 1 and Arati Mohapatro 2 1 (School of Information Technology & Engineering, VIT University, Vellore,India) 2 (Department of Computer Science,

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

Generating the Business Value of Big Data:

Generating the Business Value of Big Data: Leveraging People, Processes, and Technology Generating the Business Value of Big Data: Analyzing Data to Make Better Decisions Authors: Rajesh Ramasubramanian, MBA, PMP, Program Manager, Catapult Technology

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

... Foreword... 17. ... Preface... 19

... Foreword... 17. ... Preface... 19 ... Foreword... 17... Preface... 19 PART I... SAP's Enterprise Information Management Strategy and Portfolio... 25 1... Introducing Enterprise Information Management... 27 1.1... Defining Enterprise Information

More information

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of An Introduction to Data Warehousing An organization manages information in two dominant forms: operational systems of record and data warehouses. Operational systems are designed to support online transaction

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

Introduction to Big Data the four V's

Introduction to Big Data the four V's Chapter 1: Introduction to Big Data the four V's This chapter is mainly based on the Big Data script by Donald Kossmann and Nesime Tatbul (ETH Zürich) Big Data Management and Analytics 15 Goal of Today

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

How To Create A Visual Analytics Tool

How To Create A Visual Analytics Tool W H I T E P A P E R Visual Analytics for the Masses 1 State of Visual Analytics Visual analytics, in the field of business intelligence, is the integration of data visualization and interactive visual

More information

A capacity planning / queueing theory primer or How far can you go on the back of an envelope? Elementary Tutorial CMG 87

A capacity planning / queueing theory primer or How far can you go on the back of an envelope? Elementary Tutorial CMG 87 A capacity planning / queueing theory primer or How far can you go on the back of an envelope? Elementary Tutorial CMG 87 Ethan D. Bolker Departments of Mathematics and Computer Science University of Massachusetts

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

OCR LEVEL 2 CAMBRIDGE TECHNICAL

OCR LEVEL 2 CAMBRIDGE TECHNICAL Cambridge TECHNICALS OCR LEVEL 2 CAMBRIDGE TECHNICAL CERTIFICATE/DIPLOMA IN IT UNDERSTANDING BIG DATA K/505/5383 LEVEL 2 UNIT 29 GUIDED LEARNING HOURS: 60 UNIT CREDIT VALUE: 10 Understanding Big Data K/505/5383

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Large Scale/Big Data Federation & Virtualization: A Case Study

Large Scale/Big Data Federation & Virtualization: A Case Study Large Scale/Big Data Federation & Virtualization: A Case Study Vamsi Chemitiganti, Chief Solution Architect Derrick Kittler, Senior Solution Architect Bill Kemp, Senior Solution Architect Red Hat 06.29.12

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Study of

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Barriers. So what is Big Data?!

Barriers. So what is Big Data?! Barriers So what is Big Data?! Big Data is the modern scale at which we are defining or data usage challenges. Big Data begins at the point where need to seriously start thinking about the technologies

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data: The Datafication Of Everything Thoughts Devices Processes Thoughts Things Processes Run the Business Organize data to do something

More information

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst ESG Brief IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst Abstract: Many enterprise organizations claim that they already

More information

Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment. by Bill Inmon. INTEGRITY IN All Your INformation

Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment. by Bill Inmon. INTEGRITY IN All Your INformation INTEGRITY IN All Your INformation R TECHNOLOGY INCORPORATED Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment by Bill Inmon WPS.INM.E.399.1.e Introduction In a few short

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Big Data. Dr.Douglas Harris DECEMBER 12, 2013

Big Data. Dr.Douglas Harris DECEMBER 12, 2013 Dr.Douglas Harris DECEMBER 12, 2013 GOWTHAM REDDY Fall,2013 Table of Contents Computing history:... 2 Why Big Data and Why Now?... 3 Information Life-Cycle Management... 4 Goals... 5 Information Management

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

11/18/15 CS 6030. q Hadoop was not designed to migrate data from traditional relational databases to its HDFS. q This is where Hive comes in.

11/18/15 CS 6030. q Hadoop was not designed to migrate data from traditional relational databases to its HDFS. q This is where Hive comes in. by shatha muhi CS 6030 1 q Big Data: collections of large datasets (huge volume, high velocity, and variety of data). q Apache Hadoop framework emerged to solve big data management and processing challenges.

More information