DATA WAREHOUSE/BIG DATA AN ARCHITECTURAL APPROACH



By W H Inmon and Deborah Arline

First there was data warehouse. Then came Big Data. Some of the proponents of Big Data have made the proclamation "When you have Big Data, you won't need a data warehouse", such was the enthusiasm for Big Data. Indeed there is much confusion and much misunderstanding of information with regard to Big Data and data warehouse. In this paper it will be seen that data warehouse and Big Data are indeed separate environments and that they are complementary to each other. This paper takes an architectural view.

AN ARCHITECTURAL PERSPECTIVE

In order to understand the complex and symbiotic relationship between data warehouse and Big Data it is necessary to have some foundational groundwork laid. Without the groundwork the final solution will not make much sense. The starting point is that data warehouse is an architecture and Big Data is a technology. And as is the case with all technologies and all architectures, there may be some overlap, but a technology and an architecture are essentially different things.

WHAT IS BIG DATA?

A good starting point is the question: what is Big Data? Fig 1 shows a representation of Big Data.

Big Data is technology that is designed to

- Accommodate very large, almost unlimited amounts of storage
- Use inexpensive storage for the housing of data
- Manage the storage using the Roman census method
- Store data in an unstructured manner

There are other definitions of Big Data, but for the purpose of this paper this will be our working definition. Big Data centers around a technological component known as Hadoop. Hadoop is technology that satisfies all these conditions of the definition of Big Data. Many vendors have built suites of tools surrounding Hadoop. Big Data, then, is a technology that is useful for the storage and management of large volumes of data.

WHAT IS A DATA WAREHOUSE?

So what is a data warehouse? A data warehouse is a structure of data where there is a single version of the truth. The essence of a data warehouse is the integrity of the data contained inside it. When an executive wants information that can be believed and trusted, the executive turns to a data warehouse. Data warehouses contain detailed historical data. Data in a data warehouse is typically integrated, where the data comes from different sources.

The definition of a data warehouse has been established from the very beginning. A data warehouse is a

- Subject oriented
- Integrated
- Non volatile
- Time variant

collection of data in support of management's decisions. Fig 2 depicts a representation of a data warehouse.

In order to achieve the integrity of data that is central to a data warehouse, a data warehouse typically has a carefully constructed infrastructure, where data is edited, calculated, tested, and transformed before it enters the data warehouse. Because data going into the data warehouse comes from multiple sources, data typically passes through a process known as ETL (extract/transform/load).

OVERLAP

From a foundational standpoint, how much overlap is there between a data warehouse and Big Data? The answer is that there is actually very little overlap between a data warehouse and Big Data. Fig 3 shows the overlap.

In Fig 3 it is seen that sometimes a data warehouse contains a reasonably large amount of data. And of course, Big Data can certainly accommodate a reasonably large amount of data. So there is some overlap between a data warehouse and Big Data. But what is remarkable is how little overlap there really is between the two entities.

Another way to look at the overlap between a data warehouse and Big Data is seen in Fig 4.

(Fig 4 labels: data warehouse and no Big Data; data warehouse and Big Data; Big Data and no data warehouse; Big Data and data warehouse)

Fig 4 shows that there is no necessary overlap between a data warehouse and Big Data. A data warehouse and Big Data are COMPLETELY mutually exclusive of each other.

A NON TRADITIONAL VIEW

In order to understand how Big Data and a data warehouse interface, it is necessary to look at Big Data in a non traditional way. There are indeed many different ways that Big Data can be analyzed. The way suggested here is only one of many ways.

One way that Big Data can be sub divided is in terms of repetitive data and non repetitive data. Fig 5 notionally shows this sub division.

Repetitive unstructured data is data that occurs very frequently and has the same structure and often times the same content. There are many examples of repetitive unstructured data. One example is the record of phone calls, where the length of the call, the date of the call, the caller and the callee are noted. Another example is metering data. Metering data is data that is gathered each week or month where there is a register of the activity or usage of energy at a particular location. In metering data there is a metered amount, an account number, and a date. And there are many, many occurrences of metering records. Another type of repetitive data is oil and gas exploration data. There are in fact many examples of repetitive Big Data.

The other type of Big Data is non repetitive unstructured data. With non repetitive unstructured data there often are many records of data. But each record of data is unique in terms of structure and content. If any two non repetitive unstructured records are similar in terms of structure and content, it is an accident. There are many forms of non repetitive unstructured data. There are emails, where one email is very different from the next email in the queue. Or there are call center records, where a customer interacts with an operator representing a company. There are telephone conversations, sales calls, litigation records, and many, many different types of non repetitive unstructured data.

So Big Data can be divided into this classification of data: repetitive unstructured data and non repetitive unstructured data. Admittedly there are many different ways to sub divide Big Data. But for the purpose of defining the relationship between a data warehouse and Big Data, this division is the one we will use.

CONTEXT

When dealing with data, any data, it is useful to consider the context of the data. Indeed, using data where the context is unknown is a dangerous thing to do. An important point to be made is that for repetitive unstructured data, identifying the context of data comes very easily and naturally. Consider the diagram seen in Fig 6.

Fig 6 shows that there are many records in a Big Data environment. But the records are essentially repetitive records, and the context and meaning of each record is clear. That is because when it comes to repetitive unstructured data the records are essentially very repetitive records. Determining context in a repetitive environment is a very easy and natural thing to do.

Now consider context in the non repetitive environment. There is plenty of context to be found in the non repetitive unstructured environment. The problem is that context is embedded in the document itself. The context is found in a million different places and in a million different ways in the non repetitive unstructured environment. Sometimes context is buried in the text of the document. Sometimes context is inferred in the external characterization of the document. Sometimes context is found in the words of the document itself. There are literally a million ways that context is found in the non repetitive unstructured environment.
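The repetitive case described above can be illustrated with a short sketch. It assumes hypothetical phone call records in a comma separated layout; the field names and sample values are invented for illustration and do not come from any real system. Because every record shares one layout, the layout itself supplies the context, and a simple analysis needs nothing more.

```python
import csv
import io
from collections import defaultdict

# Hypothetical call records (illustrative data only). Every record has the
# same four fields, so the header row alone tells us what each value means.
CALL_RECORDS = """caller,callee,call_date,length_seconds
555-0101,555-0199,2014-03-01,120
555-0101,555-0142,2014-03-01,45
555-0177,555-0101,2014-03-02,300
"""


def total_seconds_by_caller(raw_text):
    """Sum call length per caller, relying only on the shared record layout."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(raw_text)):
        totals[row["caller"]] += int(row["length_seconds"])
    return dict(totals)


print(total_seconds_by_caller(CALL_RECORDS))
# prints {'555-0101': 165, '555-0177': 300}
```

No comparable fixed layout exists for an email or a call center conversation, which is why context in the non repetitive environment cannot be read off a record layout in this way.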
In order to derive the context inherent to a non repetitive unstructured document, it is necessary to use technology known as textual disambiguation (or textual ETL). Fig 7 shows that textual disambiguation is used to derive context from non repetitive unstructured data.
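As a rough illustration of the idea only, the sketch below derives a context for words found in free text and emits structured rows. It is a toy under stated assumptions and is not the textual ETL product or its method: the taxonomy, the `derive_context` function, the row layout, and the sample emails are all hypothetical, and real textual disambiguation has to handle far more than a word lookup.

```python
import re

# Hypothetical taxonomy (an assumption for illustration): maps a word found
# in the text to the context that the word implies.
TAXONOMY = {
    "refund": "billing",
    "invoice": "billing",
    "outage": "service",
    "slow": "service",
}


def derive_context(doc_id, text):
    """Return one structured row per taxonomy word found in the text."""
    rows = []
    for match in re.finditer(r"[a-z]+", text.lower()):
        word = match.group()
        if word in TAXONOMY:
            rows.append({
                "doc_id": doc_id,
                "term": word,
                "context": TAXONOMY[word],
                "offset": match.start(),  # character position in the text
            })
    return rows


# Illustrative non repetitive documents: no two share a structure.
emails = {
    "email-1": "My internet has been slow since the outage last week.",
    "email-2": "Please send a refund, the invoice was charged twice.",
}

for doc_id, text in emails.items():
    for row in derive_context(doc_id, text):
        print(row)
```

Each output row has the same fields, so text that started out with no common structure ends up in a form that a standard data base or an analysis tool can work with.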

ANALYTIC PROCESSING

So how is analytic processing done from Big Data? There are several ways that analytic processing can be done. One way is through search technology. This approach is seen in Fig 8.

In Fig 8 it is seen that search technology works well on repetitive unstructured Big Data. Simple search technology works well where context is obvious and easily derived. The problem is that search technology does not work well in the face of non repetitive unstructured data. In order for search technology to work well, there must be an obvious context for the data the search processing is operating on.

But it is possible to use textual disambiguation to derive context from non repetitive unstructured data and then to place the data back into the Big Data environment. In this case it is said that the Big Data environment has been context enriched. Fig 9 shows this enrichment.

In Fig 9 it is seen that non repetitive unstructured data is read and passed through textual disambiguation. Then the output is placed back into the Big Data environment, but it is placed into Big Data in a context enriched state. After the data is placed back in Big Data in a context enriched state, a search tool can be used to analyze the data.

THE DATA WAREHOUSE/BIG DATA INTERFACE

The actual interface between data warehouse and Big Data is seen in Fig 10.

(Fig 10 labels: raw Big Data; direct; distill; textual disambiguation; data base; classical data warehouse; enriched Big Data)

In Fig 10 it is seen that raw Big Data can be divided into repetitive data and non repetitive data, as has been discussed. Repetitive data can be directly analyzed or can be searched by a search tool. Non repetitive data is accessed by textual disambiguation. When non repetitive data passes through textual disambiguation, the context of the data is derived. Once the context has been derived, the output can be placed either in a standard data base format or into an enriched Big Data environment. If data is placed in a data base format, the data can be easily accessed and analyzed in conjunction with existing data warehouse data. In addition, repetitive data can be distilled and placed into a standard data base if desired.

One interesting feature of this diagram is that the kinds of analysis done throughout the environment are quite different. The type of analysis that is done is profoundly shaped by the data that is available for analysis.

Forest Rim Technology is located in Castle Rock, CO. Forest Rim Technology produces textual ETL, a technology that allows unstructured text to be disambiguated and placed into a standard data base where it can be analyzed. Forest Rim Technology was founded by Bill Inmon. Deborah Arline is.