Smart IT

From Data to Decisions: A Value Chain for Big Data

H. Gilbert Miller and Peter Mork, Noblis

Healthcare, transportation, finance, energy and resource conservation, environmental sustainability, and homeland security are but a few of society's grand challenges that look to information systems for efficient management and, more importantly, quality outcomes and solutions. Regardless of the specific challenge, underlying technologies and evolving user requirements continue to expand both data volume and variety. Data is coming from every imaginable source, often in real time, and the stakeholder demand for quality outcomes has never been higher.

Given this perfect storm of expected outcomes, enabling technologies, and demanding stakeholders, enterprises face the daunting task of blazing a path that goes from raw data to quality outcomes. This task is made more complex with larger, faster, and more varied data, which doesn't automatically translate into more useful information. With exponential growth in data, enterprises must act to make the most of the vast data landscape: to thoughtfully apply multiple technologies, carefully select key data for specific investigations, and innovatively tailor large integrated datasets to support specific queries and analyses. All these actions will flow from a data value chain, a framework to manage data holistically from capture to decision making and to support a variety of stakeholders and their technologies.

Defining a Data Value Chain

Over a decade ago, Michael E. Porter introduced the value-chain concept, describing it as a series of activities that create and build value. Eventually, these activities culminate in total value, which the organization then delivers to its customers [1].
Figure 1 shows a proposed data value chain, which aims to:

- manage and coordinate data across the service continuum, from data generators to information consumers seeking to make decisions;
- form a collaborative partnership, coordinate data collection from various stakeholders, and analyze that data to optimize service delivery and quality decisions;
- streamline data management activities to enable positive outcomes for all relevant stakeholders; and
- establish a portfolio-management approach to invest in the people, processes, and technology that maximize the value of the combined data and inform decisions that enhance the organization's performance.

[Figure 1. The data value chain. Data discovery: collect and annotate (create an inventory of data sources and the metadata that describe them), prepare (enable access to sources and set up access-control rules), and organize (identify syntax, structure, and semantics for each data source). Data integration: integrate (establish a common representation of the data; maintain data provenance). Data exploitation: analyze (analyze integrated data), visualize (present analytic results to a decision maker as an interactive application that supports exploration and refinement), and make decisions (determine what actions, if any, to take on the basis of the interpreted results). The chain provides a framework with which to examine how to bring disparate data together in an organized fashion and create valuable information that can inform decision making at the enterprise level.]

IT Pro, January/February 2013. 1520-9202/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society.

These goals are accomplished through data discovery, integration, and exploitation.

Data Discovery

Before an organization can perform the analyses needed to support informed decision making, it needs to know what data resources are available. Discovery includes not only inventorying data assets but also preparing and organizing those assets.

Collect and Annotate

The first link in the chain involves creating an inventory of available data sources and the metadata that describes the quality of those sources in terms of completeness, validity, consistency, timeliness, and accuracy. The emphasis is on turning unstructured data into structured data associated with valid metadata. Two techniques are suitable for data collection and annotation. The first is the Dublin Core, which focuses on supplementing metadata vocabulary terms with existing methods to describe, search, and index Web-based metadata. The second is the Department of Defense Discovery Metadata Specification, which focuses both on developing a central taxonomy for metadata and on defining a way to discover resources using that taxonomy. Historically, data standards have been haphazardly adopted to organize, represent, and encode information, which has prevented information sharing and data reuse later in the data value chain.

Prepare

The next task is to establish access to the data sources by copying them into a shared system and setting up access-control rules, that is, security and privacy restrictions for data use. Massively parallel distributed storage systems, such as the Hadoop Distributed File System, Bigtable, and MongoDB, enable the storage of terabytes or more of data, regardless of structure. Tools for providing data access include representational state transfer (REST) application programming interfaces, the Web Services Description Language, and Open Database Connectivity/Java Database Connectivity.
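The access-control rules just described can be as simple as a role-based lookup. The following Python sketch illustrates the idea; the users, roles, resources, and policy are all hypothetical, and a production system would express such policies in a standard policy language rather than in application code.

```python
# Minimal role-based access control (RBAC) check. Roles, users, and
# resources here are hypothetical, for illustration only.

ROLE_PERMISSIONS = {
    "analyst":  {("claims_db", "read"), ("hdfs_raw", "read")},
    "engineer": {("hdfs_raw", "read"), ("hdfs_raw", "write")},
}

USER_ROLES = {"alice": {"analyst"}, "bob": {"engineer"}}

def is_allowed(user, resource, action):
    """Grant access if any of the user's roles permits the action."""
    return any((resource, action) in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_allowed("alice", "claims_db", "read"))   # True
print(is_allowed("alice", "hdfs_raw", "write"))   # False
```

Role-based checks like this are well understood within a single organization; as the article notes, the hard part is agreeing on role definitions across enterprise boundaries.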
The eXtensible Access Control Markup Language (XACML) provides a mechanism for specifying security and privacy policies. Languages for access-control policies have been around for decades, and role-based access control is well understood, although defining these roles across enterprise boundaries remains a challenge. Attribute-based access-control policies are less well understood, but relevant standards are emerging. Standards for expressing and enforcing privacy policies are lacking: general-purpose tools for enforcing privacy policies don't exist, and commercial packages tend to be tailored to specific environments.

Organize

The data source developer makes deliberate organizational choices about the data's syntax, structure, and semantics and makes that information available either from schemata or from a metadata repository. Either mechanism can provide the basis for tracking the shared semantics needed to organize the data before integrating it. Metadata repositories are commercially available, and numerous generic metamodels exist, many of which rely on the Extensible Markup Language Metadata Interchange (XMI). However, because of XMI's generality, each tool provides customized extensions, which can lead to vendor lock-in, problems sharing schemata among participants, and other tool-interoperability issues. Analysts often skip formal data organization because they're more focused on their own data needs than on considering how to share data. However, sharing knowledge about internal data organization can enable more seamless integration with data providers' environments (upstream) and data consumers' environments (downstream).

Data Integration

The properly organized data is then ready to be combined into a common representation that suits a particular analysis. Each integration effort constitutes mappings that define how the data sources relate to the common representation. Metadata repositories need to be able to track these mappings to facilitate future analyses.
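Such source-to-common-representation mappings can be sketched as small transformation functions. In this hypothetical Python example (both source schemas and every field name are invented for illustration), two records with different syntax are mapped into one common form, and each record keeps its source name as simple provenance.

```python
# Two hypothetical source records with different syntax and structure,
# mapped into one common representation. Recording the source name with
# each mapped record preserves basic provenance for later analysis.

source_a = [{"PatientID": "A-17", "admit_dt": "2013-01-02"}]
source_b = [{"id": 9442, "admission_date": "02/01/2013"}]

def map_a(rec):
    return {"patient_id": rec["PatientID"],
            "admitted": rec["admit_dt"],        # already ISO 8601
            "provenance": "source_a"}

def map_b(rec):
    d, m, y = rec["admission_date"].split("/")  # assumed DD/MM/YYYY
    return {"patient_id": str(rec["id"]),
            "admitted": f"{y}-{m}-{d}",
            "provenance": "source_b"}

integrated = [map_a(r) for r in source_a] + [map_b(r) for r in source_b]
print(sorted(r["admitted"] for r in integrated))
```

In practice these mappings would be recorded in the metadata repository rather than hard-coded, so that future analyses can reuse and audit them.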
Regardless of the particular representation (a community website or a formal repository, such as a data warehouse), combining disparate data sources delivers new, undiscovered information. Analysts can discover novel relationships between stakeholders or patterns that can point to abuses, such as fraud. Integration can be either virtual, such as through a federated model,

or physical, such as through a data warehouse. Traditional data federation technologies and emerging Semantic Web technologies support the integration and querying of combined data resources. Relational databases are suitable for most kinds of tabular data, while the Semantic Web is more compatible with nontabular, nonnumeric data defined by rich networked relationships. Combining the two technologies will give data analysts a comprehensive toolkit for exploring datasets and for discovering the knowledge within integrated datasets.

Data Exploitation

Once the data has been gathered and integrated, an organization is ready to exploit it to make informed decisions. Decision makers rely on a combination of analyses, which tease information out of the underlying data, and visualizations, which convey those insights to the human.

Analyze

Integrated data sources are then ready for analysis, which includes maintaining the provenance between the inputs and results and maintaining metadata so that another analyst can recreate those results and strengthen their validity. Popular data analysis techniques, such as MapReduce, provide a programming model and associated implementations for processing and generating large datasets. This link, at the heart of the value chain, is perhaps the most mature in terms of available tools and techniques. Given this crowded marketplace, new offerings can more easily distinguish themselves not only on the basis of incremental analytic power but also by providing strong integration with the links preceding and following analysis. By making it easier for analysts to access relevant data, for example, tool vendors provide differentiated value. These tools should also maintain the provenance among inputs and results so that other analysts can understand and validate them.
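The MapReduce model mentioned above can be illustrated in a few lines of pure Python. This toy word count, a stand-in for any large-scale aggregation, runs over an in-memory list rather than a cluster, but it shows the same map, shuffle, and reduce phases.

```python
from collections import defaultdict

# Toy MapReduce: count words across "documents". On a real cluster the
# map and reduce calls would run in parallel over distributed partitions.

documents = ["big data value chain", "data value", "big data"]

def map_phase(doc):
    for word in doc.split():
        yield (word, 1)                # emit intermediate key/value pairs

def shuffle(pairs):
    groups = defaultdict(list)         # group intermediate values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)            # aggregate each key's values

pairs = [kv for doc in documents for kv in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["data"])  # 3
```

The appeal of the model is that the analyst writes only the map and reduce functions; the framework handles partitioning, shuffling, and fault tolerance at scale.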
Visualize

Visualization involves presenting analytic results to decision makers as a static report or an interactive application that supports the exploration and refinement of results. The goal is to provide key stakeholders with meaningful information in a format that they can readily consume to make critical decisions. Industries such as media and training have a wealth of data visualization techniques, which others could adopt. Virtual and augmented realities, for example, enhance the user experience and make it easier to grasp information that's elusive in two-dimensional media. Although this technology has promising implications, virtual and augmented reality systems continue to be viewed as suitable only for training, education, and other highly customized uses.

Make Decisions

The final link of the data value chain is to determine what action is necessary given the visualized results. As supporting documentation, provenance information provides traceability to the original sources and their quality annotations, and the integration mappings and analysis metadata describe how analysts obtained the results. Key stakeholders can use the visualized results to change a negative behavior or reward a positive one. Understanding the underlying details of a particular problem, what contributes to that problem, and what motivates the various constituents will inform stakeholders about required changes. Building on the existing analysis and incorporating additional data sources might reveal ways to make decisions and take action more efficiently.

Data fragmentation is a significant obstacle to realizing value, and most data-driven enterprises don't give stakeholders any incentive to share their data. The proposed data value chain recognizes the relationship between stages, from raw data to decision making, and how these stages are interdependent. It's naïve to think that merely connecting data will reveal its wisdom.
Low-quality data will not yield useful results, regardless of how clever the integration or query might be. Thus, the enterprise requires a plan that considers the entire continuum, from the beginning of data collection to the final decisions based on that data. With more pressure to integrate data across the stakeholder continuum, the data value chain will support collaboration and data sharing and will provide structure and value even as the number and diversity of stakeholders grows. Furthermore, as stakeholders realize the benefits of data sharing, the entire enterprise and all stakeholders should begin to see improved operational quality and reduced costs.

Reference

1. M.E. Porter, Competitive Strategy: Techniques for Analyzing Industries and Competitors, Free Press, 1998.

H. Gilbert Miller is a member of IT Professional's advisory board and is corporate vice president and chief technology officer at Noblis. Contact him at hgmiller@noblis.org.

Peter Mork is a principal at Noblis. His experience includes information management, especially data integration, data discovery, data architecture, and information privacy. Contact him at peter.mork@noblis.org.