Big Data 101: Harvest Real Value & Avoid Hollow Hype



Executive Summary

Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on information. It is equally probable that you are struggling with the challenges of crafting, and perhaps even executing, a strategy to capitalize on big data opportunities. As recently as 2000, only 25 percent of the world's information was digital; today, 98 percent of it is [1]. With this ever-increasing diversity and abundance of data (1,200 exabytes' worth [2]) bursting from the digital age, your ability to harvest real value from big data and avoid the pitfalls of hollow hype will determine your organization's success.

The big data market is poised to reach $16.9 billion by 2015, and the broader market of business analytics solutions is forecast to reach $50.7 billion in 2016. Yet only four percent of the 400 global companies surveyed by Bain & Company in 2013 believed they were converting their investments in big data tools into meaningful business insights that improve decision making and financial performance [3].

From Atigeo's customer implementation experience, we believe success depends on your approach. Big data requires adopting revolutionary technology that evolves faster than most companies can keep pace with. Yet many companies still attempt traditional IT planning, where migration to a new paradigm is slow and technology components are adopted piecemeal. That approach takes years to complete and offers no guarantee of ROI until the new solution is in production; by then it is very difficult to iterate to improve results, or even to change course. This whitepaper provides suggestions on how to select big data analytic solutions for your enterprise, introduces Atigeo's xpatterns platform, and provides xpatterns deployment examples.

The U.S. will face a deficit of over 1.5 million data analysts to help bridge the gaps [4].
This shortage is already triggering a cascade of failed attempts at big data analytics using traditional approaches. Meanwhile, data growth already outstrips the ability of people and 20th-century technology to make sense of it all. Success in big data is no longer about data collection or data hoarding, which commodity storage makes easy for any enterprise to implement. The real return on any big data investment depends on analytical performance: it determines how enterprises deliver differentiating, actionable insights and useful applications for end users (internal or external to the enterprise). Data itself is not useful unless it is applied correctly to solve real business problems. Designing a great product has not changed even though data availability has; it is still about knowing and understanding users' needs. Most important is correctly identifying when big data solutions are needed versus conventional approaches.

[1] The Rise of Big Data, Foreign Affairs, June 2013
[2] The Rise of Big Data, Foreign Affairs, June 2013
[3] Big Data, Big Choices, Bain & Co., November 2013
[4] Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011
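The conventional-versus-big-data distinction raised above can be sketched in miniature. The toy claims records and the hand-set model weights below are purely illustrative assumptions, not data or models from any real deployment:

```python
# Hypothetical claims records: (patient_id, age, prior_admissions, readmitted).
claims = [
    ("p1", 72, 3, True),
    ("p2", 45, 0, False),
    ("p3", 66, 2, True),
    ("p4", 30, 1, False),
]

# Conventional BI: a backward-looking aggregate report over historical data.
def readmission_rate(records):
    return sum(r[3] for r in records) / len(records)

# Predictive step (grossly simplified): score a new patient's risk from
# features instead of reporting history. The weights stand in for a
# learned model and are not from any real study.
def risk_score(age, prior_admissions):
    return min(1.0, 0.01 * age + 0.1 * prior_admissions)

print(f"historical readmission rate: {readmission_rate(claims):.2f}")
print(f"predicted risk, age 70 with 2 prior admissions: {risk_score(70, 2):.2f}")
```

The point is the shape of the output, not the arithmetic: the first function only describes the past, while the second produces a forward-looking, per-individual estimate.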

Here are three key questions enterprises should ask about their end users:

1. Does the end user understand the difference between BI and advanced analytics?
2. Does the end user need to control the output (expert knowledge), or let the data identify insights without expert knowledge?
3. Does the end user care about exactness of the output as a trade-off against detail-level insights?

The figures below show how enterprises should determine the type of analytics to apply to their big data to satisfy end users' needs. There are situations where optimizing an existing business intelligence solution has value, but a transition to a big data approach adds new layers of insight. For example, a growing concern in healthcare is population health management (PHM), which addresses the health of individuals in a group and how health outcomes are distributed across that group. A hospital can use conventional BI techniques on insurance claims data to generate historical reports of health outcomes. A big data approach based on the same data, however, can produce forward-looking predictive analytics as well as more detailed inferences. This type of result would be valuable in PHM and many other enterprises, but it represents a big change from the conventional, historically focused approach. Big data analytics thus greatly enhance BI, and enterprises should carefully examine the impact and potential of metric and reporting changes in order to adopt them.

At Atigeo, we recommend enterprises build big data solutions through an iterative process, improving analytics models over time and providing compelling evidence of improvement in order to gain end user trust. This is a key reason Atigeo's xpatterns is an ideal platform for enterprises: it builds models that automatically learn over time.

Introducing Atigeo xpatterns

What is xpatterns?
As the only "end to end" big data analytics platform available today, xpatterns allows you to utilize your existing resources with a secure, enterprise-ready system that requires no datacenter build-out. With xpatterns you can seamlessly create a scalable, private virtual cloud, and through xpatterns' patented collection of intelligent algorithms you can access all of your data in real time, leading to measurably better and faster answers.

xpatterns has a novel architecture that integrates state-of-the-art components across three logical layers: Infrastructure, Analytics, and Applications. xpatterns can act as a virtual abstraction layer across any IT system, extracting value from both legacy and new technologies and immediately extending the life, value, and intelligence of an ecosystem.

The Infrastructure layer offers remarkably fast integration without requiring a costly data warehouse implementation. It can quickly adapt to new technologies, allowing you to leverage and extend existing IT investments. xpatterns delivers managed cloud services and safeguards the privacy of your data, satisfying a broad range of regulatory, financial, and legal requirements.

The Analytics layer consists of a wealth of proprietary advanced analytics algorithms that automatically build the best model for the questions being asked. This unique ability is made possible by the xpatterns Cooperative Distributed Inferencing (CDI) engine. In addition, xpatterns learns over time, self-optimizing through a hybrid approach of optimizing hard rules and soft rules, both supervised and unsupervised. For data scientists who like to design their own models and run experiments, xpatterns provides easy-to-use analytics, automated experimentation and feature generation tools, and many other ready-to-use components that make modeling and experimenting in a distributed environment effortless.

The Application layer includes visualization tools that allow enterprises to immediately visualize their big data in the xpatterns platform and publish applications, all without integrating with any other software. The full workflow of building your own big data application can be done right from the xpatterns Management Console in the cloud.

Design Tenets

xpatterns is the fastest, best-performing, and lowest-risk big data intelligence platform available today:

All-inclusive: A complete platform for building applications and running advanced analytics on very large datasets. It provides integrated software across all three layers required for big data analytics: data ingestion, analytics, and application development.

Cutting-edge analytics: Includes a wide range of advanced intelligence components that run the gamut from market-tested to beta to just-out-of-research, among them machine learning, data mining, natural language understanding, dynamic ontologies, search, and inference. Intelligence technology R&D is Atigeo's prime directive, and we innovate continuously in this space.

Cloud-based: Delivered via the cloud, meaning no hardware needs to be installed. Storage and compute capacity are managed by the platform and can scale up and down easily.

Fastest-to-market: From ramp-up time for new adopters of an xpatterns solution to delivery time via the cloud, all xpatterns design considerations are made to get users' business results to market fastest.
Enterprise-grade: Designed to build production-quality, line-of-business applications, the platform meets the following quality attributes: performance, scalability, high availability, reliability, security, manageability, extensibility, modularity, interoperability, testability, documentation, instrumentation and monitoring, backup and restore, disaster recovery, and diagnostic tools.

Compliant: Security, privacy, compliance, and audit are built into the platform. In addition to software compliance, Atigeo's procedures for managing the cloud, and the teams in charge of carrying them out, adhere to a corresponding set of compliance requirements. We enable cloud applications for the highest compliance standards, including HIPAA.

Integrator: Includes a toolbox of choices for the infrastructure, analytics, and application layers. Since different problems require different solutions, each customer leverages the subset of tools that best fits their needs. The toolbox includes open source, commercial, and Atigeo-designed components.

Developer-ready: The APIs currently number over 100, covering the gamut from data ingestion to analytical processing, data updates, real-time queries, and configuration. The APIs are authenticated over a secure channel using standard Internet authentication protocols, and are scalable, instrumented, and monitored. API access is role-based, and roles can be configured for both developers and applications.
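Role-based API access of the kind described above can be sketched as follows. The role names and endpoint names here are hypothetical illustrations, not actual xpatterns API definitions:

```python
# Hypothetical role-to-permission mapping; "developer" and "application"
# roles and these endpoint names are illustrative assumptions only.
ROLE_PERMISSIONS = {
    "developer": {"ingest", "query", "update", "configure"},
    "application": {"query"},
}

def authorize(role: str, endpoint: str) -> bool:
    """Allow a call only when the role's permission set covers the endpoint."""
    return endpoint in ROLE_PERMISSIONS.get(role, set())

print(authorize("developer", "configure"))  # developers may configure
print(authorize("application", "ingest"))   # applications may only query
```

In a real deployment the role would come from the authenticated credential on the secure channel; the sketch shows only the authorization decision itself.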

Fully-managed: Atigeo operates the cloud environment for you. Customers can rely on Atigeo's expertise to launch production applications quickly, at a known cost, without having to ramp up their IT, commit to long-term consulting engagements, or take risks on the readiness of new technology.

Who uses xpatterns? Layer by Layer

Each of xpatterns' three layers was built around business objectives and aligns with different roles and functions in your organization. Over time, and across many customer projects, we have found that three roles are required to build end-to-end, intelligent big data applications:

Data Analyst/ETL Analyst/Data Integration Engineer: Builds the quality and integration pipelines connecting many corporate systems to an xpatterns cloud. xpatterns tools support editing a data ingestion workflow; testing and scheduling data integration; and monitoring operations.

Data Scientist: An expert in statistics, machine learning, and/or data mining, who uses the data products from the ETL Analyst to model, query, and experiment on data. The tools include an integrated development environment (IDE) for creating rankers, classifiers, topics, queries, and models.

Application Engineer: Builds user applications with the data and models from the Data Scientist, using application-specific tools. For dashboard applications, xpatterns has a turnkey dashboard studio tool.

Today's evolving big data infrastructure has many other roles and tools, but we believe many of these will fade away as big data best practices mature. xpatterns abstracts away complexity for our clients by managing the cloud environment for them, and by orchestrating the software and tools according to two principles:

Façade: Each person should see an optimized but minimal set of tools, data, and software required for their job. Anything more distracts and reduces productivity; under the hood, advanced tools are there for any users who want them.
Choice of tools: While xpatterns comes with pre-packaged tools, every role should be able to pick their own. For example, if a data scientist prefers SAS or R, they should be able to easily and securely install big data connectors for them within an xpatterns cloud.

xpatterns Deployment Examples

Infrastructure technology should not drive or constrain applications

A Fortune 500 company faced a predictive analytics challenge: make informed business decisions based on tens of terabytes of data from multiple sources and systems. The company had data assets in the range of 5-10 billion customer behavior records. Their existing technology infrastructure produced conventional BI results: historical charts, tables, and dashboards showing what customers had done in the past. Worse still, their predictive analytics could work on only a sample of the data, about 5 million records, or roughly 0.1% of the total set. The company applied models to their data sample that had become standard for their industry, based on academic research on even smaller datasets of 1,000 to about 100,000 records.

Adding xpatterns to their existing technology took only a few days of engineering work, rather than the typical months of infrastructure planning, in-house expertise, and custom integration required by traditional approaches. Most significantly, with xpatterns the company was able to develop new big data models that leverage their entire dataset of 5-10 billion customer behavior records. This produced a 75% improvement over the best available academic model, creating an invaluable resource out of what had been a burdensome dataset that could only be sampled.

Advanced analytics and modeling quality should be bound by computing power, not manual labor by data scientists

Another major US-based data company was doing statistical modeling with software packages running on single machines with small data samples. The company's time to market was delayed as data analysts made hard choices partitioning the data. Their products' data models changed, causing further delays based on different data samples, and the lack of visibility across the many sample datasets also compromised the models' validity. With few data scientists, the company's incorrect data-based assumptions came at a high cost, and ROI could not be realized.

With the xpatterns analytics optimization engine, this company's data analysts could focus on designing models based on all the data and, most importantly, run multiple experiments in parallel. The company redesigned their data model and increased computing capacity for a one-week experiment in which they applied the xpatterns optimization engine to the entire dataset, running hundreds of concurrent experiments and producing an optimal production model that would otherwise have taken months of manual labor to reach. In addition, xpatterns allowed them to easily decommission existing computing clusters. Conventional approaches also force you to clean the data as part of the ETL.
xpatterns, however, easily handles noisy and dirty data and learns from data that is ingested as is. In this use case, the precision/recall results showed that with xpatterns, the more data used for training, the more accurate the results.

[Figure: precision/recall curves for models trained on 5%, 20%, 40%, and 100% of the data; precision spans roughly 0.7 to 1.0 over recall 0 to 1, with larger training fractions yielding higher precision.]
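The concurrent-experiments workflow described in this example can be sketched in miniature. The parameter grid and the toy scoring function below are illustrative assumptions; a real deployment would train actual models and distribute the work across a cluster rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_experiment(params):
    """Stand-in for training and evaluating one model configuration."""
    learning_rate, depth = params
    # Toy objective that peaks at learning_rate=0.1, depth=5; a real
    # experiment would return a validation metric instead.
    score = 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(depth - 5)
    return params, score

# A small parameter grid; a real search would cover hundreds of combinations.
grid = list(product([0.01, 0.1, 0.5], [3, 5, 8]))

with ThreadPoolExecutor() as pool:  # run all experiments concurrently
    results = list(pool.map(run_experiment, grid))

best_params, best_score = max(results, key=lambda r: r[1])
print("best configuration:", best_params)
```

Running the grid concurrently instead of one configuration at a time is what turns months of manual sample-by-sample modeling into a single parallel sweep.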

These are two ways the xpatterns platform increases the success of big data initiatives: the xpatterns analytics layer has tools that make data scientists as efficient as they can be, and xpatterns capabilities automate and improve the production model without data scientist intervention.

Text analytics should learn: Semantics and learning make a difference

A major healthcare company was using slow legacy systems to process large amounts of unstructured text: files with no indication of meaning, subjects, or categorization of their contents. Nearly all enterprises have unstructured data, which in healthcare includes physicians' encounter notes, the notes doctors take during examinations. The company needed to improve its long and costly unstructured data processing to augment its bottom line. In its line of business, this means decreasing insurance claims processing time by swiftly and accurately adding standardized medical codes for procedures and diagnoses.

Using xpatterns text analytics, the healthcare company was able to add correct medical codes despite differences in individual physicians' use of language and jargon and differing semantic contexts. Among many other semantic capabilities, xpatterns can discern negations ("the patient never broke her leg in childhood") and distinguish the context of a phrase, such as family history versus physical exam. xpatterns can also correctly detect conditional and hypothetical statements ("if lab results are positive, the diagnosis is kidney failure"). xpatterns also continually learned from what it found in the company's data, basing new inferences on that feedback. This means xpatterns can continue operating successfully as new sources of unstructured data are encountered.

Conclusion

Companies in all sectors are increasingly realizing that they are effectively big data companies by virtue of their massive enterprise and customer data repositories.
While acutely aware of the critical need for analytical insight into both stored and streaming data, they find a number of factors impeding progress toward intelligent solutions: the state of the data, lack of resources and expertise, lack of infrastructure, the state of tools and solutions, and the quality and evaluation of results. If you have a tough data problem not easily solved by current methodologies, xpatterns can positively impact your organization in widely influential ways, as the fastest, highest-performing, and lowest-risk way of building intelligent big data applications. By uncovering more relevant connections in data at game-changing speed, workflow procedures are streamlined, development cycles are reduced, and customer and patient needs are anticipated more accurately. From healthcare to energy, security, and beyond: if it requires information to do its job, xpatterns makes it intelligent.

As the only "end to end" big data analytics platform available today, xpatterns allows you to utilize your existing resources and seamlessly create a scalable, private virtual cloud. Through its patented collection of intelligent algorithms, xpatterns advanced analytics gives you access to all of your data in real time and leads to better, faster answers.