Profit from Big Data flow
Delivering Big Data Success With the Signal Hub Platform





The Big Data Challenge

The business opportunities resulting from Big Data represent a disruptive force that, if properly harnessed, will change how companies operate and compete. While legacy data mining and reporting tools will continue to play a vital role in large enterprises, they were neither architected nor intended to capture the value of Big Data. A new type of technology is now needed: technology that's as fresh and innovative as the use cases it was designed to solve, use cases that didn't exist even five years ago. The organizations grasping these opportunities are drawing new insights from underutilized data assets and then making this breadth of data accessible to broader audiences. And it's Opera Solutions' Signal Hub that's behind the scenes of these success stories. It has helped global companies tackle their most complex challenges with methods that are better, faster, and cheaper than applying legacy technology to present-day opportunities.

Legacy Solutions Are Insufficient

Big Data is an evolution of enterprise analytics and data management processes. It's a huge leap forward, but still part of a natural evolution. To understand the role of new Big Data technologies, it's important to understand the ongoing role of legacy tools such as business intelligence (BI) tools and their associated data warehouses. BI tools are primarily used for two purposes. First, they help trained data analysts perform ad hoc analysis of historical data. Second, they deliver predefined, standardized business reports and dashboards to executive- and management-level users. These dashboards, or reporting applications, include key performance indicators and charts focused on corporate operations, sales performance, supply chain, and other areas, while allowing end users to drill down into greater data granularity. For example, a sales VP may want to review national sales metrics and then tease the data apart by region, district, city, and individual store.

While the market for these BI technologies is well structured and stable, the marketplace for Big Data solutions is relatively undefined and immature. Most products touting Big Data capabilities, while valuable, are merely components of the overarching architecture needed to capture data from a growing number of sources and deliver new types of insights. Working with these new technologies is highly specialized technical work, but simply adding these new skill sets to a Big Data infrastructure does not guarantee Big Data business results. Often these technology solutions merely shift the burden of delivering business value from IT and software engineers to data scientists. Data scientists are a necessary yet rare commodity and a prerequisite for Big Data success.

Signals: Big Noise to Small Data

Just because there's suddenly a glut of data doesn't make that data particularly interesting or valuable, at least in its raw form. Like raw crude oil full of impurities, sludge, and gunk, putting it into the machines of commerce unrefined will just gum up the works. And yet, hidden in this unrefined, untamed flow of the world's data is more predictive information than has ever been available before. This is what we call Signals: the valuable patterns, connections, and correlations that, if properly extracted, allow us to predict behavior and outcomes far more accurately than we could in the past. Signals are the data elements, patterns, and calculations that have, through scientific experimentation, been proven valuable in predicting a particular outcome. And it's these Signals, not the Big Data where they're hidden, that hold the real value.

[Figure: purchase patterns and payment patterns tracked as behavior over time, leading to a credit line increase request alert]

Signals are increasingly important in a Big Data world, where data is a fast-flowing, ever-growing, heterogeneous, and exceedingly noisy input. Big Data's sheer size, as well as its other statistical properties, makes it difficult, if not impossible, to use as is. Transforming data into Signals is absolutely critical; we can rarely use untransformed, raw data successfully. High-quality Signals are necessary to distill the relationships among all of the entities surrounding a problem and across all of the attributes (including their time dimension) associated with these entities. In effect, Signals capture underlying drivers and patterns to create useful, accurate inputs that machine-learning algorithms can then act upon. Indeed, for most problems, high-quality Signals are as important in generating an accurate prediction as the underlying machine-learning algorithm that acts upon those Signals in creating the prescriptive action. Signals are key ingredients in solving an array of problems, including classification, regression, clustering (segmentation), forecasting, collaborative filtering, and optimization.

Signals are hierarchical. That is, within the Signal Hub, the Signal array might include simple Signals that can be used by themselves to predict behavior (e.g., customer behavior powering a recommendation) and that can also serve as inputs to more sophisticated predictive models. These models, in turn, generate second-order, highly refined Signals, which typically serve as inputs to business-process decision points. Signals can be both descriptive and predictive and can provide a multi-dimensional view of specific data types.
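To make the notion of hierarchical Signals concrete, the following minimal Python sketch shows first-order Signals (simple spend aggregates computed from raw transactions) feeding a toy model that emits a second-order Signal (a propensity score). The class names, fields, and model weights are illustrative assumptions, not Signal Hub code.

```python
# Minimal sketch of hierarchical Signals: simple Signals computed from raw
# transactions feed a predictive model whose output is a second-order Signal.
# All names and the model form are illustrative, not the Signal Hub's actual code.
from dataclasses import dataclass
from statistics import mean
import math

@dataclass
class Transaction:
    month: int      # 1..12
    amount: float   # purchase amount in dollars

def simple_signals(history: list[Transaction]) -> dict[str, float]:
    """First-order Signals: plain aggregates of raw purchase behavior."""
    monthly: dict[int, list[float]] = {}
    for t in history:
        monthly.setdefault(t.month, []).append(t.amount)
    per_month = [sum(v) for v in monthly.values()]
    return {
        "avg_monthly_spend": mean(per_month),
        "spend_trend": per_month[-1] - per_month[0],  # crude growth proxy
    }

def propensity_signal(signals: dict[str, float]) -> float:
    """Second-order Signal: a toy logistic model scoring propensity to
    respond to a credit line increase offer. Weights are made up."""
    z = 0.002 * signals["avg_monthly_spend"] + 0.005 * signals["spend_trend"] - 1.5
    return 1.0 / (1.0 + math.exp(-z))

history = [Transaction(m, 100.0 + 20.0 * m) for m in range(1, 13)]
first_order = simple_signals(history)
print(first_order, propensity_signal(first_order))
```

In this toy pipeline the second-order Signal is the kind of value that would feed a business-process decision point, such as whether to present an offer.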

Signals can be categorized into classes. Here are a few examples:

1. Sentiment: Captures the collective prevailing attitude about an entity, given a context. An entity can be a company, market, country, etc. Typically, sentiment Signals have discrete states, such as positive, neutral, or negative. (Example: "Current sentiment on X corporate bonds is positive.")

2. Behavior: Captures an underlying fundamental behavioral pattern for a given entity (e.g., a consumer) or a given dataset. These Signals are most often a time series and depend on the type of behavior being tracked and assessed. Examples of behavior Signals include aggregate money flow into ETFs, the number of 30-days-past-due events in the last year for a credit card account, and the propensity to buy a given product.

3. Event/Anomaly: Discrete in nature and used to trigger actions or alerts when a threshold condition is met. Examples include an ATM withdrawal that exceeds 3X the daily average or a bond rating downgrade by a rating agency.

4. Membership/Cluster: Designates where an entity belongs, given a dimension. For example, gaming establishments create clusters of their customers based on spend (high rollers, casual gamers, etc.). Wealth management firms can create clusters of their customers based on monthly portfolio turnover (frequent traders, buy and hold, etc.).

5. Correlation: Continuously measures the correlation between various entities and their attributes, producing a time series of values between 0 and 1. Examples include the correlation of stock prices within a sector, unemployment and retail sales, interest rates and GDP, or home prices and interest rates.

Signals have attributes based on their representation in the time or frequency domain. In the time domain, a Signal can be continuous-time or discrete-time. The output from a blood pressure monitor is an example of a continuous-time Signal; the daily market close value of the Dow Jones Index is an example of a discrete-time Signal. In the frequency domain, Signals can be defined as high or low frequency. For example, the asset allocation trends of a brokerage account can be measured every 15 minutes, daily, or monthly. Depending on the frequency of measurement, a Signal derived from the underlying data can be fast-moving or slow-moving.

Figure 1: Advanced techniques used in Signal discovery
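The Event/Anomaly class lends itself to a compact illustration. The sketch below flags an ATM withdrawal exceeding 3X an account's trailing daily average, as in the example above; the rolling window, field names, and alerting mechanism are assumptions for illustration, not Signal Hub internals.

```python
# Illustrative Event/Anomaly Signal: raise the Signal when a single ATM
# withdrawal exceeds 3x the account's trailing daily average. The window
# length and structure are assumptions, not the platform's actual logic.
from collections import deque

class WithdrawalAnomalySignal:
    def __init__(self, window_days: int = 30, multiplier: float = 3.0):
        self.daily_totals = deque(maxlen=window_days)  # rolling daily history
        self.multiplier = multiplier

    def close_day(self, total_withdrawn: float) -> None:
        """Record one day's total withdrawals into the rolling window."""
        self.daily_totals.append(total_withdrawn)

    def check(self, withdrawal: float) -> bool:
        """Return True (raise the Signal) if this withdrawal exceeds the
        multiplier times the trailing daily average."""
        if not self.daily_totals:
            return False  # no history yet, stay silent
        daily_avg = sum(self.daily_totals) / len(self.daily_totals)
        return withdrawal > self.multiplier * daily_avg

signal = WithdrawalAnomalySignal()
for day_total in [80.0, 120.0, 100.0]:
    signal.close_day(day_total)
print(signal.check(200.0))  # False: 200 is below 3 * 100
print(signal.check(450.0))  # True: 450 exceeds 3 * 100 and triggers an alert
```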

Identifying, extracting, and calculating Signals at scale from noisy Big Data requires a set of predefined Signal schemas and a variety of algorithms. A Signal schema is a specific type of template used to transform data into Signals. Different types of schema may be used, depending on the nature of the data, the domain, and the business environment. Figure 1 details some of the techniques we use for initial Signal discovery.

Signal Hub Platform

Opera Solutions' Signal Hub integrates Big Data from both inside and outside the enterprise; provides the technology to identify, extract, and store Signals; and supports deployment of Big Data applications. It addresses Big Data challenges in a consistent and repeatable way, which greatly accelerates the delivery of business value. From a technical perspective, the easiest way to understand the architecture depicted in Figure 2 is to follow the full lifecycle of data as it is processed by the platform, organized into three major themes: batch processing, interactive processing, and analytic development.

Figure 2: Signal Hub Reference Architecture (components labeled A through P, described in the sections that follow)
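Before walking through the architecture, it may help to make the Signal schema idea above concrete. The representation below is a guess at what such a template might capture (a Signal's name, the entity it describes, the raw fields it consumes, and the transformation that produces it); it is not the platform's actual schema format.

```python
# Hedged sketch of a "Signal schema": a reusable template that names the
# Signal, the raw fields it consumes, and the transformation that produces it.
# This format is an illustrative assumption, not the platform's schema language.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class SignalSchema:
    name: str                                     # e.g. "avg_monthly_spend"
    entity: str                                   # e.g. "customer"
    input_fields: tuple[str, ...]                 # raw columns consumed
    transform: Callable[[Sequence[dict]], float]  # rows -> Signal value

def apply_schema(schema: SignalSchema, rows: Sequence[dict]) -> dict:
    """Apply one schema to an entity's rows, yielding a named Signal value."""
    return {schema.name: schema.transform(rows)}

avg_spend = SignalSchema(
    name="avg_monthly_spend",
    entity="customer",
    input_fields=("month", "amount"),
    transform=lambda rows: sum(r["amount"] for r in rows) / len({r["month"] for r in rows}),
)

rows = [{"month": 1, "amount": 120.0}, {"month": 1, "amount": 30.0},
        {"month": 2, "amount": 90.0}]
print(apply_schema(avg_spend, rows))  # {'avg_monthly_spend': 120.0}
```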

Batch Processing

Much of the heavy lifting in a Signal Hub is handled by batch processes. While users often ask about real-time or just-in-time processing, the reality is that many state-of-the-art algorithms are fed by enterprise systems that use batch processing in order to integrate with other batch systems. The Signal Hub receives batch data via SFTP or a landing-zone directory and real-time data via HTTP or MQ. Ultimately, the work of the Signal Hub begins once data is made available, either through file transfer or via an API. The components of batch processing include the following (a brief sketch of the Intelligent ETL idea follows this list):

A. The base layer of the batch-processing stack is a workflow engine that is configured to execute all batch-processing work streams. It ensures that all new data is properly processed and that all exceptions and alerts are handled. Processing is depicted in the reference architecture from bottom to top, with the workflow engine coordinating across this entire lifecycle. Processing can run at any required frequency and can be triggered by the arrival of data in the landing-zone directory, according to a schedule, or by a defined event.

B. The data flow engine layer is the transformation workhorse of the batch system. Data flows are configured declaratively, specified in a specialized language, and leverage an internally maintained common library of data operators and connectors. The engine is responsible for executing specific data flows on specific data when initiated by the workflow engine. These processes also produce metadata used to feed downstream processes. This abstraction provides two important capabilities:

a. Common elements of data processing are extracted into reusable operators and connectors, which allow the flow specification to be tailored for each Signal Hub.

b. Flexible execution environments allow the data flow engine to operate against different data infrastructures and storage systems without requiring rewrites of the flow definitions or operators. This allows Signal Hubs to grow from simple flat-file processing to scaled-out systems like Hadoop. It also allows us to push processing down to the underlying infrastructure, thus leveraging existing capabilities that might exist at a customer site.

C. The data management layer is the backbone of the system and is decoupled from the processing logic because we employ a variety of technologies. For example, Hadoop is used for the largest input data sets, where extensive transformation is required. We also leverage columnar and in-memory data stores for certain workloads. In some cases, even traditional relational database technology is sufficient. The data management layer is fed by the variety of connectors in the data flow engine layer and presents prepared data sets via uniform interfaces to the Signal-processing logic. We leverage industry-standard interfaces such as JDBC where applicable and have developed our own abstractions for less standardized technologies such as column-family data stores. It is not uncommon to see a mix of such technologies at various stages of processing within a single Signal Hub.

D. Intelligent ETL handles all of the data quality management, mapping, linking, and structuring of the data that arrives. The intelligence comes from a deep understanding of the data sources, enabling monitoring for statistical deviations and a system for alerting when they occur. The result is clean data that is placed into the data management layer for subsequent processing.

E. The SigGMS layer is responsible for calculating all of the Signals from Big Data and for managing Signal metadata. In batch mode, SigGMS is executed in a data flow, streaming data from the data management layer through the Signal code. In real-time mode, Signal services read and write data directly from the data management layer using the appropriate data management APIs for random access (e.g., JDBC, key/value store, column-family).

F. The batch analytics layer contains a variety of machine-learning and predictive modeling capabilities that we employ to service applications. These models consume Signals generated by SigGMS and similarly process data either in data flow streams (batch) or as callable services.

G. Batch data is either kept in the batch infrastructure for long-term use by the interactive layer or staged in a dedicated data store in the interactive layer. The distinction is made based on the requirements of the interactive layer. Often a great deal of data decimation occurs during the transfer. The shape of the data storage may also be changed at this point to optimize for ad hoc queries in the interactive layer.
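As promised above, here is a minimal sketch of the kind of statistical-deviation check the Intelligent ETL layer is described as performing: it compares a newly arrived batch's summary statistics against a historical baseline and raises an alert when a field drifts too far. The z-score rule, the 3-sigma threshold, and the field names are assumptions for illustration, not the product's actual logic.

```python
# Hedged sketch of an Intelligent-ETL-style deviation check: compare a new
# batch's per-field mean against a historical baseline and alert on drift.
# The z-score rule and 3-sigma threshold are illustrative assumptions.
from statistics import mean, stdev

def deviation_alerts(baseline: dict[str, list[float]],
                     new_batch: dict[str, list[float]],
                     threshold: float = 3.0) -> list[str]:
    """Return names of fields whose new-batch mean drifts more than
    `threshold` standard deviations from the baseline mean."""
    alerts = []
    for field, history in baseline.items():
        if field not in new_batch or len(history) < 2:
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue
        z = abs(mean(new_batch[field]) - mu) / sigma
        if z > threshold:
            alerts.append(field)
    return alerts

baseline = {"purchase_amount": [52.0, 48.0, 50.0, 51.0, 49.0]}
new_batch = {"purchase_amount": [400.0, 390.0, 410.0]}   # suspicious jump
print(deviation_alerts(baseline, new_batch))              # ['purchase_amount']
```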

Interactive Processing

Everything depicted above the batch layer collectively forms the interactive layer. Sometimes referred to as real-time, the distinguishing feature of the interactive layer is that it is invoked on demand rather than a priori. Actual quality-of-service requirements vary from milliseconds to several seconds, depending on integration requirements and the types of interaction that occur. For example, a credit card fraud decision must occur as part of a larger overall processing chain and must complete in milliseconds, while a request from an interactive website can take 100 ms before crossing the human perception threshold. Some interactive services, such as executing what-if simulations based on user input, may take tens of seconds to execute but are still returned interactively rather than queued for a nightly batch. Ultimately, the timing requirements drive how much work can be done and how much computing power is needed to do that work in the time allowed. The steps involved in interactive processing include the following (a hedged sketch of an application calling the Signal Hub API appears after the Analytic Development section below):

A. The interactive layer's data access API abstracts any differences between storage technologies. This API is very similar to the batch storage API in that it leverages standards where possible but allows us to define extended APIs for nonstandard technologies.

B. The online scoring portion of the architecture is an optional set of services that are instantiated for use cases where models need to be invoked on demand. This is needed when real-time information is required to formulate a response, such as a fraud score for a credit card transaction.

C. The Signal Hub API is the interaction gateway to the Signal Hub. It is realized as a Java Application Server that handles all real-time transaction and event flows, real-time queries of Signals and model data, and live feedback from external systems. By default, we provide these services through a RESTful API but can also provide connectors for SOAP-based or MQ-style integration.

D. Applications and data visualization live outside of the Signal Hub but consume Signal Hub content via the Signal Hub API. The creation of these connections is not typically part of a Signal Hub engagement, but they should be designed and built in a way that ensures proper integration and adoption.

E. Signal Monitor is an out-of-the-box application provided by Opera Solutions that leverages information within the Signal Hub to monitor Signals for exceptional situations and for the continued relevance of their predictive power. Architecturally, the Signal Monitor lives outside of the Signal Hub and interacts through the same API available to any application, but it delivers functionality that is generally useful in Signal Hub deployments.

Analytic Development

While the batch and interactive systems integrate the many enabling technologies that form a Signal Hub, the task of defining precisely how a Signal Hub should operate in any given data environment cannot be removed from the equation through any amount of engineering or automation. It is the way in which this system supports the first-time and ongoing development of Signals that helps realize the promise of Big Data. Our data scientists can tap into the data flow at various points, as depicted in the reference architecture. In doing so, they accomplish the following:

A. Incorporate new data types by enhancing the metadata used to drive Intelligent ETL.

B. Discover and implement novel Signals based on new experiments, or adapt existing Signals to new business domains.

C. Retrain models to capture nonstationary aspects of the problem domain, such as accounting for drift in the input data.

D. Get feedback from the interactive layer about how Signals are being used, to better inform the aforementioned continuous improvement.

The symbiotic relationship between the services provided by the Signal Hub and the scientists who constantly mine vast and disparate data sources for value is how Opera Solutions is able to offer a data science platform that continually adapts to real-world complexity and an evolving data landscape, all without placing unrealistic demands on our customers.
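As referenced in the Interactive Processing steps above, here is a hedged sketch of how an external application might consume Signals through a RESTful Signal Hub API. The host name, endpoint path, query parameters, and JSON shape are invented for illustration; the document states only that a RESTful interface is provided, not its actual contract.

```python
# Hedged sketch of an application consuming Signals via a RESTful Signal Hub
# API. The URL, path, parameters, and response shape are illustrative
# assumptions; the whitepaper specifies only that a RESTful API exists.
import json
import urllib.parse
import urllib.request

SIGNAL_HUB_BASE = "https://signalhub.example.com/api/v1"  # hypothetical host

def fetch_signals(entity_type: str, entity_id: str, signal_names: list[str]) -> dict:
    """Request a set of named Signals for one entity and return them as a dict."""
    query = urllib.parse.urlencode({"signals": ",".join(signal_names)})
    url = f"{SIGNAL_HUB_BASE}/{entity_type}/{entity_id}/signals?{query}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

# Example: a CRM application asking for two customer-level Signals before
# deciding whether to surface a credit line increase offer.
if __name__ == "__main__":
    signals = fetch_signals("customer", "12345",
                            ["propensity_credit_line_increase", "avg_monthly_spend"])
    if signals.get("propensity_credit_line_increase", 0.0) > 0.7:
        print("Surface the offer")
```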

ABOUT OPERA SOLUTIONS, LLC

Opera Solutions (www.operasolutions.com, @OperaSolutions) combines advanced science, technology, and domain knowledge to extract predictive intelligence from Big Data and turn it into insights and recommended actions that help people make smarter decisions, work more productively, serve their customers better, grow revenues, and reduce expenses. Its hosted solutions, delivered as a service, are today producing results in some of the world's most respected organizations in financial services, healthcare, hospitality, telecommunications, and government. Opera Solutions is headquartered in Jersey City, NJ, with other offices in North America, Europe, and Asia. For more information, visit the website or call 1-855-OPERA-22.

Offices: Jersey City, Boston, San Diego, London, Shanghai, New Delhi