Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand



Similar documents
Big Data at Cloud Scale

The Power of Pentaho and Hadoop in Action. Demonstrating MapReduce Performance at Scale

Buying vs. Building Business Analytics. A decision resource for technology and product teams

Embedded Analytics Vendor Selection Guide. A holistic evaluation criteria for your OEM analytics project

Eliminating Complexity to Ensure Fastest Time to Big Data Value

Architected Blended Big Data with Pentaho

Product Innovation with Big Data

Blueprints for Big Data Success

Eliminating Complexity to Ensure Fastest Time to Big Data Value

The SMB s Blueprint for Taking an Agile Approach to BI

The Ultimate Guide to Buying Business Analytics

Performance and Scalability Overview

The Ultimate Guide to Buying Business Analytics

Performance and Scalability Overview

Your Path to. Big Data A Visual Guide

Three Open Blueprints For Big Data Success

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

IP Expo 2014 Pentaho Big Data Analytics Accelerating the time to big data value London, UK

Big Data Use Cases. To Start Today. Paul Scholey Sales Director, EMEA. 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866)

Informatica PowerCenter The Foundation of Enterprise Data Integration

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Hadoop in the Hybrid Cloud

Datameer Cloud. End-to-End Big Data Analytics in the Cloud

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Tap into Big Data at the Speed of Business

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Deploying an Operational Data Store Designed for Big Data

Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment

Ganzheitliches Datenmanagement

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases

INTELLIGENT BUSINESS STRATEGIES WHITE PAPER

White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014

ANALYTICS STRATEGY: creating a roadmap for success

How to Enhance Traditional BI Architecture to Leverage Big Data

How To Use Hp Vertica Ondemand

Accenture and SAP: Delivering Visual Data Discovery Solutions for Agility and Trust at Scale

Cisco Data Preparation

ICD-10 Advantages Require Advanced Analytics

The Clear Path to Business Intelligence

The IBM Cognos Platform

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

Solve Your Toughest Challenges with Data Mining

Data Integration Hub

Getting Started & Successful with Big Data

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

Actuate for: Financial Management Reporting Applications

IBM Cognos Controller

Delivering Real-World Total Cost of Ownership and Operational Benefits

90% of your Big Data problem isn t Big Data.

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Solve your toughest challenges with data mining

Actian SQL in Hadoop Buyer s Guide

SimCorp Solution Guide

SQL Server 2012 Business Intelligence Boot Camp

Data Integration Checklist

CROSS INDUSTRY PegaRULES Process Commander. Bringing Insight and Streamlining Change with the PegaRULES Process Simulator

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

Native Connectivity to Big Data Sources in MSTR 10

Datosphere Platform Product Brief

ElegantJ BI. White Paper. Considering the Alternatives Business Intelligence Solutions vs. Spreadsheets

Luncheon Webinar Series May 13, 2013

Cloudera Enterprise Data Hub in Telecom:

6 Steps to Faster Data Blending Using Your Data Warehouse

Sybase Solutions for Healthcare Adapting to an Evolving Business and Regulatory Environment

The Business Analyst s Guide to Hadoop

Data virtualization: Delivering on-demand access to information throughout the enterprise

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Why Big Data in the Cloud?

Reporting and Analytics Solution for Next-Gen Mobile PaaS Company

Extend your analytic capabilities with SAP Predictive Analysis

O p t i m i z i n g t h e N e t w o r k t o M e e t T o m o r r o w ' s I C T D e m a n d s

Strengthen security with intelligent identity and access management

The Inside Scoop on Hadoop

A ROAD MAP FOR GEOSPATIAL INFORMATION SYSTEM APPLICATIONS ON VBLOCK INFRASTRUCTURE PLATFORMS

The Clear Path to Business

The Future of Data Management

An Oracle White Paper January Access Certification: Addressing & Building on a Critical Security Control

WHITEPAPER. Why Dependency Mapping is Critical for the Modern Data Center

Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

Sisense. Product Highlights.

WHITE PAPER Get Your Business Intelligence in a "Box": Start Making Better Decisions Faster with the New HP Business Decision Appliance

Protecting Big Data Data Protection Solutions for the Business Data Lake

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

Using Tableau Software with Hortonworks Data Platform

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Big Workflow: More than Just Intelligent Workload Management for Big Data

A complete platform for proactive data management

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

Implementing Software- Defined Security with CloudPassage Halo

BusinessObjects XI. New for users of BusinessObjects 6.x New for users of Crystal v10

Transcription:

Build a Streamlined Data Refinery An enterprise solution for blended data that is governed, analytics-ready, and on-demand

Introduction As the volume and variety of data has exploded in recent years, putting that data to business use has been a complex undertaking. Extracting value from Big Data has been a substantial challenge in and of itself, but it has been compounded by new preferences for data consumption. Basic batch reporting doesn t cut it anymore information consumers want analytics that they can explore in their favorite format on-demand, often in the context of the other software applications they use every day. All of this leaves IT organizations strained just to keep up. New data visualization tools have helped line of business groups help themselves ; however, demands have only partially been met. At best, business users have access to a subset of data but they are stuck between the competing dilemmas of not trusting the data and not being able to wait for it to be stamped with the IT seal of approval. At worst, users simply cannot access the data they want when they want it this is often the case if they require Big Data or unstructured data to meet their goals. In light of these circumstances, many organizations can benefit from a Streamlined Data Refinery architecture, which represents a flexible, economical way to process and automate delivery of information to large numbers of users for a variety of analytic purposes. A Streamlined Data Refinery is powered by on-demand orchestration for blending traditional data and Big Data, and it is a first step toward Governed Data Delivery (see below). In this brief, we will explore the relevant use cases where a Streamlined Data Refinery can deliver substantial value and break down the solution architecture to its component parts. Governed Data Delivery is defined as the delivery of blended, trusted and timely data to power analytics at scale regardless of data source, environment, or user role. It lays the groundwork for seamless end user exploration and analysis of validated data blends from across the organization. PENTAHO 1

The Business Problem Given the need to deliver more and more diverse data to users in ever-narrower time windows, we have identified the following use cases where a premium is placed on timely delivery of custom data blends that are fully governed and analytics-ready. This is of course not an exhaustive list. On-demand Data for Business Analysts & Researchers These roles often need to go beyond traditional SQL-based approaches to retrieving data from individual databases. Researchers often need deep slices of information from unwieldy sources, includ-ing machine/ sensor data, weblog data, and unstructured text, which are often archived in Hadoop. One solution is providing an easy ability to request custom data sets on-demand and dropping blended Big Data sets off in a convenient location (i.e. FTP server or collaboration portal) and in a ready-to-use format (i.e. Excel or CSV). Further, data sets can be staged in an analytical database like HP Vertica to offload complex query workloads from Hadoop. Controlled Delivery of Data Sets to Auditors & Regulators Companies in highly regulated industries like financial services, healthcare, and energy are pressured to prove they are in compliance with government regulations. This often requires them to combine data from multiple sources, run statistics and prove that their data management practices meet specif-ic standards. For example, the Dodd-Frank Act mandates capital reserve ratios for bank holding companies and savings & loans. Stress tests must be run on a variety of bank operational data sources to assess compliance and results of these tests must be auditable. Forensic Analysis in Responses to Exceptional Business Events The scale of Big Data often prevents organizations from preintegrating it into a data warehouse using traditional ETL approaches. Organizations increasingly rely on predictive analytics to screen for anomalies (such as financial fraud or network security threats) and to generate alerts that indicate the need for detailed forensic research by analysts. This can be optimized and accelerated by automating the preparation of analytic datasets for end users. PENTAHO 2

Data Blending as a Service for Customers & Partners Data products are an emerging source of revenue for SaaS vendors, and many traditional corporations are incorporating analytics into their customer and partner facing applications to boost stakeholder relationships. In addition to providing raw data feeds to third parties, companies can offer data blending as a value-added service. In this scenario, users upload data to a site where it can be combined with the hosting organization s data and then returned as an enriched data set. Among these use cases, there are three common primary needs that present opportunities to drive additional productivity and business value for the organization in question. They are as follows: NEED FUNCTIONAL DESCRIPTION ASSOCIATED VALUE On-Demand Orchestration Proper Data Governance Blended Data in Format of Choice Users need to be able to easily request complex data sets, and the resultant data delivery must happen in a just-in-time fashion which requires the triggering of data processing, blending, and modeling on demand. An automated process must be implemented to facilitate this on-demand orchestration. This analytics-ready data must adhere to all relevant governance rules, ensuring that data is trusted, compliant, up-to-date, and properly combined. Analytics users require the blending and enrichment of multi-source data, delivered in a consumable way, whether that is in an ad hoc analysis tool or in a specific file format and location. This accelerates time to value in analytics projects and simplifies the process for end users by hiding complex details of underlying systems, empowering the business to respond to changing conditions quickly. Further, IT saves time by addressing many requests with one automated process for enduser self-service. This minimizes risk and ensures confidence in business decisions made based on all enterprise data. This makes users more productive in getting insight from raw data. PENTAHO 3

The Streamlined Data Refinery Solution A Streamlined Data Refinery architecture meets all of the core requirements of the use cases described above, providing for a userdriven trusted data delivery process. At its core, the design pattern accommodates an on-demand process of user-initiated data requests, blending and refining of any data, automatic analysis schema generation, and publishing of analytic data sets in the format of choice. It consists of several key components. Scalable Data Processing Hub Usually Hadoop, this store is meant to house and manage a variety of structured and unstructured data from across the organization. In the diagram, Hadoop serves as the landing zone for data across the web, social media, transactional systems, and machines/sensors. High Performance Database The database chosen must facilitate high performance queries for analytics and visualization. When scale is required, an analytical database such as HP Vertica is a solid choice. Pentaho Data Integration Pentaho s highly scalable data integration engine, managed through its intuitive end user interface, provides the glue between the different data sources and stores in this architecture. The entire process outlined here can be triggered on-demand via PDI: Blending & Orchestration: PDI ingests data from virtually any data source, including both traditional systems and Big Data stores and then processes, cleanses, and blends the data in the required combinations to drive insight. Automatic Modeling & Publishing: As part of the data orchestration process, PDI automatically creates an OLAP schema and publishes it to the Pentaho Business Analytics server for end user exploration and visualization. Governance: PDI s robust functionality enables IT to quickly and easily validate data sources being blended at the source allowing for the right measure of control, without creating unnecessary frictions to end user data access. PENTAHO 4

Streamlined Data Refinery Architecture Diagram Architected Blends Data developers leverage the power of Pentaho Data Integration to create a data blending process that users can execute at run time. This partnership between IT and Business ensures governed data blends on demand through self-service data requests. Self-Service Data Request Users can request analytics-ready data delivery on demand via a webbased interface, created with Pentaho s CTools framework for 100% custom analytics user experiences. Through such an online interface, users can enter parameters (i.e. data fields, source systems, time ranges, etc.) quickly and easily. They can also select whether data should be populated as a governed data source in Pentaho Business Analytics or in another format (Excel, CSV, etc.) in a different target location. Pentaho Business Analytics Represented by Analyzer and Reports in the diagram, Pentaho Business Analytics is a flexible toolset for data exploration, visualization, and consumption. Users leverage Pentaho here to access the automatically generated data models for interactive analysis. PENTAHO 5

Customer Example: Financial Regulatory Body Goal Empower analysts to identify suspicious patterns among billions of market transaction records per day. Pentaho Solution Users explore summary data with the ability to request detailed data sets on the fly for drill-down through multi-dimensional models. Architecture Leverages Hadoop with Amazon Elastic MapReduce and Hive; uses Amazon RedShift as a high-performance analytical database in the cloud. Advantages of the Streamlined Data Refinery In addition to delivering unprecedented access to diverse, governed data for analytics on-demand, Pentaho s Streamlined Data Refinery solution architecture provides a number of other benefits. Data Sets Are Virtualized and Managed Through Logical Service Endpoints This means that the implementation is hidden from users, allowing IT to re-platform or refactor the underlying data infrastructure without affecting LOB users. Security is Centrally Enforced All requests are made through a common application (the Pentaho User Console), which means that access can be revoked simply by disabling a user account or removing their membership from a role. Additionally, Pentaho controlled access can be configured to map to existing enterprise security schemes. PENTAHO 6

Storage, Data Transformations, and Query Serving (SQL and OLAP) Can Be Implemented Using Products That Match Existing Skills and Infrastructure PDI jobs and transformations are flexible, allowing IT developers to run workloads in Hadoop (MapReduce or YARN), in dedicated PDI clusters or on single PDI servers. Similarly, Pentaho s OLAP engine (Mondrian) can work with a large number of analytical databases. Moreover, the infrastructure can evolve to take advantage of new storage and processing options without affecting the availability of the logical service endpoints. Pentaho s Adaptive Big Data Layer helps facilitate this future-proofing. The Backlog of Requests for Custom Data Feeds Can Be Reduced By implementing parameterized request forms as part of a Streamlined Data Refinery, IT departments can offload the selection and filtering of raw data to analysts and researchers. This is directly analogous to selfservice interactive reporting, except that the data can be used with any number of company-standard reporting, analysis and statistical tools. The Streamlined Data Refinery s user-driven, governed process User initiated data request Blending and refining any data Publish analytic data sets Automatically generate data model PENTAHO 7

Conclusion In this discussion, we highlighted three core data delivery needs that are only being met on a limited basis in the market today: > Orchestrate on-demand processing, blending, and modeling of user requested data sets in order to accelerate time to value in complex analytics initiatives. > Ensure proper data governance during the delivery process, such that risk is minimized and confidence is increased in datadriven decisions. > Provide blended and enriched data in the end user format of choice, so that business users can be more productive in deriving insight from diverse data. Indeed, these challenges cut across a variety of sector-specific use cases discussed, including deep data exploration by researchers, forensic analysis of unexpected events, compliance assurance in regulated industries, and delivery of data to key customers and partners as a service. The Streamlined Data Refinery provides a well-defined solution architecture to address these needs in a fashion that both leverages existing organizational competencies and ensures that the on-demand data delivery process can quickly adjust to changes in the data environment. PENTAHO 8

Global Headquarters Citadel International - Suite 340 5950 Hazeltine National Dr. Orlando, FL 32822, USA tel +1 407 812 6736 fax +1 407 517 4575 US & Worldwide Sales Office 353 Sacramento Street, Suite 1500 San Francisco, CA 94111, USA tel +1 415 525 5540 toll free +1 866 660 7555 Learn more about Pentaho Business Analytics United Kingdom, Rest of Europe, Middle East, Africa London, United Kingdom tel +44 (0) 20 3574 4790 toll free (UK) 0 800 680 0693 pentaho.com/contact +1 (866) 660-7555 FRANCE Offices - Paris, France tel +33 97 51 82 296 toll free (France) 0800 915343 GERMANY, AUSTRIA, SWITZERLAND Offices - Munich, Germany tel +49 (0) 322 2109 4279 toll free (Germany) 0800 186 0332 BELGIUM, NETHERLANDS, LUXEMBOURG Offices - Antwerp, Belgium tel +31 8 58 880 585 toll free (Belgium) 0800 773 83 ITALY, SPAIN, PORTUGAL Offices - Valencia, Spain toll free (Italy) 800-798-217 toll free (Spain) 900-868522 toll free (Portugal) 800-180-060 PENTAHO 10

Copyright 2014 Pentaho Corporation. Redistribution permitted. TOWARD All trademarks A are STREAMLINED the property of their DATA respective REFINERY owners. For the latest information, please visit our web site at pentaho.com. PENTAHO 11