
Composite Software Data Virtualization
Turbocharge Analytics with Big Data and Data Virtualization
Composite Software, Inc.
June 2011

TABLE OF CONTENTS

INTRODUCTION
PROBLEM: ANALYTICS PUSH THE LIMITS OF TRADITIONAL DATA MANAGEMENT
  ANALYTICS ARE RED HOT
  BETTER ANALYTICS EQUALS BETTER PERFORMANCE
  ANALYTICS AND BIG DATA STORAGE ARE INEXTRICABLY LINKED
  ANALYTICS AND BIG DATA CHALLENGES ARE EXTREME
  BIG DATA INTEGRATION IS A BARRIER TO DELIVERING ANALYTICS VALUE
DATA VIRTUALIZATION SOLVES BIG DATA INTEGRATION PROBLEMS
  THE ANSWER TO TOO MANY DATA SILOS IS NOT MORE DATA SILOS
  DATA VIRTUALIZATION TURBOCHARGES BIG DATA ANALYTICS
  DATA VIRTUALIZATION DELIVERS SIGNIFICANT BUSINESS AND IT BENEFITS
USING DATA VIRTUALIZATION TO INTEGRATE THREE MAJOR CLASSES OF BIG DATA
  THE ANALYTIC DOMAIN IS ENTERPRISE DATA PLUS BIG DATA
  MASSIVELY PARALLEL PROCESSING-BASED ANALYTICAL DATA STORES
  HADOOP
  OTHER BIG DATA TYPES
BIG DATA ANALYTICS AND DATA VIRTUALIZATION IN ACTION AT A WEALTH MANAGEMENT FIRM
  BUSINESS BACKGROUND
  BIG DATA ANALYTICS BACKGROUND
  THE DATA INTEGRATION PROBLEM
  DATA INTEGRATION ALTERNATIVES CONSIDERED
  COMPOSITE DATA VIRTUALIZATION SOLUTION
  BENEFITS ACHIEVED
CONCLUSION

INTRODUCTION

In the best-selling book Competing on Analytics: The New Science of Winning, authors Thomas H. Davenport and Jeanne G. Harris found a striking relationship between the use of analytics and business performance: high performers (those who outperformed their industry in terms of profit, shareholder return and revenue growth) were 50 percent more likely to use analytics strategically, and five times as likely as low performers.

Enterprises and government agencies seek to increase profitability, streamline operations, improve customer retention, extend product lines and reduce risk through analytics. However, traditional data integration approaches slow analytics adoption and constrain the ability to achieve these objectives. The Composite Data Virtualization Platform provides an agile, high-performance data integration approach that overcomes data complexity and disparate silos to provide analytics with both the big data and the enterprise data needed to outperform the competition.

This white paper introduces the big data analytics opportunity, the big data integration challenges, and the data virtualization solutions that Composite Software delivers to address these challenges.

PROBLEM: ANALYTICS PUSH THE LIMITS OF TRADITIONAL DATA MANAGEMENT

Analytics Are Red Hot

Analytics are the fastest-growing segment of the business intelligence software industry. According to IBM, 85% of US corporations plan to implement predictive analytics in the next five years. So if you are not already developing analytics solutions, you soon will be. Enterprises and government agencies seeking to increase profitability, streamline operations, improve customer retention, extend product lines and reduce risk are aggressively pursuing new analytic approaches. Analytics opportunities are abundant, including:

- Pricing optimization;
- Sales and inventory forecasting;
- Customer churn prevention;
- Marketing campaign optimization;
- Fraud detection;
- Supply chain management; and
- Many more.

Better Analytics Equals Better Performance

In the best-selling book Competing on Analytics: The New Science of Winning, authors Thomas H. Davenport and Jeanne G. Harris found a striking relationship between the use of analytics and business performance: high performers (those who outperformed their industry in terms of profit, shareholder return and revenue growth) were 50 percent more likely to use analytics strategically, and five times as likely as low performers.

Analytics and Big Data Storage Are Inextricably Linked

Analytics require data, and the more the better. Whether standalone or integrated with existing enterprise data, a number of new, extremely high-volume data sources, such as web clicks, call detail records and log files, provide fresh opportunities to apply analytics. To support these new use cases, analytical data stores including EMC Greenplum, HP Vertica, IBM Netezza, ParAccel, SAP Sybase IQ and others have changed where and how analytics are performed, tying the data storage approach and the analytics closer together. Similarly, Hadoop, along with MapReduce, has revolutionized how and where analysis is performed and data is stored.
Analytics and Big Data Challenges Are Extreme

On April 7, 2011, Gartner published a seminal research report entitled 'Big Data' Is Only the Beginning of Extreme Information Management. In this report they point out that the term "big data" overemphasizes volume while underemphasizing the other extreme aspects of information management today.

In the report, they identify twelve aspects of extreme information management, with volume being just one of four quantifiable factors. The other quantifiable factors are:

- Velocity of data streams, access demands and record creation;
- Variety of data formats; and
- Complexity of individual data types.

Brian Hopkins of Forrester recently addressed this bigger-than-big issue in his Forrester blog post, Blogging From the IBM Big Data Symposium - Big Is More Than Just Big. Quoting from Brian's blog: "The term Big Data is a misnomer and it is causing some confusion. Several of us here at Forrester have been saying for a while that it is about the 'four V's' of data at extreme scale: volume, velocity, variety and variability."

In his May 19, 2011 report, Big Opportunities in Big Data, Hopkins goes further in addressing the challenges of managing big data. Hopkins states, "The term big data processing refers to tools and techniques that handle certain types of data: extreme volumes, high velocity, in a variety of formats, and with a variability of meaning beyond the capability of existing, mature data management technologies."

Big Data Integration Is a Barrier to Delivering Analytics Value

While the value of analytics is unquestioned, data integration issues often slow analytics adoption and thus delay these benefits. Big data integration is a difficult barrier for three reasons:

- Data silo and complexity challenge: Effective analytics applications leverage data from multiple internal and external sources, including relational, semi-structured XML, dimensional MDX, and new big data types such as Hadoop and analytic appliances.
- Query performance challenge: Large volumes of data must be analyzed, making query performance a critical success factor.
- Agility challenge: Dynamic businesses require new and ever-changing analyses. This means new data sources must be brought on board quickly, and existing sources must be modified to support each new analytic requirement.

DATA VIRTUALIZATION SOLVES BIG DATA INTEGRATION PROBLEMS

The Answer to Too Many Data Silos Is Not More Data Silos

New fit-for-purpose analytics appliances are proliferating. These new big data sources can be integrated with existing enterprise sources in several ways. The traditional data consolidation approach, where data is extracted from original sources and loaded into an analytics data store of some kind, remains valid as a core approach. However, what happens when you need to integrate across these new analytical silos to perform a wider, more far-reaching analysis? For example, if you are trying to analyze marketing campaign effectiveness, your overall analysis requires analytics data from multiple existing repositories, including:

- Web site click analysis in Hadoop;
- Email campaign metrics analysis in Unica;
- Nurture marketing analysis in Manticore;
- Lead and opportunity analysis in salesforce.com; and
- Revenue analysis in SAP BW.

Does it make sense to create yet another silo that physically consolidates these existing diverse data silos? Or is it better to federate these silos using data virtualization instead?

Data Virtualization Turbocharges Big Data Analytics

Data virtualization is an agile, high-performance data integration approach that overcomes data complexity and disparate silos to provide both the big data and the enterprise data that today's complex analytics require. Composite integrates your existing enterprise data with all the major types of big data, including:

- Massively Parallel Processing-based analytic data stores: examples include EMC Greenplum, HP Vertica, IBM Netezza, SAP Sybase IQ, and more.
- Columnar/tabular NoSQL data stores: examples include Hadoop, Hypertable, and more.
- XML document data stores: examples include CouchDB, MarkLogic, MongoDB, and more.
- Key/value data stores: examples include Cassandra, Memcached, Voldemort, and more.
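The federation-versus-consolidation question above can be made concrete with a minimal sketch. Here two in-memory lists stand in for the campaign silos (a Hadoop click store and an SAP BW revenue store); the silo contents, field names and metric are invented for illustration, and the point is simply that the join happens at query time without copying either silo into a new store.

```python
# Minimal federation sketch: each "silo" stays where it is, and a virtual
# view joins them on demand. Names and fields are hypothetical stand-ins.

clicks_silo = [  # stand-in for web click analysis in Hadoop
    {"campaign": "spring-promo", "clicks": 12000},
    {"campaign": "fall-launch", "clicks": 8000},
]
revenue_silo = [  # stand-in for revenue analysis in SAP BW
    {"campaign": "spring-promo", "revenue": 250000},
    {"campaign": "fall-launch", "revenue": 90000},
]

def campaign_effectiveness():
    """Federate the silos at query time; no data is consolidated."""
    revenue_by_campaign = {r["campaign"]: r["revenue"] for r in revenue_silo}
    return [
        {"campaign": c["campaign"],
         "revenue_per_click": revenue_by_campaign[c["campaign"]] / c["clicks"]}
        for c in clicks_silo
        if c["campaign"] in revenue_by_campaign
    ]

print(campaign_effectiveness())
```

A physical consolidation approach would instead copy both silos into a new warehouse table before this calculation could run; the federated view produces the same answer with no replication step.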

Data Virtualization Delivers Significant Business and IT Benefits

Using the Composite Data Virtualization Platform to turbocharge analytics has numerous benefits, including:

- Query Optimization for Timely Business Insight: Composite's query optimization algorithms and techniques are the fastest in the industry, delivering the timely information your analytics require.
- Data Federation Provides the Complete Picture: Composite's data federation virtually integrates your data in memory to provide the complete picture without the cost and overhead of physical data consolidation.
- Data Discovery Addresses Data Proliferation: Composite's unique-in-the-industry data discovery automates entity and relationship identification and accelerates data modeling, so your analysts can better understand and leverage your distributed data assets.
- Data Abstraction Simplifies Complex Data: Composite's powerful data abstraction tools simplify your complex data, transforming it from native structures to common semantics for easier consumption.
- Data Access, Caching and Delivery Improve Data Availability: Composite's flexible, standards-based data access, caching and delivery options support your diverse analytic solutions.
- Data Governance Maximizes Control: Composite's data governance ensures data security, data quality and 7x24 operations.
- Layered Data Virtualization Architecture Enables Rapid Change: Composite's loosely-coupled data virtualization layer architecture and rapid development tools provide the agility required to keep pace with your ever-changing analytic needs.
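The query optimization benefit above rests on a general principle of federated queries: move as few rows as possible across the network. The toy cost model below illustrates that principle only; the functions, statistics and costs are invented for this sketch and are not Composite's actual optimizer.

```python
# Toy cost-based planning for a federated join: given cardinality estimates
# for each side, ship the smaller side to the other source, so the number
# of rows crossing the network is minimized. Purely illustrative.

def choose_join_strategy(left_rows, right_rows):
    """Pick which side to ship: always the side with fewer estimated rows."""
    return "ship_left" if left_rows <= right_rows else "ship_right"

def estimated_transfer_cost(left_rows, right_rows):
    """Cost of the chosen plan, measured as rows shipped over the network."""
    return min(left_rows, right_rows)

# A 1,000,000-row fact table joined to a 500-row dimension table:
# the planner ships the dimension, so only 500 rows travel, not 1,000,000.
print(choose_join_strategy(1_000_000, 500))
print(estimated_transfer_cost(1_000_000, 500))
```

Real optimizers weigh many more factors (source capabilities, indexes, predicate selectivity), but the cardinality statistics mentioned in the feature list exist precisely to feed decisions of this shape.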

USING DATA VIRTUALIZATION TO INTEGRATE THREE MAJOR CLASSES OF BIG DATA

The Analytic Domain Is Enterprise Data Plus Big Data

The Composite Data Virtualization Platform is an optimal solution for integrating enterprise data and big data, and thereby improving analysis and business insight. This is especially true for the following three major classes of big data:

- Massively Parallel Processing-based analytical data stores;
- Hadoop; and
- Other big data types.

The Composite Data Virtualization Platform provides a complete development and runtime environment for discovering, accessing, federating, abstracting and delivering data from these diverse sources. Access is typically done via standards-based protocols and APIs: for example, JDBC and ODBC for SQL-based sources, HTTP and SOAP for web services, JMS for messages, and APIs for enterprise and cloud-based applications. Through these methods, source data is securely exposed from a single virtual location, regardless of how and where it is physically stored. Additional specifics on how the Composite Data Virtualization Platform integrates each of these big data classes are addressed below.

Massively Parallel Processing-based Analytical Data Stores

EMC Greenplum, HP Vertica, IBM Netezza, Oracle Exadata, SAP Sybase IQ, Teradata Aster Data and Teradata EDW are but a few of the large-scale analytical data stores in the market today. Because the data volumes stored in these appliances are so high, query performance is a key consideration when integrating this valuable data with the rest of the enterprise. Composite Software provides the widest set of optimized MPP source adapters in the data virtualization market. Unlike other data virtualization products that connect to the source using only simple metadata (schema) mapping, Composite's PerformancePlus data adapters intelligently evaluate and leverage the underlying data source's capabilities to ensure optimal federated query performance.
Key features include:

- Interactive metadata (schema) mapping: enables fast and accurate modeling;
- Standard SQL to vendor-specific SQL resolution: ensures precise SQL translation and execution;
- Statistical analysis and cardinality estimation: accumulates critical metrics for use by the cost-based optimizer;
- Capability introspection and coordination: determines the configurations, functionality and parameters required to enable optimal performance; and
- Vendor-specific engineered functions: supercharge performance beyond the vendor's standard capabilities.

For example, to further optimize IBM Netezza queries, Composite teamed with Netezza's engineers to provide and certify several advanced optimizations such as the Data-Ship-Join. See the Composite Data Virtualization Platform Data Sheet for a complete listing of MPP sources supported.

Hadoop

Hadoop is fast emerging as a leading repository for big data analytics. However, the MapReduce programming model used to interact with Hadoop data sources is not well understood in typical enterprise IT organizations. This may not be a problem when performing specialized analytics, but it can be a big barrier when trying to combine Hadoop and enterprise data using enterprise IT standard languages such as SQL. The Composite Data Virtualization Platform overcomes this query language challenge by integrating and extending Hive, and thus provides a unified SQL-based approach for querying both enterprise and Hadoop data sources. In practice, developers build views in Composite using SQL that include both enterprise and Hadoop data sources. At runtime, Composite submits SQL queries to Hadoop via Hive. If the result set already exists, Hive returns the data directly. If the result set requires reduction, Hive executes the appropriate MapReduce functions before returning the data to Composite.
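The runtime flow just described (push each query fragment to its own engine, then join the partial results in the virtual layer) can be sketched as follows. Two sqlite3 in-memory databases stand in for the enterprise RDBMS and for Hive over Hadoop; the table and column names are hypothetical, and in the real platform the GROUP BY reduction sent to the Hadoop side would be compiled by Hive into MapReduce jobs.

```python
# Sketch of a federated query spanning an "enterprise" source and a
# "Hadoop" source, with the reduction pushed down and the final join
# performed in the virtualization layer. sqlite3 is a stand-in for both.
import sqlite3

enterprise = sqlite3.connect(":memory:")   # stand-in for the enterprise RDBMS
enterprise.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
enterprise.executemany("INSERT INTO customers VALUES (?, ?)",
                       [(1, "EMEA"), (2, "AMER")])

hadoop = sqlite3.connect(":memory:")       # stand-in for Hive over Hadoop
hadoop.execute("CREATE TABLE clicks (customer_id INTEGER, n INTEGER)")
hadoop.executemany("INSERT INTO clicks VALUES (?, ?)",
                   [(1, 10), (1, 5), (2, 7)])

# Push the reduction (SUM ... GROUP BY) down to the big data side, so only
# the small aggregated result travels, then join it with the enterprise rows.
reduced = hadoop.execute(
    "SELECT customer_id, SUM(n) FROM clicks GROUP BY customer_id").fetchall()
regions = dict(enterprise.execute("SELECT id, region FROM customers"))
federated = {regions[cust_id]: total for cust_id, total in reduced}
print(federated)
```

The key point of the sketch is where the work happens: the aggregation runs in the source that holds the large table, and only a few summary rows reach the joining layer.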

Other Big Data Types

A number of additional big data types exist. These have evolved to address specific information needs, and each provides data access methods appropriate to its data structure and syntax. They include:

- Tabular/columnar data stores: Storing sparse tabular data, these stores look most like traditional tabular databases. Examples include Hadoop/HBase (Yahoo!), BigTable (Google), Hypertable and VoltDB. Their primary data retrieval paradigm utilizes column filters, generally leveraging hand-coded MapReduce algorithms.
- Document stores: These sources store unstructured (i.e., text) or semi-structured (i.e., XML) documents. Examples include MongoDB, MarkLogic and CouchDB. Their data retrieval paradigms vary widely, but documents can always be retrieved by unique handle. XML data sources leverage XQuery; text documents are indexed, facilitating keyword-search-like retrieval.
- Graph databases: These sources store graph-oriented data with nodes, edges and properties, and are commonly used to store associations in social networks. Examples include Neo4j, AllegroGraph and FlockDB. Data retrieval focuses on retrieving the associations of a particular node.
- Key/value stores: These sources store simple key/value pairs, like a traditional hashtable, and are further subdivided into in-memory and disk-based solutions. Examples include Memcached, Cassandra (Facebook), SimpleDB, Dynamo (Amazon), Voldemort (LinkedIn) and Kyoto Cabinet. Their data retrieval paradigm is simple: given a key, return the value. Some offer more complex querying mechanisms that can look inside the value, but normally the value is treated as opaque.
- Object and multi-value databases: Object databases store objects (as in object-oriented programming). Multi-value databases store tabular data, but individual cells can store multiple values. Examples include Objectivity, GemStone and UniData. Proprietary query languages are used to retrieve data.
While these big data access approaches vary, all provide some sort of Java-based development API appropriate for accessing their big data type. The Composite Data Virtualization Platform uses these APIs, together with Composite's Custom Java Procedure (CJP) resource and Adapter SDK, to access and integrate these sources. Three kinds of NoSQL systems are a particularly natural fit for this integration approach: tabular/columnar data stores, XML document stores, and key/value stores. A more detailed integration approach for each of these is outlined in the Composite Data Virtualization and NoSQL Data Stores white paper.
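The adapter idea behind the CJP/SDK approach can be sketched generically: wrap a non-relational source's native API so the rest of the stack can consume it as rows. Below, a plain dict stands in for a key/value store ("given a key, return the value"), and the small adapter class is a hypothetical illustration, not the actual Adapter SDK interface.

```python
# Sketch of exposing a key/value source as a two-column table so a federation
# layer can join it with relational data. The store and adapter are stand-ins.

class KeyValueAdapter:
    """Wrap a key/value store's native get-by-key API as tabular rows."""

    def __init__(self, store):
        self.store = store  # native paradigm: store[key] -> opaque value

    def rows(self):
        """Return the whole store as (key, value) tuples, sorted by key."""
        return [(k, self.store[k]) for k in sorted(self.store)]

# Hypothetical session store (the kind of data Memcached might hold).
sessions = {"sess-1": "alice", "sess-2": "bob"}
adapter = KeyValueAdapter(sessions)
print(adapter.rows())  # [('sess-1', 'alice'), ('sess-2', 'bob')]
```

Once the source looks like rows, the same federation and abstraction machinery described earlier applies to it unchanged, which is why these three NoSQL families integrate so naturally.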

BIG DATA ANALYTICS AND DATA VIRTUALIZATION IN ACTION AT A WEALTH MANAGEMENT FIRM

Business Background

With over $130 billion in assets under management, this global wealth management firm is one of the largest in the United States, providing a range of asset management, retirement plan and mutual fund offerings.

Big Data Analytics Background

In financial services, upsell and cross-sell campaigns can generate 50% or more of a firm's growth. Therefore, every new marketing campaign, from Wall Street Journal ads and CNBC commercials to monthly statement inserts, must be fully exploited in order to achieve its maximum potential. The firm's marketing campaigns crossed multiple sales and marketing information silos, including traditional enterprise sources as well as big data and cloud sources. With this diversity, data integration was slow and difficult. Campaign results were rarely up-to-date or complete. Upsell and cross-sell opportunities were being missed, and marketing campaign spend was being wasted.

The Data Integration Problem

SAS analytical tools were the tools of choice for campaign management analysis. Analysis requirements ranged from retrospective analysis of historical campaigns and real-time analysis of ongoing campaigns to predictive analysis of future campaigns. These analytics required data from multiple sources, including:

- Web analytics (big data);
- DST HiPortfolio trades and asset management (cloud);
- Salesforce.com customer master (cloud);
- Salesforce.com sales force automation (cloud);
- Oracle Siebel CRM (Oracle);
- Investment account master (Oracle);
- IBM Unica campaign management system (SQL Server); and
- StrongMail email marketing (SQL Server).

Data Integration Alternatives Considered

The firm considered consolidating this data in a unified data warehouse. However, the extra replication processing and storage costs would have been too high, lead times to set up new campaigns would have been too long, and that approach would not have supported real-time campaign analysis.

Composite Data Virtualization Solution

Instead, the firm implemented the Composite Data Virtualization Platform as the data layer across its diverse sources, enabling SAS to perform historical, real-time and predictive marketing campaign analysis. In advance of each new campaign, sales and marketing activity data from every source is modeled in Composite along with the specific campaign associations. Then, at any point before, during or after a campaign, marketing analysts and sales teams run SAS analytics. To SAS, Composite behaves as a unified (virtual rather than physical) sales and marketing data warehouse. When called by SAS, Composite requests only the required data from the diverse sources and delivers it to SAS within seconds.

Benefits Achieved

The firm achieved a number of benefits:

1. Integrating all sales and marketing data sources for the first time allowed the firm to understand the true impact of its marketing campaigns.
2. Real-time data improved sales agent responsiveness during campaigns, resulting in increased revenue.
3. Broader analysis revealed the effectiveness of the various marketing mix components, which led to more impactful yet cost-effective future campaigns.
4. Easier integration setup allowed new campaigns to be brought on board faster, enabling quicker market responses.
5. IT costs were significantly reduced because there is less data to replicate and maintain.
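The phrase "requests only the required data" describes predicate pushdown: the campaign filter travels to each source, and only matching rows travel back. A minimal sketch, with invented source contents and field names standing in for the firm's CRM and email systems:

```python
# Sketch of predicate pushdown across federated sources: filter at each
# source, ship only matching rows to the analytics tool. All data invented.

def fetch(source, campaign_id):
    """Stand-in for a pushed-down query: the filter runs at the source."""
    return [row for row in source if row["campaign"] == campaign_id]

crm = [{"campaign": "Q2-insert", "lead": "L1"},
       {"campaign": "Q3-ad", "lead": "L2"}]
email = [{"campaign": "Q2-insert", "opens": 4},
         {"campaign": "Q3-ad", "opens": 9}]

# Only rows for the campaign under analysis cross the wire; nothing is
# replicated into an intermediate warehouse.
result = {"leads": fetch(crm, "Q2-insert"), "email": fetch(email, "Q2-insert")}
print(result)
```

This on-demand, filtered retrieval is what lets the virtual warehouse stay current for real-time campaign analysis without the replication lag of a consolidated store.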

CONCLUSION

When done well, analytics provide a sustainable business advantage. But with today's extreme information management challenges, including wide variety, high velocity, complexity and high volume, achieving these analytics benefits can be difficult. Data virtualization helps overcome these complexity challenges and fulfills critical analytic data needs significantly faster, and with far fewer resources, than other data integration techniques.

In this paper, analytics and big data opportunities were identified so you can understand the business value and technology trends driving this accelerating market. Composite Software's data virtualization approach to big data integration was described, along with the specifics of integrating MPP analytical data stores, Hadoop and other big data sources. With this foundation you can understand specific ways to apply data virtualization to help achieve your analytics objectives. Finally, the campaign analysis use case from a leading investment manager provides a tangible example to cement your new knowledge.

If your enterprise is facing similar big data analytic opportunities and data integration challenges, consider Composite Software, the gold standard in data virtualization.

ABOUT COMPOSITE SOFTWARE

Composite Software, Inc. is the data virtualization performance leader. Backed by a decade of pioneering R&D, Composite Software is the data virtualization gold standard at 10 of the top 20 banks, six of the top 10 pharmaceutical companies, four of the top five energy firms, major communications providers and the world's largest IT organization, the US Army. These and hundreds of other global organizations rely on Composite Software to fulfill their ever-changing information requirements with greater agility and lower costs.

Composite Software is a registered trademark of Composite Software, Inc.

Composite Software, Inc.
2655 Campus Drive, Suite 200
San Mateo, CA 94403
T / 650.227.8200
F / 650.227.8199
info@compositesw.com
www.compositesw.com

2011 Composite Software, Inc. All rights reserved.