The Next Wave of Data Management. Is Big Data The New Normal?





Table of Contents

Introduction
Separating Reality and Hype
Why Are Firms Making IT Investments In Big Data?
Trends In Data Management Technologies
The Technology Maturity Cycle
IT Challenges Meeting New Data Demands
Managing Increasing Data Volume and Variety
Ensuring Data Governance and Data Quality
Strategies For Managing Evolving Big (and Enterprise) Data
A Total Data Management Approach For All Data
Benefits of Total Data Management for All Your Data
Talend Total Data Management
About Talend
Contact Us

Introduction

Much has been written about the promise of big data. With the growth of corporate data, it is clear that big data may become the new normal. In analytics, for example, companies are able to make more informed decisions by analyzing bigger and more diverse sets of data. A recent Gartner report states that through 2015, organizations integrating high value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20% (Source). However, this new data management journey will bring many instances of failed big data experiments requiring rearchitecture in a few years. Effectively incorporating big data technologies like Hadoop and NoSQL into your current information architecture is not a simple exercise.

This paper reviews the business drivers behind this new wave of data management, highlights the new tools companies can consider for their next-generation data architecture, and outlines a total data management blueprint for your information strategy.

Separating Reality and Hype

Is there anything big data cannot solve? Is big data overhyped? Many would say yes, as big data is used pervasively in vendor presentations to adorn company information strategies. Big data use cases range from marketing campaign analysis and recommendation engines to predictive analytics, sentiment analysis, and fraud detection. Some have even proposed using big data to predict the next Pope. What's next, you might ask? We are only at the tip of the iceberg in finding all the value big data can deliver.

What is undeniable is that the digital universe is growing rapidly. For example:

- The total worldwide volume of data is growing at 59% per year, with the number of files growing at 88% per year. (Source)
- IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day. (Source)
- Akamai analyzes 75 million events per day to better target advertisements. (Source)
- Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data. (Source)
- More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. (Source)
- 100 terabytes of data are uploaded daily to Facebook. (Source)

Why Are Firms Making IT Investments In Big Data?

A recent survey by Talend highlights the top business drivers for big data, with increasing the accuracy and depth of predictive analytics being the primary reason. With so much unstructured data not being analyzed, firms recognize the value in using big data.

Figure: What are the business drivers for big data in your organization? (multiple responses, n=95)

800 Bridge Parkway, Suite 200, Redwood City, California 94065 US

Trends In Data Management Technologies

Whether big data is in production or on the horizon for your organization, there are considerations in several areas of data management that require strategy and planning:

- Real-time Data Processing: the requirement to have information on demand or in real time is increasing, as firms do not want to make decisions with last month's or even yesterday's information.
- Data Services: to address the demand for information accessibility and availability from many consumers, data management features such as data integration and data quality are becoming available as on-premise or cloud-based services.
- Open Source Data Management: to overcome the huge investment typically made in enterprise software, open source makes data management fiscally possible for smaller organizations and improves total cost of ownership for big ones.
- Data Governance: an evolving approach to set forth policies and procedures for the use, access and management of data. It is particularly important as business users play an ever-increasing role in managing data and budgeting.
- Big Data and NoSQL: providing a bridge between familiar legacy database technologies and the demands of rapidly storing, retrieving, processing and analyzing big data.

The Technology Maturity Cycle

New technologies typically follow a maturity cycle as they are introduced. First, a technology needs to integrate with existing systems. Next, it needs to become hardened to address scalability, reliability and security concerns. Then it may incorporate related technologies, including software lifecycle management features like project management and testing or, in the example of data management, capabilities to expose a function as a consumable service.

Figure: the technology maturity timeline.

Each of the aforementioned trends is moving along the timeline from hype to productivity, offering greater benefits to end users. Big data (Hadoop), for example, started as an implementation of MapReduce and has been extended into a massive operating system for distributed parallel processing of huge amounts of data. Other technologies have been added, including HBase, an open source database; HCatalog for metadata; Apache Hive as a data warehouse infrastructure; and Apache Pig as a high-level data processing language. However, Hadoop by itself, although well equipped for storing and processing large amounts of information across nodes, does not yet include software lifecycle management functionality such as project management, or related data management functionality such as data integration, metadata management, governance, profiling, cleansing and matching. Data governance is particularly important for successful firm-wide adoption, since it incorporates business processes, organizational roles and best practices.

Is big data a replacement for a data warehouse?

A European telecommunications provider needs to store all call transactions in a database for 2 years to comply with regulations. One alternative is to create a large data warehouse to house this information; however, a less expensive and faster-performing alternative is to install a Hadoop cluster or a NoSQL database to store the information. This strategy is sometimes called data hoarding, and many companies use it as a means to store information for subsequent processing.
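To make the MapReduce model mentioned above more concrete, the following is a minimal single-process sketch in Python of its three steps (map, shuffle, reduce), applied to call records like those in the telecommunications example. It is not Hadoop code and does not use any Hadoop API; the record layout (caller, callee, duration in seconds) and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical call record layout: (caller_id, callee_id, duration_seconds).
CallRecord = tuple[str, str, int]


def map_phase(records: Iterable[CallRecord]) -> Iterator[tuple[str, int]]:
    """Emit one (key, value) pair per record: caller -> call duration."""
    for caller, _callee, seconds in records:
        yield caller, seconds


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as a framework would between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_seconds_per_caller(records: Iterable[CallRecord]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory collection."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [("alice", "bob", 120), ("bob", "carol", 30), ("alice", "carol", 45)]
    print(total_seconds_per_caller(sample))
```

In a real cluster the map and reduce functions would run in parallel on different nodes, with the framework handling the grouping step and the distribution of data; the sketch only shows the shape of the computation.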

IT Challenges Meeting New Data Demands

To meet new business demands and the appetite to process and analyze more information, IT is looking to modernize its data management processes. Conventional data management tools fail when trying to integrate, search and analyze very large datasets. In the past, adding hardware and software was a solution to make information available and scalable, but that is no longer the case. Information needs to be available 24x7 across a variety of channels, from web to mobile devices.

Managing Increasing Data Volume and Variety

Much has been written (including in this paper so far) about the increasing volume of data. IT needs to be concerned, since the amount of global data generated is projected to grow at 40% per year, while IT budgets are growing at only 5% (Source). Of greater concern, however, is the increasing variety of data: the information that is not in a structured database or data warehouse, such as content management repositories, streaming data (audio, video), images, blogs, customer comment forums, Twitter, network sensors, transactional data and more. IDC estimates that by 2015, 90% of data in the digital universe will be unstructured (Source). It is clear that firms that focus on volume alone, and not on data characteristics like variety, will need to make additional investments. The following survey from Forrester highlights the issue in trying to perform better customer analytics, a task that involves integrating many sources of structured and unstructured data.

Figure source: Forrester Webinar with Talend

Ensuring Data Governance and Data Quality

Depending on the goal of a big data project, poor data quality can have a big impact on effectiveness. It can be argued that inconsistent or invalid data could have an exponential impact on analysis in the big data world. If a record enters a system as a duplicate, incomplete or incorrect, many other systems are impacted. As analysis on big data grows, so too will the need for validation, standardization, enrichment and resolution of data. Even identification of linkages can be considered a data quality issue that needs to be resolved for big data. As with any corporate data management discipline, big data will eventually need to comply with established corporate standards and accepted project management norms for the organization, deployment and sharing of project artifacts.

Not too many years ago, the terms terabyte (10^12 bytes) and petabyte (10^15 bytes) were seen as very large. An RFP (request for proposal) for a data warehouse may have asked for the ability to reliably handle a petabyte of information. Now terms such as exabyte (10^18 bytes) and zettabyte (10^21 bytes) may soon be the norm.
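As an illustration of the validation, standardization and duplicate resolution steps described in this section, here is a minimal sketch in Python. It is a toy example under stated assumptions, not a description of any product's data quality engine: the `Customer` record, its fields, the validation rule and the choice of the email address as the matching key are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical customer record used only for this illustration."""
    name: str
    email: str
    phone: str


def standardize(rec: Customer) -> Customer:
    """Normalize whitespace and casing, and keep only digits in the phone."""
    name = " ".join(rec.name.split()).title()
    email = rec.email.strip().lower()
    phone = re.sub(r"\D", "", rec.phone)
    return Customer(name, email, phone)


def is_valid(rec: Customer) -> bool:
    """Very rough validation: a non-empty name and exactly one '@' in the email."""
    return bool(rec.name) and rec.email.count("@") == 1


def deduplicate(records: list[Customer]) -> list[Customer]:
    """Keep the first valid record for each standardized email address."""
    seen: set[str] = set()
    clean: list[Customer] = []
    for rec in map(standardize, records):
        if not is_valid(rec) or rec.email in seen:
            continue
        seen.add(rec.email)
        clean.append(rec)
    return clean


if __name__ == "__main__":
    raw = [
        Customer("  jane   DOE ", "Jane.Doe@Example.com ", "(555) 010-2000"),
        Customer("Jane Doe", "jane.doe@example.com", "555-010-2000"),
        Customer("", "nobody@example.com", ""),
    ]
    for rec in deduplicate(raw):
        print(rec)
```

The order matters: standardizing before matching is what lets the first two records above be recognized as the same customer. Real matching usually relies on fuzzier rules than an exact key comparison.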

A recent Gartner report states that through 2015, organizations integrating high value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%. However, through 2015, 85% of enterprises will fail to adapt their information infrastructure to "big data," socially mediated content and new connected devices. This means that all systems, even the largest integration platform in IT (the data warehouse), will be overwhelmed in the next three to four years (Source).
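The mismatch cited earlier, with data growing at 40% per year against IT budgets growing at 5% per year, compounds quickly. The short Python sketch below works through that arithmetic. It assumes both rates stay constant and that data volume and budget both start at a normalized value of 1.0; these are simplifications for illustration, not figures from the cited sources.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Value after compounding `annual_rate` once per year for `years` years."""
    return start * (1.0 + annual_rate) ** years


def data_per_budget_unit(
    years: int, data_rate: float = 0.40, budget_rate: float = 0.05
) -> float:
    """Data volume per unit of budget, normalized to 1.0 at year 0."""
    return compound(1.0, data_rate, years) / compound(1.0, budget_rate, years)


if __name__ == "__main__":
    print("year  data  budget  data per budget unit")
    for year in range(6):
        data = compound(1.0, 0.40, year)
        budget = compound(1.0, 0.05, year)
        ratio = data_per_budget_unit(year)
        print(f"{year:>4}  {data:4.2f}  {budget:6.2f}  {ratio:5.2f}")
```

Under these assumptions, after five years data volume has grown more than fivefold while the budget has grown by less than 30%, so each unit of budget has to cover roughly four times as much data.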

Strategies For Managing Evolving Big (and Enterprise) Data

Many companies have built their information architecture department by department to solve specific data management problems. Enterprise data, or the data in business systems such as CRM and ERP applications, is then linked together by data integration and data quality tools into a data warehouse for further processing, analysis, viewing and management. This is an example of total data management for enterprise data.

With the requirement to now include vast amounts of unstructured data for analysis (2 to 10 times the volume of structured data), one could take the approach of creating a separate big data architecture as another silo. Unfortunately, many of the existing data vendors prescribe this approach, since their tools cannot be adapted to support total data management across structured and unstructured data. The problem is that you end up building an information architecture in which:

- there is a lot of duplication across silos with limited reuse;
- there are inconsistent data governance and data quality practices across the company; and
- there are high costs for operations, maintenance and software licenses.

Big data should be treated like non-big data: you need to store it, you need to cleanse and enrich it, and you need to use it, which could mean retrieving it for a query or analyzing it in a system. The more your systems are integrated, the bigger the potential for data quality and data governance issues. Big data management is not just about deploying a Hadoop cluster, loading data into a Hadoop file system across distributed nodes and then running MapReduce processing. It is about making sure your entire information environment incorporates tools that not only handle the increasing volume, velocity, variety and complexity of data, but also provide software lifecycle management, service enablement and data quality functions, as well as data governance procedures.
A Total Data Management Approach For All Data

Instead of managing silos of structured, unstructured and semi-structured data and linking them together (bottom-up), a recommended approach is to think of managing data holistically (top-down). Your company needs to manage big data just as it does enterprise data, and let's not forget about small data such as spreadsheets. The data can be located on premise, in the cloud, or in a SaaS app. It can be transactional in-flight data or clickstream data from your website. You need to profile all of the data, you need to deduplicate and match data, and you need to service-enable it for consumption. And often information needs to be processed and available in real time. Total data management is applying the same best practices and tools to all of your data.

Fortunately for consumers today, there are good alternatives for creating this total data management approach that will not break the bank. Most data management functions have become commodity items, so the buyer can cost-efficiently achieve total data management for both structured and unstructured data.

Benefits of Total Data Management for All Your Data

With a total data management approach for all your data, firms can expect to see the following benefits:

- Improved Data Quality: by profiling, matching, deduping and monitoring all your data, you can improve the completeness, accuracy and integrity of your data and therefore the value it will provide to the organization.
- Improved Accountability: with set data governance processes in place for all your data, firms can better measure the use of data, ensure the appropriate access, and set internal policies and procedures.
- Lower Costs: with a decrease in the number of silos created, there will be a decrease in the number of duplicate processes. With data services reuse, projects can be completed more quickly and at a lower cost. Firms can also leverage commodity data management tools for most data management functions.
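Profiling, named above as the first step toward improved data quality, can be sketched in a few lines of Python. The example below computes, for each column of a small in-memory dataset, the share of non-empty values (completeness), the number of distinct values, and the most frequent value. The function name, the report layout and the sample rows are assumptions for illustration; the sketch also assumes column values are hashable.

```python
from collections import Counter
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return simple per-column statistics for a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat missing keys, None and empty strings as "not present".
        present = [v for v in values if v is not None and v != ""]
        counts = Counter(present)
        report[col] = {
            "completeness": len(present) / len(rows) if rows else 0.0,
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


if __name__ == "__main__":
    rows = [
        {"id": 1, "city": "Paris"},
        {"id": 2, "city": ""},
        {"id": 3, "city": "Paris"},
        {"id": 4},
    ]
    for column, stats in profile(rows).items():
        print(column, stats)
```

The same report can be produced for a spreadsheet export, a database extract or a sample drawn from a big data cluster, which is the point of applying one set of practices to all data.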

Talend Total Data Management

Talend provides a unified platform for managing data, whether that data resides in text files, enterprise databases, spreadmarts or big data clusters like Hadoop. The product set incorporates tools for all your data management needs, including big data integration, data quality and data governance, master data management, business process management and application integration.

Talend recognizes that data management is about supporting a wide range of data needs, regardless of scale. Users need not learn to use different environments to manage big data and enterprise data. This saves on infrastructure costs, licensing, user training, project development time and the wide range of costs associated with managing different data scales.

For example, to integrate big data sources, you use Talend data management graphical tools to integrate big data and NoSQL sources as well as all your enterprise data and small data, and then generate code in Hive, Pig and other native code bases. With big data quality that takes advantage of the massively parallel environment of Hadoop, you can improve the completeness, accuracy and integrity of data as well as remove duplicates. Talend provides big data governance through a simple, intuitive environment for implementing and deploying a big data program, with the ability to schedule, monitor and deploy any big data job.

About Talend

Talend provides integration that truly scales. From small projects to enterprise-wide implementations, Talend's highly scalable data, application and business process integration platform maximizes the value of an organization's information assets and optimizes return on investment through a usage-based subscription model. Ready for big data environments, Talend's flexible architecture easily adapts to future IT platforms. And a common set of easy-to-use tools implemented across all Talend products enables teams to scale developer skillsets, too. More than 1,800 active subscribers worldwide leverage Talend's solutions and services. The company has major offices in North America, Europe and Asia, and a global network of technical and services partners. For more information, please visit www.talend.com.

Contact Us

www.talend.com/contact
info@talend.com
partners@talend.com
sales@talend.com
800 Bridge Parkway, Suite 200, Redwood City, California 94065 US