22 SMARTENTERPRISEMAG.COM




Smart Strategies

BIG DATA, Big Innovation

Smart CIOs are mining their organizations' huge data stores for insights that lead to business innovation.

By Tom Farre
ILLUSTRATION: BRAD HAMMAN

"Data is the new oil," goes an analogy making the rounds. Just as oil powered a good portion of the 20th-century economy, data is driving business innovation and efficiency in today's 21st-century enterprise. This naturally turns the discussion to big data: the terabyte and now petabyte-class data stores that companies are accumulating from their internal systems, social media and the Web, external sources and a new wave of machine sensors. The more data you have, it would seem, the more potential for innovation.

Yet there's a twist to the "data is oil" analogy, according to Peter Hinssen, Chairman of consulting firm Across Technology and the author of two books on business and technology. Hinssen notes that in the early 20th century, John D. Rockefeller became the world's richest man not by amassing the most oil from the Earth's wells, but by controlling the oil-refinery process through his company, Standard Oil. "Leveraging big data is not about having the most data or the most capacity," Hinssen says, "but about refining that data, turning it into insight at the speed dictated by the market."

Indeed, big data is poised to usher in a transformative era in which business decisions are informed more by data, analysis and scientific testing than by opinion and intuition. Enterprises can harness data analytics for sustainable competitive advantage. Done correctly, data analytics empowers employees at all levels with information that helps them make smarter decisions. "Analytics increases corporate intelligence," says Wayne Eckerson, Director of Research at TechTarget, an IT content provider. "That's something you can never package or systematize and that competitors can't duplicate."

Big data analytics is still in its infancy. For competitive reasons, successful use cases, let alone concrete ROI examples, are still hard to come by; companies are keeping their successes close to the vest.

2012 SMART ENTERPRISE 23

Yet some successes are beginning to be reported:

• Internet firms such as Google and Yahoo! were among the first to leverage Web log data to personalize search, ad and product recommendations to enhance the customer experience. They, along with companies such as Capital One, use big data to perform thousands of rigorous tests each year, experimenting to improve products and create new business models and revenue streams.
• Enterprises are analyzing log files and sensor data to optimize the performance of data centers, drilling operations, trucking routes, manufacturing lines and other processes.
• Consumer-products giants analyze point-of-sale and other data in real time to forecast demand and fine-tune their promotional strategies.
• Large financial firms analyze historical data to identify patterns that indicate fraud, then make the results operational in their transaction systems.
• Retailers are performing social-graph analyses to create maps of customers' social circles, changing the idea of "most-valuable customer" from who buys the most to who is most influential.

In commercial markets, big data is enabling CIOs and other IT executives to shift their focus from business-process automation to business optimization. "That is, from doing things right to doing the right things," says Donald Ferguson, Executive VP and Chief Technology Officer at CA Technologies. "Automation is about doing a predefined thing, such as processing a check, repeatedly and efficiently," Ferguson explains. "It assumes you're doing the right thing. Big data analytics is less about processing the check and more about adding context to transactions, to decide what business to be in, how to treat each customer, and what products should be offered." Big data will transform IT into an engine of business innovation and optimization.

Yet despite the enormous potential, many enterprises are still not getting big data right. Market watcher Gartner predicts that through 2015, fully 85 percent of all Fortune 500 companies will fail to effectively exploit big data for competitive advantage. "Collecting and analyzing the data is not enough," a recent Gartner report states. "Most organizations are ill prepared to address both the technical and management challenges."

BIG DATA By the Numbers

1.8 ZETTABYTES: Amount of information created and replicated as of last year. One zettabyte = 1 trillion gigabytes. To store all this data requires 500 quadrillion files.
2.7 BILLION: Number of daily likes/comments posted on Facebook
$16.9 BILLION: Revenue forecast from big data technology and services by 2015

Big Data Tackles Critical Business Issues

FINANCIAL SERVICES: Better and deeper understanding of risk to avoid credit crisis
TELECOMMUNICATIONS: More reliable networks where we can predict and prevent failures
MEDIA: More content that is aligned with users' preferences
LIFE SCIENCES: Better targeted medicines with fewer complications and side effects
RETAIL: A personal experience with products and offers that are just what consumers need
GOVERNMENT: Citizen services that are based on hard data, not just intuition
SOURCE: Cloudera, 2012

A look at big data itself helps explain why. Big data today can be defined by what some call the Three V's: volume, variety and velocity.

Today, most of the emphasis is on volume, and for good reason. Until quite recently, reaching terabyte-class data marked a major milestone. But today it's difficult not to collect terabytes, petabytes and even more. Consumer-products maker Procter & Gamble, for example, has developed a big data analytic environment that answers questions by analyzing and connecting as much as 200 terabytes of data. Similarly, Inflection LLC, a Web provider of information about people, not only has petabytes of data on disk, but also generates hundreds of gigabytes of operational data every day. Trying to analyze all that data with traditional storage and processing infrastructure would be far too slow and would also raise costs prohibitively.

Big data's second V is variety, and it challenges conventional business intelligence (BI) approaches based on structured data in SQL-based relational databases. In fact, some of the most interesting big data today is semistructured or even unstructured. This data spans a wide range of classes and types, including clickstreams, blogs, text documents, SMS messages, social information, knowledge bases, census data, call logs, weather maps, GPS readings, machine data, satellite images, even audio and video files. Where traditional relational databases require schemas to be created in advance, big data storage must accept raw data as it arrives, without knowing the format or what gems might be contained.

Velocity, the third V, poses challenges too. Ideally, big data is collected and analyzed in real time. As Hinssen says, "There's no sense in knowing about an online prospect four milliseconds after he leaves your website, or about a retail customer four minutes after she leaves your store. It has to happen in real time, and that's a tremendous technical challenge." Even analytics that aren't needed instantly must be done faster than in traditional BI time frames. That's especially true with event-driven data, such as status updates and Twitter feeds. "We've got an entire engineering team dedicated to ensuring that our business analytics run in under 30 minutes," says Matthew Baird, Chief Technology Officer of Inflection. "With big data, you just can't be fast enough."

But CIOs who combine the three V's also gain complexity. Typically, that complexity is far greater than anything IT has had to reckon with in the past.

Fortunately, a technical solution known as Apache Hadoop is gaining critical mass. Emerging from work done by major Internet firms, Hadoop is an open source, Java-based project. It provides a platform for large-scale, distributed processing for big data capture and analysis. "The Apache Hadoop platform enables enterprises to store and process 10 times the data at 10 times the rate at the same level of investment," says Doug Cutting, creator of Hadoop and Chief Architect at Cloudera Inc., a Hadoop distribution provider. The platform accomplishes this through a complex mix: an infrastructure of distributed commodity servers, or nodes, using local disks for storage; the Hadoop Distributed File System (HDFS) for storing and retrieving structured and unstructured data; MapReduce, a compute layer for parallel processing of data on the servers; and other tools for programming, data organization and analytics.

Hadoop is important to big data. Every page view at Yahoo!, for instance, is connected to several Hadoop applications. Yet it's still a version 1.0 technology.
Because few applications and high-level languages sit atop the Hadoop stack, running jobs often requires Java programming and specialized expertise. "The [shortage] of data scientists who understand this environment is one of the biggest complaints we hear," Eckerson of TechTarget notes. In addition, Hadoop management and monitoring tools still need improvement, including tools for cluster administration and internal processing.

Big data also raises concerns about security and privacy. Issues of data loss, secure access and privacy can become muddled when data is combined, sifted, sorted and repurposed through big data analytics. "I believe that over the next year or two, we're going to see one or more significant scandals from security breaches around big data," predicts Debra Danielson, Senior VP, Mergers & Acquisitions Strategy, and a Distinguished Engineer at CA Technologies. "I expect that will lead to a lot of expense and pain."

Business-management issues can also limit success. Big data projects will be judged by their business results, experts say. "The challenge here is not an IT challenge; it's about the effective use of information to drive business results," adds Greg Valdez, CIO at CA Technologies. "When IT and the business are integrated in their thinking on how to achieve this, then creating value from big data is possible."

These four steps can help launch business innovation with big data:

Explore innovation: Because big data's potential benefits are so open-ended, consider using a research-based, experimental method. "It's best to look at big data as a business project, an innovation project," advises Hinssen. "You shouldn't put just technologists on it, but a multidisciplinary team of creative people with knowledge of business, innovation and the customer experience."

Increase your technical depth: Data scientists and Hadoop experts are in high demand, and that situation is not expected to change anytime soon. So consider starting a training program for those who show potential. Also, attend industry conferences and learn from your peers whenever possible.

Start a pilot project: With something as new as Hadoop, it's best to start small. Even Wal-Mart Stores started with a 10-node Hadoop cluster as a proof of concept. Later, after the test produced positive results, the retailer expanded the project to 250 nodes. Similarly, many projects start by combining previously siloed data sets.

Leverage the cloud: The elasticity and utility pricing of cloud computing are well suited to big data proofs of concept and production implementations. "If you asked me to pick the one application that is ideal for cloud computing, I'd say big data analytics," says Ferguson of CA Technologies. "I'd first figure out how to do some basic computations and analysis using a mix of public and private clouds, and then go from there."

Now is the time to get started with big data. As the technology matures and applications develop, the complexity and challenges will lessen, but so will the opportunities. Early adopters of big data analytics will gain the experience and expertise that should lead to business innovation and a sustainable competitive edge.

TOP 10 USES FOR HADOOP

1. RISK MODELING: How banks can better understand customers and markets
2. CUSTOMER-CHURN ANALYSIS: Why companies really lose customers
3. RECOMMENDATION ENGINE: How to predict customer preferences
4. AD TARGETING: How to increase campaign efficiency
5. POINT-OF-SALE TRANSACTION ANALYSIS: Targeting promotions to make customers buy
6. PREDICTING NETWORK FAILURE: Using machine-generated data to identify trouble spots
7. THREAT ANALYSIS: Detecting threats and fraudulent activity
8. TRADE SURVEILLANCE: Helping banks spot the rogue trader
9. SEARCH QUALITY: Delivering more relevant search results to customers
10. DATA SANDBOX: Exploring new ways to leverage data
SOURCE: Cloudera, 2012

200 TERABYTES: Amount of data required to answer a question by Procter & Gamble's Business Sphere analytic technology
235 TERABYTES: Amount of data collected by the U.S. Library of Congress as of mid-2011
SOURCE: Various

A Glossary of Big Data Terms

Apache Hadoop: Open source software framework for distributed processing of large data sets across clusters of computers using a simple programming model.
HDFS: Short for Hadoop Distributed File System, it manages the retrieval and storage of data and metadata for computation.
NoSQL: Nonrelational databases, such as HBase and Apache Cassandra, used for data storage and retrieval.
MapReduce: The compute layer of big data for parallel processing on distributed server and storage nodes.
Pig: Higher-level programming language in Hadoop, and an alternative to Java.
Hive: Data warehouse layer built on top of Hadoop.
Cascading: Thin Java library that sits on top of Hadoop. It allows suites of MapReduce jobs to be run and managed.
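For readers who want a feel for the MapReduce entry in the glossary, here is a minimal sketch of the programming model in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (map_words, shuffle, reduce_counts, run_word_count) and the sample input are hypothetical, chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel across many nodes, with the framework handling the grouping step in between.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two short text records.
    sample = ["big data big innovation", "data is the new oil"]
    print(run_word_count(sample))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across machines without the steps needing to coordinate, which is the property that lets the model scale out on commodity servers.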
TOM FARRE is former Editor of VARBusiness and a freelance journalist.

Big Data Marketplace, 2012-2017 (sales in $ billions)

Year    Sales
2012    $5.1
2013    $10.2
2014    $16.8
2015    $32.1
2016    $48.0
2017    $53.4

SOURCE: Wikibon.com, 2012
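As a rough illustration of what the Wikibon chart implies, the sketch below computes year-over-year growth and the compound annual growth rate from the six figures shown. It assumes the values map to 2012 through 2017 in the order printed; the function and variable names are hypothetical. Under that assumption, the forecast works out to roughly 60 percent compound growth a year.

```python
from typing import Dict

# Wikibon forecast as printed in the chart (sales in $ billions).
# Assumption: the six values correspond to 2012..2017 in order.
MARKET_BY_YEAR: Dict[int, float] = {
    2012: 5.1,
    2013: 10.2,
    2014: 16.8,
    2015: 32.1,
    2016: 48.0,
    2017: 53.4,
}


def year_over_year_growth(series: Dict[int, float]) -> Dict[int, float]:
    """Return the fractional growth of each year over the previous year."""
    years = sorted(series)
    return {
        year: series[year] / series[prev] - 1.0
        for prev, year in zip(years, years[1:])
    }


def compound_annual_growth_rate(series: Dict[int, float]) -> float:
    """Return the constant yearly rate that links the first and last values."""
    years = sorted(series)
    first, last = years[0], years[-1]
    periods = last - first
    return (series[last] / series[first]) ** (1.0 / periods) - 1.0


if __name__ == "__main__":
    for year, growth in year_over_year_growth(MARKET_BY_YEAR).items():
        print(f"{year}: {growth:.1%}")
    print(f"CAGR 2012-2017: {compound_annual_growth_rate(MARKET_BY_YEAR):.1%}")
```

The year-over-year figures also show the forecast flattening toward the end of the period, with 2017 growing far more slowly than 2013.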