22 SMARTENTERPRISEMAG.COM
Smart Strategies

BIG DATA, Big Innovation

Smart CIOs are mining their organizations' huge data stores for insights that lead to business innovation.

By Tom Farre

ILLUSTRATION: BRAD HAMMAN

"Data is the new oil," goes an analogy making the rounds. Just as oil powered a good portion of the 20th-century economy, data is driving business innovation and efficiency in today's 21st-century enterprise.

This naturally turns the discussion to big data: the terabyte- and now petabyte-class data stores that companies are accumulating from their internal systems, social media and the Web, external sources and a new wave of machine sensors. The more data you have, it would seem, the more potential for innovation.

Yet there's a twist to the "data is oil" analogy, according to Peter Hinssen, Chairman of consulting firm Across Technology and the author of two books on business and technology. Hinssen notes that in the early 20th century, John D. Rockefeller became the world's richest man not by amassing the most oil from the Earth's wells, but by controlling the oil-refinery process through his company, Standard Oil.

"Leveraging big data is not about having the most data or the most capacity," Hinssen says, "but about refining that data, turning it into insight at the speed dictated by the market."

Indeed, big data is poised to usher in a transformative era in which business decisions are informed more by data, analysis and scientific testing than by opinion and intuition. Enterprises can harness data analytics for sustainable competitive advantage. Done correctly, data analytics empowers employees at all levels with information that helps them make smarter decisions.

"Analytics increases corporate intelligence," says Wayne Eckerson, Director of Research at TechTarget, an IT content provider. "That's something you can never package or systematize and that competitors can't duplicate."

Big data analytics is still in its infancy.
For competitive reasons, successful use cases, let alone concrete ROI examples, are still hard to come by; companies are keeping their successes close to the vest.

2012 SMART ENTERPRISE 23
Yet some successes are beginning to be reported:

• Internet firms such as Google and Yahoo! were among the first to leverage Web log data to personalize search, ad and product recommendations to enhance the customer experience. They, along with companies such as Capital One, use big data to perform thousands of rigorous tests each year, experimenting to improve products and create new business models and revenue streams.
• Enterprises are analyzing log files and sensor data to optimize the performance of data centers, drilling operations, trucking routes, manufacturing lines and other processes.
• Consumer-products giants analyze point-of-sale and other data in real time to forecast demand and fine-tune their promotional strategies.
• Large financial firms analyze historical data to identify patterns that indicate fraud, then make the results operational in their transaction systems.
• Retailers are performing social-graph analyses to create maps of customers' social circles, changing the idea of "most valuable customer" from who buys the most to who is most influential.

In commercial markets, big data is enabling CIOs and other IT executives to shift their focus from business-process automation to business optimization. "That is, from doing things right to doing the right things," says Donald Ferguson, Executive VP and Chief Technology Officer at CA Technologies.

"Automation is about doing a predefined thing, such as processing a check, repeatedly and efficiently," Ferguson explains. "It assumes you're doing the right thing. Big data analytics is less about processing the check and more about adding context to transactions, to decide what business to be in, how to treat each customer, and what products should be offered."

Big data will transform IT into an engine of business innovation and optimization. Yet despite the enormous potential, many enterprises are still not getting big data right.

Market watcher Gartner predicts that through 2015, fully 85 percent of all Fortune 500 companies will fail to effectively exploit big data for competitive advantage. "Collecting and analyzing the data is not enough," a recent Gartner report states. "Most organizations are ill prepared to address both the technical and management challenges."

BIG DATA By the Numbers
1.8 ZETTABYTES: Amount of information created and replicated as of last year. One zettabyte = 1 trillion gigabytes. To store all this data requires 500 quadrillion files.
2.7 BILLION: Number of daily likes/comments posted on Facebook
$16.9 BILLION: Revenue forecast from big data technology and services by 2015

Big Data Tackles Critical Business Issues
FINANCIAL SERVICES: Better and deeper understanding of risk to avoid credit crisis
TELECOMMUNICATIONS: More reliable networks where we can predict and prevent failures
MEDIA: More content that is aligned with users' preferences
LIFE SCIENCES: Better targeted medicines with fewer complications and side effects
RETAIL: A personal experience with products and offers that are just what consumers need
GOVERNMENT: Citizen services that are based on hard data, not just intuition
SOURCE: Cloudera, 2012
A look at big data itself helps explain why. Big data today can be defined by what some call the Three V's: volume, variety and velocity.

Most of the emphasis so far is on volume, and for good reason. Until quite recently, reaching terabyte-class data marked a major milestone. But today it's difficult not to collect terabytes, petabytes and even more. Consumer-products maker Procter & Gamble, for example, has developed a big data analytic environment that answers questions by analyzing and connecting as much as 200 terabytes of data. Similarly, Inflection LLC, a Web provider of information about people, not only has petabytes of data on disk, but also generates hundreds of gigabytes of operational data every day. Trying to analyze all that data with traditional storage and processing infrastructure would be far too slow and would also raise costs prohibitively.

Big data's second V is variety, and it challenges conventional business intelligence (BI) approaches based on structured data in SQL-based relational databases. In fact, some of the most interesting big data today is semistructured or even unstructured. This data spans a wide range of classes and types, including clickstreams, blogs, text documents, SMS messages, social information, knowledge bases, census data, call logs, weather maps, GPS readings, machine data, satellite images, even audio and video files. Where traditional relational databases require schemas to be created in advance, big data storage must accept raw data as it arrives, without knowing its format or what gems it might contain.

Velocity, the third V, poses challenges too. Ideally, big data is collected and analyzed in real time. As Hinssen says, "There's no sense in knowing about an online prospect four milliseconds after he leaves your website, or about a retail customer four minutes after she leaves your store. It has to happen in real time, and that's a tremendous technical challenge."
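To make the velocity point concrete, here is a minimal sketch of one common real-time technique, a sliding-window count: the counter keeps only the events from the last N seconds and updates its answer as each event arrives, instead of waiting for a batch job. The class name, the 60-second window and the page-view timestamps are illustrative assumptions, not a description of any system used by the companies quoted here.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: counts events seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event timestamps, oldest first

    def add(self, timestamp: float) -> int:
        """Record one event and return the count inside the current window.

        Assumes events arrive in non-decreasing timestamp order.
        """
        self._timestamps.append(timestamp)
        self._evict(timestamp)
        return len(self._timestamps)

    def count(self, now: float) -> int:
        """Return the number of events in the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # The window is (now - window_seconds, now]; drop anything older.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


if __name__ == "__main__":
    # Hypothetical page-view timestamps, in seconds, for one visitor.
    views = SlidingWindowCounter(window_seconds=60)
    for t in [0, 10, 25, 50, 65, 130]:
        print(t, views.add(t))
```

Each event is appended once and evicted once, so the work per event stays small no matter how long the stream runs. Production stream processors also handle out-of-order events and spread the work across many machines; this single-process version only illustrates the windowing idea.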
Even analytics that aren't needed instantly must be done faster than in traditional BI time frames. That's especially true with event-driven data, such as status updates and Twitter feeds. "We've got an entire engineering team dedicated to ensuring that our business analytics run in under 30 minutes," says Matthew Baird, Chief Technology Officer of Inflection. "With big data, you just can't be fast enough."

But CIOs who must handle all three V's at once also take on added complexity, typically far greater than anything IT has had to reckon with in the past.

Fortunately, a technical solution known as Apache Hadoop is gaining critical mass. Emerging from work done by major Internet firms, Hadoop is an open source, Java-based project. It provides a platform for large-scale, distributed processing for big data capture and analysis.

"The Apache Hadoop platform enables enterprises to store and process 10 times the data at 10 times the rate at the same level of investment," says Doug Cutting, creator of Hadoop and Chief Architect at Cloudera Inc., a Hadoop distribution provider.

The platform accomplishes this through a complex mix: an infrastructure of distributed commodity servers, or nodes, using local disks for storage; the Hadoop Distributed File System (HDFS) for storing and retrieving structured and unstructured data; MapReduce, a compute layer for parallel processing of data on the servers; and other tools for programming, data organization and analytics.

Hadoop already plays a key role in big data. Every page view at Yahoo!, for instance, is connected to several Hadoop applications. Yet it's still a version 1.0 technology.
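The MapReduce model mentioned above is usually explained with the classic word-count example. The sketch below runs in a single Python process and only mimics the three phases that Hadoop distributes across nodes: map (emit key/value pairs), shuffle (group values by key) and reduce (combine each group). The function names and sample log lines are hypothetical; this is not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each input split would be mapped on a different node;
    # here the mapper simply runs over every line in turn.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["data is the new oil", "refine the data", "data drives innovation"]
    print(word_count(logs))
```

Because the mapper looks at one line at a time and the reducer at one key at a time, neither needs the whole data set in memory, which is what lets a framework like Hadoop run many copies of each in parallel on commodity servers.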
Because few applications and high-level languages sit atop the Hadoop stack, running jobs often requires Java programming and specialized expertise. "The [shortage] of data scientists who understand this environment is one of the biggest complaints we hear," Eckerson of TechTarget notes. In addition, Hadoop management and monitoring tools still need improvement, including tools for cluster administration and internal processing.

TOP 10 USES FOR HADOOP
1. RISK MODELING: How banks can better understand customers and markets
2. CUSTOMER-CHURN ANALYSIS: Why companies really lose customers
3. RECOMMENDATION ENGINE: How to predict customer preferences
4. AD TARGETING: How to increase campaign efficiency
5. POINT-OF-SALE TRANSACTION ANALYSIS: Targeting promotions to make customers buy
6. PREDICTING NETWORK FAILURE: Using machine-generated data to identify trouble spots
7. THREAT ANALYSIS: Detecting threats and fraudulent activity
8. TRADE SURVEILLANCE: Helping banks spot the rogue trader
9. SEARCH QUALITY: Delivering more relevant search results to customers
10. DATA SANDBOX: Exploring new ways to leverage data
SOURCE: Cloudera, 2012

200 TERABYTES: Amount of data required to answer a question by Procter & Gamble's Business Sphere analytic technology
235 TERABYTES: Amount of data collected by the U.S. Library of Congress as of mid-2011
SOURCE: Various

Big data also raises concerns about security and privacy. Issues of data loss, secure access and privacy can become muddled when data is combined, sifted, sorted and repurposed through big data analytics. "I believe that over the next year or two, we're going to see one or more significant scandals from security breaches around big data," predicts Debra Danielson, Senior VP, Mergers & Acquisitions Strategy, and a Distinguished Engineer at CA Technologies. "I expect that will lead to a lot of expense and pain."

Business-management issues can also limit success. Big data projects will be judged by their business results, experts say. "The challenge here is not an IT challenge; it's about the effective use of information to drive business results," adds Greg Valdez, CIO at CA Technologies. "When IT and the business are integrated in their thinking on how to achieve this, then creating value from big data is possible."

These four steps can help launch business innovation with big data:

Explore innovation: Because big data's potential benefits are so open-ended, consider using a research-based, experimental method. "It's best to look at big data as a business project, an innovation project," advises Hinssen. "You shouldn't put just technologists on it, but a multidisciplinary team of creative people with knowledge of business, innovation and the customer experience."

Increase your technical depth: Data scientists and Hadoop experts are in high demand, and that situation is not expected to change anytime soon. So consider starting a training program for those who show potential. Also, attend industry conferences and learn from your peers whenever possible.

Start a pilot project: With something as new as Hadoop, it's best to start small. Even Wal-Mart Stores started with a 10-node Hadoop cluster as a proof of concept. Later, after the test produced positive results, the retailer expanded the project to 250 nodes. Many projects likewise start small, for example by combining previously siloed data sets.

Leverage the cloud: The elasticity and utility pricing of cloud computing make it ideal for big data proofs of concept as well as production implementations. "If you asked me to pick the one application that is ideal for cloud computing, I'd say big data analytics," says Ferguson of CA Technologies. "I'd first figure out how to do some basic computations and analysis using a mix of public and private clouds, and then go from there."

Now is the time to get started with big data. As the technology matures and applications develop, the complexity and challenges will lessen, but so will the opportunities. Early adopters of big data analytics will gain the experience and expertise that should lead to business innovation and a sustainable competitive edge.

A Glossary of Big Data Terms
Apache Hadoop: Open source software framework for distributed processing of large data sets across clusters of computers using a simple programming model.
HDFS: Short for Hadoop Distributed File System, it manages the retrieval and storage of data and metadata for computation.
NoSQL: Nonrelational databases, such as HBase and Apache Cassandra, used for data storage and retrieval.
MapReduce: The compute layer of big data for parallel processing on distributed server and storage nodes.
Pig: Higher-level programming language in Hadoop, and an alternative to Java.
Hive: Data warehouse layer built on top of Hadoop.
Cascading: Thin Java library that sits on top of Hadoop. It allows suites of MapReduce jobs to be run and managed.
TOM FARRE is a former Editor of VARBusiness and a freelance journalist.

Big Data Marketplace, 2012-2017 (sales in $ billions)
2012: $5.1
2013: $10.2
2014: $16.8
2015: $32.1
2016: $48.0
2017: $53.4
SOURCE: Wikibon.com, 2012
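The Wikibon forecast implies very rapid growth. As a rough check, the sketch below computes the year-over-year growth and the compound annual growth rate (CAGR) implied by the chart's figures, using CAGR = (end / start)^(1 / years) - 1. The dollar figures come from the Wikibon chart; the calculation itself is illustrative arithmetic, not part of Wikibon's forecast, and the function names are hypothetical.

```python
# Wikibon's 2012 forecast of big data sales, in $ billions (from the Wikibon chart).
FORECAST = {2012: 5.1, 2013: 10.2, 2014: 16.8, 2015: 32.1, 2016: 48.0, 2017: 53.4}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def year_over_year(series: dict) -> dict:
    """Fractional growth of each year relative to the previous year."""
    years = sorted(series)
    return {y: series[y] / series[prev] - 1 for prev, y in zip(years, years[1:])}


if __name__ == "__main__":
    first, last = min(FORECAST), max(FORECAST)
    rate = cagr(FORECAST[first], FORECAST[last], last - first)
    print(f"CAGR {first}-{last}: {rate:.1%}")
    for year, growth in year_over_year(FORECAST).items():
        print(f"{year}: {growth:+.1%}")
```

The endpoints work out to roughly 60 percent a year, although the year-over-year figures show growth slowing sharply toward 2017. Either way, the result is only as reliable as the forecast it is computed from.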