TEN MISTAKES TO AVOID




EXCLUSIVELY FOR TDWI PREMIUM MEMBERS
SECOND QUARTER 2012

TEN MISTAKES TO AVOID In Big Data
By Marc Demarest

tdwi.org

FOREWORD

It's early days in the big data revolution. So early, in fact, that not everyone agrees there is a big data revolution, or anything more than a big data detour from which we'll recover any day now.

We're a notoriously short-sighted group. In the early 1990s, everyone knew there would never be a data warehouse larger than a few dozen gigabytes. Thus, it isn't too early, from my vantage point, to begin cataloguing the missteps and fatal actions that accompany the early stages of any significant technological change and that are part and parcel of the big data conversations with which I'm involved.

Here are the top 10 big data mistakes I'm running into right now.

ABOUT THE AUTHOR

Marc Demarest is CEO and a principal of Noumenal, Inc., an international management consulting firm headquartered in the U.S. and UK. Noumenal principals regularly consult to suppliers and consumers of big data technologies, and Demarest speaks and writes frequently on topics related to data warehousing and business intelligence. He can be reached at marc@noumenal.com or @marcdemarest on Twitter.

© 2012 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

MISTAKE ONE: IGNORING THE INFRASTRUCTURE

The big data revolution is brought to us not by Twitter but by Intel, Seagate, and Qualcomm. Big data is the effect of a revolution we're well into: the big compute/memory/storage revolution. One of the reasons the supply side of the IT market is so ebullient about big data is that big data projects (along with the big data analytics that accompany them) will soak up the excess compute/memory and network capacity we've (collectively) purchased and installed.

I say "collectively" because most individual firms are underprovisioned in one or more of these three critical infrastructure dimensions. They're especially deficient in storage, where traditional storage area networks (SANs) are a huge bottleneck, infuriating database management system (DBMS) vendors and application developers and posing a significant (and perhaps insurmountable) barrier to effective use of big data.

Many business intelligence (BI) professionals I work with are oblivious to their infrastructure. Someone else worries about bandwidth, computing power, memory, and storage. Poor infrastructure can, has, and will kill big data projects, and big data projects can, have, and will break IT infrastructure. Big data doesn't just soak up bandwidth, CPU cycles, and storage; frequently, it overruns them.

Smart IT leadership teams, I find, assume that big data will overtax one or more parts of their existing infrastructure. They are using big data projects as a pretext for silo-busting: for bringing storage teams, server teams, networking teams, and BI teams back together to work out blueprints that deliver business value without overtaxing underlying infrastructure.

MISTAKE TWO: ASSUMING THE OUTCOME

More than one social media hotshot has said it: capture all the data, hand good analytical tools to smart data scientists, and you'll find something valuable. More often than not, the big data vendor's snazzy presentation begins with a YouTube video inviting us to consider, for the umpteenth time, just how many tweets and Facebook posts there are, every single day. Surely there's value in all that data, right?

Commercially, we make bets and take bets on a regular basis. At some level, it's our stock-in-trade. However, we don't bet blind, and we know we tend to get better outcomes (and get rewarded by our organizations) when we judge, structure, and meter our bets before we make them.

Venture-backed technology companies and wildly profitable commercial firms may be able to spend millions of dollars on the assumption that in big data lies a big advantage (exact location TBD), but the rest of us need to have a clear hypothesis about where, specifically, that advantage is, and we need to have clear models for how we'll measure the commercial value of our big data projects. Taking advantage of a decision maker's enthusiasm for big data to skip over the "Where's the payoff on this bet?" question won't, alas, save us from being penalized when the bet has no payoff, as will inevitably be the case for many "assumed outcome" big data projects.

Even in high-enthusiasm environments, smart IT teams are spending every bit as much time proofing big data projects, using tried-and-true governance methods, and embracing the same project disciplines they employ on conventional IT projects. Big data is big risk, and big risk has to be well characterized.

MISTAKE THREE: MISTAKING THE POSTER CHILD FOR THE REVOLUTION

There's no doubt that Hadoop is the poster child of the big data revolution. A recent Google search on "Hadoop" yielded 11.9 million distinct hits; a search on "big data" yielded 11.6 million; "NoSQL" produced 7.5 million. For contrast, "analytical application" yielded 1.9 million distinct hits and "in-database analytics" (a far more important topic) yielded a mere 626,000.

Discussions of big data are, in my opinion, dominated by Hadoop to the detriment of everyone except suppliers of Hadoop distributions and services. As we evaluate big data technologies, whether we consider Hadoop as a distributed file system, an analytics development environment, or both, we should remember that there are (1) many big data project success stories that don't involve Hadoop, and (2) alternatives to Hadoop, some of which have more attractive risk profiles.

We need to keep Gates' Third Law in mind: technology choices don't make or break technical revolutions. It seems probable that commercially hardened successors to Hadoop will have a long and happy trajectory as part of our big data toolkit. However, the center of the big data revolution is elsewhere: in analytics, as well as in who or what makes and consumes the output of commercial decisions.

As in many a Hollywood movie, the poster child (Hadoop) figures in the film, but isn't necessarily the center of the plot.

MISTAKE FOUR: OPERATING WITHOUT A PLAN

One of the themes in the marketing material emanating from big data proponents seems to be that we don't need to plan or design things anymore. Usually these claims are made at fairly low levels in the value stack: we don't have to design schemas anymore; we don't have to plan storage tiering or allocation; we don't have to implement complex a priori indexing strategies. The larger message seems to be "just do it."

In my experience, this you-can-just-leap-to-implementation message is having a profound impact on IT professionals, who have felt for some time that planning was ultimately about going through the motions for very little value, and that planning often just got in the way of getting the job done. Planning also stretched project cycle times and angered business constituencies.

As I suggested above, the broad-based notion that big data somehow relieves us of the responsibility of planning and design is a serious misconception. I don't see anything in the big data phenomenon that requires significant changes to our current planning and governance models; if they work well now, they'll work well for big data projects. If they're organizational theater now, they'll be organizational theater for big data projects as well.

Planning is a response to risk and complexity, and, as I suggested in Mistake Two, big data equals big risk. Uncharacterized and unplanned for, that risk will certainly produce projects that consume cash and deliver no appreciable business value.

MISTAKE FIVE: THINKING HORIZONTALLY

We've grown accustomed to thinking about the business intelligence infrastructure as a common horizontal pattern that crosses vertical market boundaries. Although the source system portfolio may change from industry to industry, the patterns we use to integrate source systems into data warehouses and data marts, and the patterns we use to deliver information from those warehouses and marts to desktops, are common currency for the BI industry. Periodically, we innovate (data vaults spring to mind here), and the innovation is absorbed relatively quickly and without much contention.

With big data, these patterns won't initially be horizontal. If big data becomes, as McKinsey and others predict, a "factor of production" rather than an asset or a convenience, that process is likely to happen very differently in different industries. For example, although it's clear to me that consumer packaged goods companies can potentially gain a great deal in their brand reputation management initiatives by tackling the persistence and analysis of social media data, I don't believe that durable goods manufacturers have much to gain from similar projects, brand-wise or otherwise.

As the big data market matures, the integration patterns we use will become, once again, more horizontal than vertical. That's an effect of market maturation. In the short term, smart IT strategists will be watching big data leverage plays primarily in their own industries and looking for emerging horizontal patterns to come from the industries that are most effectively leveraging large-scale data today: the military and public safety markets; financial services, transportation, and logistics; and process manufacturing.

MISTAKE SIX: SWAPPING FOCUS

I've had more than one eager big data proponent talk to me as though big data initiatives and technologies are effective replacements for the BI environments we've spent the last decade building and enhancing. Nothing could be further from the truth.

Today's BI environments are typically concerned with information distribution, not analytics. We use the term "analytics" to describe today's BI environments, but it's not clear that much analysis actually takes place in those environments. Even the most sophisticated KPI models embedded in high-performance, mobile-ready dashboards don't constitute analytics, any more than a car's speedometer or an airplane's altimeter is an analytical device. Today's BI assumes smart people; big data does not. Big data assumes smart algorithms operating on massive amounts of data, and those smart algorithms may never interface in any way with a human decision maker.

In the best cases, big data technologies augment our conventional data warehousing and data mart environments, and support classes of decision making that are underserved in BI environments today, whether data scientists or algorithms are the decision makers. It's certainly true that many organizations' BI environments benefit by having distributed file systems upstream from their enterprise data warehouses, serving as data pools. Some organizations will also benefit from being able to recast some of their data hygiene and ETL/ELT processes as, for example, MapReduce algorithms. A few organizations may find that certain data mart users benefit, in cost-effective ways, from the radical speed-up or scale-up offered by in-memory and NoSQL DBMSs.

Thoughtful BI teams have figured out that big data expands and deepens the BI environment they operate today, and makes it more complex. They understand that there's relatively little that big data technologies and practices can do, at this stage of their maturation, to replace conventional BI for conventional user constituencies.
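
To make the remark about recasting data hygiene steps as MapReduce algorithms more concrete, here is a minimal sketch, not a production job. It runs the three MapReduce phases (map, shuffle, reduce) in a single Python process to deduplicate customer records. The record layout, the sample data, and every function name are hypothetical and chosen only for illustration; a real deployment would hand the map and reduce functions to a framework such as Hadoop rather than call them directly.

```python
from collections import defaultdict

# Hypothetical raw records: (customer email, last-updated date, city).
RAW_RECORDS = [
    ("Ann@Example.com ", "2012-01-05", "Portland"),
    ("ann@example.com", "2012-03-17", "Seattle"),
    ("bob@example.com", "2012-02-01", "Renton"),
]


def map_record(record):
    """Map step: emit (normalized key, value) for one input record."""
    email, updated, city = record
    key = email.strip().lower()  # the hygiene rule: normalize the key
    return key, (updated, city)


def shuffle(pairs):
    """Shuffle step: group mapped values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: keep only the most recently updated record per key."""
    updated, city = max(values)  # ISO dates sort correctly as strings
    return key, updated, city


def deduplicate(records):
    """Run map, shuffle, and reduce in-process over a small record list."""
    grouped = shuffle(map_record(r) for r in records)
    return sorted(reduce_group(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    for row in deduplicate(RAW_RECORDS):
        print(row)
```

The point of the shape is that the hygiene rule lives entirely in the map and reduce functions, each of which sees one record or one key group at a time. That independence is what would let a framework spread the same logic across many nodes.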

MISTAKE SEVEN: ASSUMING TECHNICAL MATURITY

Much of the technology we're asked to contemplate when building big data environments is staggeringly immature. It's so immature that we cannot assume the features we expect in any competent technology are contained in some of the most popular big data components. Single points of failure abound, configuration of a complex environment is often performed by copying XML files around a large grid of service nodes, and things we might consider price-to-play items are not possible or are difficult at best. Functionality we assume to be present is not, is rudimentary, or requires coding.

Our dependency on traditional, reliable suppliers (see Mistake Nine) has raised the level of service we expect from the data management infrastructure, and we don't ask questions we should because, after all, what self-respecting DBMS vendor would sell a product with no bulk-export feature?

I have seen more than a few prototype big data environments where the team's time is spent hacking and whacking at purchased or open source technologies to ensure they operate in ways that are consistent with organizational practices, expectations, and requirements. There's no commercial value there; that's unplanned integration cost, pure and simple. And that unplanned integration cost is, in turn, the result of incomplete planning (see Mistake Four).

Smart big data strategists aren't making large-scale commitments to big data technologies that haven't been sandboxed and tested, end-to-end, for the long list of don't-haves and gotchas that (for the next few years at least) will characterize many big data technologies and will have to be costed out, scheduled, and implemented as part of a well-planned and well-executed project.

MISTAKE EIGHT: FACTORING PEOPLE OUT OF THE EQUATION

Many column inches have been spent in the past 18 months discussing the new roles required in the big data world: data scientists, data engineers, and the like. We may need these new roles; it depends on how quickly analytical tool makers (and in particular the providers of statistical analysis engines and packaged analytical applications) step up to the task of making big analytics something a competent subject matter expert can build without technical (statistical or programming) virtuosity.

Within our current BI ecosystems are large numbers of people with expectations about how decision support happens in our organizations. In BI teams, IT operations, and functional organizations, we have created a complex set of processes and practices that depend on the BI environment remaining roughly the same: delivering the same sort of information in the same ways to the same people.

Most organizations don't have a map of these process and practice dependencies. We find out about them the old-fashioned way: we change a small part of one schema, drop a few data elements, and then field a call from the European sales office, informing us that we've just broken the revenue roll-up report for all of Central European sales.

I've reviewed many big data strategy documents in the last year, and in only one of those documents did I find a thoughtful discussion of how a planned big data project was likely to affect established processes and practices in IT and end-user organizations, what old expectations needed to be reset, and what new expectations needed to be installed and reinforced. This "environmental impact assessment" is something I expect we'll learn, painfully and over time, to do as a matter of course.

MISTAKE NINE: ASSUMING A RELIABLE SUPPLY

I'm surprised that many people don't recognize the big data revolution is, in part, a supply-side revolution: a transformation being driven by the suppliers of transformation technology. I'm also surprised that BI professionals have often forgotten that in early-stage markets, supply is unreliable and must be managed and used on that basis.

In mature IT markets, such as the merchant relational database management system (RDBMS) market, we can, and do, assume reliable supply. As the market shakes out, the weaker suppliers disappear or are absorbed. The suppliers that remain (the franchise suppliers) can be expected to remain in place, get larger, move slower, and exact higher fees, thanks to their dominant market position. We may not like the relatively large amount of power those franchise suppliers can exert over our behavior, but we do like the reliable supply they represent.

We often depend on those reliable suppliers for expertise as well as technology: for the architectural and integration assistance we require to make our enterprise IT environments function properly as a whole. The maturation of the BI marketplace led (perhaps inevitably) to a loss of risk management and architecture/design competencies inside most IT organizations. Most IT organizations are less competent to design and build their BI infrastructure using a best-of-breed model than they were just a decade ago. We must recover these capabilities, or recruit staff with the ability to design complex BI architectures from the ground up, and to assess the risk of those designs and the technologies we choose to support those designs.

Our franchise technology suppliers don't typically have the expertise we need to design, architect, and integrate big data. Those skills are often found in small, independent consulting firms whose resources are already overtaxed. New, skilled resources can't be created by formalized educational programs; integration skills are a function of experience, and we don't have a broad or deep big data experience pool, collectively.

Smart IT management teams (1) assume many of the big data technology choices they make will be invalidated by market shifts in the not-too-distant future, and (2) are working on plans to grow their own in-house big data design, architecture, and integration talent to reduce their dependency on unreliable supply.

MISTAKE TEN: CONFUSING THE PREREQUISITE WITH THE PAYOFF

When I talk to my clients and others about big data, it's hard to get past the talk about persisting and organizing large volumes of data so we can discuss analytics: processing those large data volumes to yield actionable value. Partly that's the effect of other behavior patterns I've identified earlier, specifically Mistakes Two, Three, and Four.

This data-first thinking is an old habit. It got us in significant trouble in the early days of data warehousing, and its echoes persist in the high-failure-rate stories some analysts tell to this day. Well-organized data, at whatever volume, is at best latent value and is often just an expensive monument to our own shortsightedness.

Competition in a big data world is based on the quality and precision of your algorithms and your analyses. Big data is really big analytics. By definition, big data analysis is beyond the boundaries of the brain's ability to process and organize, and beyond the boundaries of conventional BI tools' ability to represent visually. The analyses are either facilitated by code or produced by code, and are often consumed by code as well, with no eyeballs required. When eyeballs and wetware are required, they are exceptional eyeballs and exceptional wetware for the most complex, nuanced judgments we can imagine.

I've begun to despair slightly when a conversation with a client or colleague begins "What do you think of NoSQL?" or "We think we'll need thus-and-such bandwidth and storage to persist 3 TB a day by 2015." By contrast, the best conversations I have with people about big data begin with "We have this idea for a great analytical application that consumes large data volumes and lets us change the game in our market," or words to that effect.

ABOUT TDWI

TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its members. TDWI offers a worldwide membership program, five major educational conferences, topical educational seminars, role-based training, onsite courses, certification, solution provider partnerships, an awards program for best practices, live Webinars, resourceful publications, an in-depth research program, and a comprehensive Web site, tdwi.org.

1201 Monster Road SW, Suite 250
Renton, WA 98057-2996
T 425.277.9126
F 425.687.2842
E info@tdwi.org
tdwi.org