Ten Mistakes to Avoid


TDWI RESEARCH, SECOND QUARTER 2014
EXCLUSIVELY FOR TDWI PREMIUM MEMBERS

Ten Mistakes to Avoid in Big Data Analytics Projects
By Fern Halper

tdwi.org

FOREWORD

Big data analytics requires the ability to collect, manage, and analyze potentially huge volumes of disparate data at the right speed, within the right time frame, while providing the right-time analysis and activity to the end consumer. Big data analytics has the potential to provide great value for companies by increasing productivity and performance. It is an exciting and challenging time for organizations as they consider big data opportunities.

Recently, TDWI launched its Big Data Maturity Model Guide and Assessment (see tdwi.org/bdmm), which provides a benchmarking tool for organizations to assess their big data maturity. As part of the research for the model, we spoke to many organizations at various stages of their big data journey to understand best practices for big data and big data analytics. Although many enterprises are still in the early stages of their big data efforts, a number of interrelated themes emerged about what works and doesn't work when it comes to big data analytics projects.

ABOUT THE AUTHOR

Fern Halper, Ph.D., is director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and big data analytics approaches. She has more than 20 years of experience in data and business analysis and has published numerous articles about data mining and information technology. Halper is co-author of Dummies books on cloud computing, hybrid cloud, service-oriented architecture, service management, and big data. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at fhalper@tdwi.org, or follow her on Twitter: @fhalper.

© 2014 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

MISTAKE ONE: FAILURE TO DEFINE THE BUSINESS PROBLEM UP FRONT

Some organizations start a big data effort without first defining the business problem they are trying to solve. To many of these organizations, Hadoop equates to a big data effort, so an IT group might buy a Hadoop cluster and start dumping a lot of data into it. It is not a bad thing to store data in a so-called data lake in Hadoop. The benefit of Hadoop is that it is cheap and can store all kinds of data because it doesn't require a schema on write. Many organizations use it to house unstructured and semi-structured data. However, data by itself has no value if it isn't being analyzed and acted on in some way.

Big data is not an isolated activity. If executed properly, it can have a big impact on a business strategy. Therefore, it makes sense to get the business involved early on. We have observed that when a big data effort is funded solely by IT, it doesn't tend to go far, often not even moving past the proof-of-concept stage. However, when IT and business users collaborate and have business funding, that usually means there is a business problem worth solving, which often involves some sort of analytics to take action. These problems run the gamut from marketing (identifying customers who will churn, commit fraud, or take other actions) to maintenance (predicting part failure). The key is to focus on the opportunity.

MISTAKE TWO: NEGLECTING A PROOF OF CONCEPT WITH DEFINED METRICS

Many organizations agree that a proof of concept (POC) can be a valuable exercise for big data projects. The POC, or what some organizations call a proof of value, can help illustrate the benefit of a project. Some businesses build several POCs simultaneously to see what will stick.

There are a few points to keep in mind when building a POC. It should address a real need, yet be doable. Doable means that the people and technology exist to make it happen and the data is available. You may have a good idea about building out a POC that predicts when a certain machine will require maintenance, but if you can't access the data, the POC won't get far.

Another critical factor that some organizations miss is the importance of identifying and measuring defined metrics. A business metric is a measurement that is quantifiable and relates to a business activity. These metrics will help you better demonstrate the value and potential impact of the big data POC. In fact, it makes sense to build out a POC where you can measure impact. For example, if you're building out a POC that is used to predict churn, it makes sense to create a metric such as percent churn per month, and then use that metric to compare what the churn rates were before and after implementing the POC. This assumes you have also developed a way to target and approach those customers at risk of churning, because a model by itself is of little worth unless you can act on the results. The key is to find a POC where you are fairly certain you can show value.
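As an illustration of the percent-churn-per-month metric described above, here is a minimal Python sketch. It assumes churn is defined as customers lost during a month divided by customers active at the start of that month; the function names and all figures are hypothetical and do not come from the TDWI research.

```python
def monthly_churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Percent of customers active at the start of a month who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


def average_churn(months: list[tuple[int, int]]) -> float:
    """Mean monthly churn rate over (customers_at_start, customers_lost) pairs."""
    rates = [monthly_churn_rate(start, lost) for start, lost in months]
    return sum(rates) / len(rates)


# Hypothetical figures, for illustration only.
before_poc = [(10_000, 300), (9_900, 310), (9_800, 290)]
after_poc = [(9_700, 240), (9_650, 230), (9_600, 235)]

baseline = average_churn(before_poc)
with_poc = average_churn(after_poc)
print(f"Baseline churn: {baseline:.2f}% per month")
print(f"Churn with POC: {with_poc:.2f}% per month")
print(f"Change: {with_poc - baseline:+.2f} percentage points")
```

A before-and-after comparison like this only suggests impact. Seasonality or other changes in the business could explain part of the difference, so a real POC would likely want a comparison group as well.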

MISTAKE THREE: TRYING TO SOLVE EVERYTHING AT ONCE

Big data is not a sprint. Successful companies generally take it one step at a time, making targeted investments tied to a business objective and growing from there. We have seen companies fail when they tried to boil the ocean with architectures that were too heavy to get off the ground. That is not to say you should wait forever or take tiny steps, but it is important to be judicious about how you approach a big data project.

There are numerous entry points to getting started with big data. Some organizations have grown their data warehouses to store huge volumes of structured data, and they run advanced analytics against that data. At some point they might enhance their data warehouse by putting some sort of data analytics platform in a cloud environment or by adopting Hadoop on premises.

Another entry point might be to start utilizing Hadoop for unstructured data, such as internal data that originates from call center notes or e-mail messages. This could be stored in your content management system or in Hadoop. Depending on the data complexity and what you need to do with it, you might start to extract various terms, concepts, and sentiments from the text that can be used for analysis. This data might be merged with your structured data or analyzed separately. The key is that your organization is gaining useful insights that could drive action.

Successful organizations often take an ecosystem approach to big data. This approach includes technologies, data management, analytics, governance, and organizational components. From an architecture perspective, it involves thinking through how technology parts can work together. It also ultimately involves utilizing some sort of unified information architecture.

MISTAKE FOUR: DISREGARDING DATA QUALITY

The importance of data quality doesn't go away with big data. Principles such as integrity and reliability are still significant. Interestingly, when big data was first hyped, some felt that you didn't need to be concerned with data quality and that big data should be explored in its raw form. Any problems would even themselves out, they argued. Others contended that data quality is fundamental if you're using it to make decisions. Obviously, the answer depends on the business problem you're trying to solve. Healthcare and operational maintenance are two examples where high-quality data is essential.

Sometimes in the initial stages of analysis, when you are exploring huge amounts of disparate data, you might not be too concerned about the quality of each data element, and that is fine. Exploration is a good thing and an important part of any analysis. However, if you decide a data source should be part of the analysis, then it must be validated. This includes understanding data provenance to know if the data can be trusted. Social media data, for instance, can be untrustworthy. It can also be very dirty in terms of names, addresses, dates of birth, and so on. Some may not think this is important, and of course it will depend on your business problem. However, if you are building a model that targets someone for a birthday promotion, and the majority of birthdates are 0101, then you have a problem.
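The birthdate example above can be turned into a simple validation check. The following Python sketch flags a field when a single value dominates it; the MMDD string format, the 50 percent threshold, the function names, and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def dominant_value_share(values: list[str]) -> tuple[str, float]:
    """Return the most common value and the fraction of records that carry it."""
    if not values:
        raise ValueError("values must not be empty")
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def looks_like_placeholder(values: list[str], threshold: float = 0.5) -> bool:
    """Flag a field when one value accounts for more than `threshold` of records."""
    _, share = dominant_value_share(values)
    return share > threshold


# Hypothetical birthdates in MMDD form, for illustration only.
birthdates = ["0101", "0101", "0314", "0101", "1225", "0101", "0101", "0704"]
value, share = dominant_value_share(birthdates)
print(f"Most common birthdate: {value} ({share:.0%} of records)")
if looks_like_placeholder(birthdates):
    print("Birthdate field is probably a default value; validate before use.")
```

A check like this is only one piece of validation. It says nothing about provenance, which still has to be established separately.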

MISTAKE FIVE: UNDERESTIMATING THE DATA

It's one thing to dump lots of data into Hadoop or store it in other data repositories. Getting that data into shape for analysis is another matter entirely, and it is harder still if you don't understand how complex the data can be. Data quality was discussed in Mistake Four. Additionally, in TDWI surveys on big data challenges, data integration often ranks at the top of the list. Yet some companies underestimate it.

Integrating big data for analysis is complex for several reasons: the data is often siloed; the data can come from multiple sources with no metadata or master data; and the data can be multistructured. Couple that with the possibility that some of it might be real-time data, and you can see how complex big data can be.

Big data also introduces new types of data. Twitter streams, Facebook posts, sensor data, security logs, video, and other sources of data are emerging daily. However, in order to ultimately take action, you need to analyze the data and introduce it into the operating processes of your company. Smart organizations often put together a strategy with a road map for dealing with new data sources. If a data source will be used in an ongoing way, that road map covers either modifying existing workflows to accommodate big data or creating a new big data workflow.

MISTAKE SIX: IGNORING SKILL SET ISSUES

The term data science has emerged alongside the term big data. Data science involves principles and techniques for gaining insight via data analysis. The data scientist is the person or group charged with this task. The necessary skills for a data scientist boil down to strong technical skills, strong data and analytics skills, knowledge of the business, and communication skills. Clearly, data scientists are in short supply.

One mistake that some enterprises make is to not have a plan for dealing with the skills scarcity. There is not a one-size-fits-all solution to this problem. Companies take different approaches. Some train from within. Others recruit talent out of universities. Still others outsource the work.

Another mistake enterprises make is to stick to the belief that a data scientist must be one person. Some organizations are putting together small teams that act as the data scientist. These teams generally consist of someone who is technical, someone skilled in data mining and modeling, and a business analyst. Others look for one person to fit the bill and plan to use them wisely. Regardless of the approach, the important thing is to have a plan for dealing with the potential talent gap that works for the culture of your organization.

MISTAKE SEVEN: LACKING CRITICAL THINKING

Analytics is becoming democratized, meaning that it is being made available to more people. This democratization can happen through a variety of methods, such as making analytics easier to use, improving tooling to make collaboration between data scientists and business analysts easier, and embedding analytics in a process (see Mistake Nine). However, in most cases, it is still important to have an analytic thought process.

Here's an example: a business analyst is exploring social media data from a service that provides the percentage of buzz by gender or location. However, that person never thinks to look at how many people the service could actually categorize into these groups, and just takes the percentages at face value.

Another example: someone is using a social media analytics service to perform competitive intelligence. This person looks at social media streams to determine brand presence, but hasn't thought through how to handle duplicate instances of press releases. Should that release be counted only once or more than once?

The point is that many companies make the mistake of thinking that just because analytics is accessible, everyone can use it. Unfortunately, not everyone is a critical thinker. Depending on your company, you should put a plan in place to deal with this. Such a plan might include training, limiting how data is presented to end users, or organizing to execute differently.
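To make the first example above concrete, the following Python sketch reports each group's share of buzz alongside the fraction of mentions the service was able to classify at all. The function name, the group labels, and the counts are hypothetical; a real service may expose this information differently or not at all.

```python
def buzz_breakdown(counts: dict[str, int], total_mentions: int) -> dict[str, float]:
    """Return each group's share of classified mentions, plus overall coverage.

    `counts` holds mentions the service could assign to a group;
    `total_mentions` includes the mentions it could not classify.
    """
    classified = sum(counts.values())
    if classified == 0 or total_mentions < classified:
        raise ValueError("need at least one classified mention and a consistent total")
    shares = {group: n / classified for group, n in counts.items()}
    shares["coverage"] = classified / total_mentions
    return shares


# Hypothetical numbers, for illustration only.
result = buzz_breakdown({"female": 120, "male": 80}, total_mentions=4_000)
print(f"Female share of classified buzz: {result['female']:.0%}")
print(f"Male share of classified buzz: {result['male']:.0%}")
print(f"Mentions the service could classify: {result['coverage']:.0%}")
```

With these made-up numbers, the gender split is computed from only a small fraction of all mentions, which is exactly the detail an analyst taking the percentages at face value would miss.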

MISTAKE EIGHT: FORGETTING THAT PEOPLE AND PROCESS ARE A BIG PART OF BIG DATA ANALYTICS

Big data projects often fail not because of the technology but because of people, politics, and processes around the technology. Organizational issues and culture are a big part of big data analytics. Building out analytics can take time and requires building trust among all parties involved. Cultural issues can be pervasive in such efforts.

Clearly, strong analytics leadership is an important part of this equation. With leadership that supports analytics and uses analytics to drive decisions, it is easier to build an analytics culture. Companies that don't have analytics leadership often try to grow efforts organically, which usually requires running more pilots to prove the value of analytics. It may also take more hand-holding.

MISTAKE NINE: FAILURE TO MAKE THE ANALYTICS ACTIONABLE

Analytics is basically useless if nothing is done with the results. This is true for traditional analytics as well as for big data analytics. For instance, an organization might go ahead and create an insightful model that can decrease fraud without thinking about the need to instantiate the model as part of a business process. Doing so means that a special investigation unit, or another part of the organization that deals with fraud, will be using the model output. In other words, if a model is important enough to build, it is important to think through how you are going to use its results.

Operationalizing a model (i.e., making it part of a business process) is one way to make a model actionable, and this is vitally important. We've seen in our research that organizations that have an analytics process in place are more likely to derive measurable value from those analytics. This requires planning up front and gaining trust because sometimes people are suspicious of what they don't understand (see Mistake Eight). Whether the model is part of a manual process or embedded into systems that support a business process (e.g., for a call center), people need to buy in.

MISTAKE TEN: NOT DEALING WITH GOVERNANCE

Governance is about applying policies related to using services. Data governance deals with the roles, policies, and rules around data. It's about defining the principles, roles and responsibilities, and rules with which an organization must conform. Governance is often combined with compliance and security issues across computing environments. Once you have explored your data and understand which data and analytics will be used for business purposes, it is important to understand that this data is now part of your corporate intellectual property and needs to be governed and secured appropriately.

Some organizations put off governance, stating that they are moving too fast and don't have time for it. It feels too overwhelming to them. They fail to understand two concepts. First, governance can look different for different organizations. It doesn't have to be a cumbersome process. Second, companies that deal with governance fairly early in big data efforts don't run into issues such as data ownership, data definitions, or data provenance that can derail big data efforts. In other words, governance is a good thing.

It is fine to use existing governance frameworks for your big data efforts; in fact, it makes sense to do so. If you don't have a framework, you can start small, perhaps with a data steward. The important thing is to begin to put a plan in place so you're not caught off guard as you grow.

ABOUT TDWI

TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its members. TDWI offers a worldwide membership program, five major educational conferences, topical educational seminars, role-based training, on-site courses, certification, solution provider partnerships, an awards program for best practices, live Webinars, resourceful publications, an in-depth research program, and a comprehensive website, tdwi.org.

555 S Renton Village Place, Ste. 700
Renton, WA 98057-3295
T 425.277.9126
F 425.687.2842
E info@tdwi.org
tdwi.org