DEMYSTIFYING BIG DATA
What it is, what it isn't, and what it can do for you.
JAMES LUCK BIO
James Luck is a Data Scientist with AT&T Consulting. He has 25+ years of experience in data analytics, in addition to extensive telecommunications and managed services development. He holds advanced degrees in both Aerospace and Electrical Engineering, and an MBA. Prior to AT&T, James was a scientist and PhD candidate at Sandia Labs and the Air Force Research Lab, where his experience included a variety of projects using complex data analytics in orbital systems and Synthetic Aperture Radar. Previously manager of product development at NetSolve, James was the principal engineer for the design, development and delivery of AT&T's Frame+ managed DSU and CO-Frad managed services.
TODAY'S PRESENTATION
I'm not trying to sell you anything. This is a high-level approach to understanding and implementing Big Data, based on my recent experience talking with customers just like yourselves. Every organization has the same issues and concerns, and YOU can avoid the mistakes others have made!
AGENDA
What is Big Data?
Terminology
From Business Analytics to Big Data
Implementing Big Data: Infrastructure
Implementing Big Data: Data Analytics
Applications and Use Cases
BIG DATA
What is it?
WHAT IS BIG DATA?
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner, IT Glossary)
A term of art used to describe large data projects: any project where collecting, storing, retrieving, or processing the data becomes a significant part of the problem.
Getting answers to questions you didn't even know to ask!
WE KNOW WHAT IT IS... SO WHAT'S THE PROBLEM?
An overly broad definition.
No common industry understanding: every organization has a slightly different definition.
Vendors can label a wide variety of hardware and software products and services as "Big Data."
Too much focus on hardware and software products: you can't buy "a Big Data."
THE VALUE OF BIG DATA
You're sitting on a gold mine and you don't even know it! Your data contains a wealth of insights and information unavailable from any other source.
Use the data you already own to run your business: it's FREE! There is no outside data you can purchase that will tell you more about your business than your own data.
THE PROMISE OF BIG DATA
What they tell you: "Give us all your data, everything: sales, marketing, customer surveys, manufacturing, accounting; structured, unstructured, text, logical, and numeric data. We'll crunch it together and produce insights and actionable information that will enable you to run your business better."
What they don't tell you: It's amazingly expensive and time-consuming to do Big Data that way. Think millions of dollars and several years.
The good news: It doesn't have to be all-or-nothing. Organizations get excellent results with a focused, programmatic approach.
KEEP IN MIND...
Big Data can INFORM your business practices and help you make informed decisions.
Big Data CANNOT tell you how to run your business!
SOME TERMINOLOGY
BIG DATA & FRIENDS
Business Intelligence: The aggregation and processing of business data to provide a 360-degree view of the business, with a focus on aggregating, visualizing, and reporting on the overall business.
Data Analysis/Analytics: The overall process of analyzing data, from collecting data, through analysis, to visualization.
Data Science: The overarching term for tools and techniques to extract information from data.
Data Mining: Tools and techniques for discovering patterns in datasets.
Predictive Analytics: Tools and techniques that analyze trends and historical data to make predictions. NOTE: You can predict trends; you CANNOT predict the future!
Text Analytics: Data analytics for text.
Business Analytics: General name for data analytics performed on business data.
FROM BUSINESS ANALYTICS TO BIG DATA
Where Are We Today?
CURRENT BUSINESS ANALYTICS PARADIGM
Focus on data products for managing the business. Typical questions: How many calls did we take yesterday? How much did we sell yesterday? How much inventory do we have?
Focus on KPIs and metrics, looking for changes from the norm.
Using descriptive statistics to summarize data: mean, variance, trends.
Reporting: charts, graphs, trend plots.
All about monitoring the machine, with a focus on a narrow set of data.
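The descriptive statistics named above take only a few lines to compute. As a minimal sketch, assuming a hypothetical week of daily call counts, the following Python uses the standard-library `statistics` module for the mean and (sample) variance, and an ordinary least-squares slope as the "trend." Defining trend as a straight-line slope is an assumption here, not necessarily how any particular reporting tool defines it.

```python
from statistics import mean, variance


def describe(values):
    """Summarize a daily series: mean, sample variance, and linear trend."""
    days = list(range(len(values)))  # day index 0, 1, 2, ...
    day_bar, val_bar = mean(days), mean(values)
    # Least-squares slope of value vs. day index = average change per day
    numerator = sum((d - day_bar) * (v - val_bar) for d, v in zip(days, values))
    denominator = sum((d - day_bar) ** 2 for d in days)
    return {
        "mean": val_bar,
        "variance": variance(values),  # sample variance (n - 1 denominator)
        "trend_per_day": numerator / denominator,
    }


# Hypothetical daily call counts for one week
calls = [120, 132, 128, 141, 150, 147, 160]
summary = describe(calls)
print(summary)
```

A positive `trend_per_day` says call volume is rising; this is still "monitoring the machine," just expressed as numbers.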
BIG DATA ANALYTICS VS. TRADITIONAL BUSINESS ANALYTICS
Business Analytics and a whole lot more.
Uses a much larger set of data: many different data types and combinations (structured, unstructured, logical, text) that typically can't be processed with traditional systems.
New algorithms and approaches:
Predictive Analytics: Who is most likely to buy this widget? If a device fails, how likely is it to fail again within 30 days?
Data Mining: What makes customers unhappy?
Text Analytics: sentiment analysis, topic modeling.
Visualization: heat maps of customer satisfaction by county.
Finding relationships: What factors most affect employee retention?
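The repeat-failure question above usually starts from a historical baseline before any predictive model is built. The sketch below is an illustration under stated assumptions, not a predictive model: from a hypothetical log of `(device_id, failure_date)` records, it computes the share of failures followed by another failure of the same device within 30 days. One caveat: failures near the end of the log have not had a full 30 days in which to repeat, so a real analysis would exclude or adjust for them.

```python
from datetime import date


def repeat_failure_rate(failures, window_days=30):
    """Share of failures followed by another failure of the same device within window_days."""
    by_device = {}
    for device_id, day in failures:
        by_device.setdefault(device_id, []).append(day)

    repeats = total = 0
    for days in by_device.values():
        days.sort()
        # Pair each failure with the device's next failure (None for the last one)
        for current, following in zip(days, days[1:] + [None]):
            total += 1
            if following is not None and (following - current).days <= window_days:
                repeats += 1
    return repeats / total if total else 0.0


# Hypothetical failure log: (device_id, failure date)
failures = [
    ("dev-A", date(2024, 1, 5)),
    ("dev-A", date(2024, 1, 20)),  # repeat 15 days later
    ("dev-B", date(2024, 2, 1)),
    ("dev-B", date(2024, 4, 1)),   # 60 days later: outside the window
    ("dev-C", date(2024, 3, 10)),
]
print(repeat_failure_rate(failures))
```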
BIG DATA ANALYTICS TOOLS
Machine Learning: Supervised and unsupervised algorithms that run on a dataset to extract information.
Regression (linear, logistic, polynomial): Fit data to a predictive model. What factors determine used car prices? What is the right pricing model?
Classification: Create a model to determine which customers are good or bad credit risks.
Clustering: Which features of the data cluster together? Which customers are most similar?
Decision Trees: What's the fastest way to diagnose a heart valve failure?
Dimensionality Reduction: What are the top 50 topics in 10,000 trouble tickets?
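Regression is the easiest of these techniques to see in miniature. Here is a minimal sketch of an ordinary least-squares straight-line fit for the used-car pricing question, assuming hypothetical listings with a single feature (mileage); a real pricing model would use many features, such as age, make, and condition.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical listings: mileage (thousands of miles) and sale price (USD)
mileage = [20, 40, 60, 80, 100]
price = [22000, 19500, 17000, 14500, 12000]
intercept, slope = fit_line(mileage, price)
print(f"price = {intercept:.0f} + ({slope:.1f}) * mileage_k")
```

In this made-up data, each additional thousand miles lowers the predicted price by $125.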
SOME PITFALLS
There are literally thousands of techniques. Which one(s) should you use?
These new techniques require a lot of skill to use properly: they have data cleanliness and robustness requirements, and you have to know how to interpret the results.
It is generally not possible to verify results. How do you check that 100,000 trouble tickets were properly categorized by an algorithm? You can verify a small sample, but you can't check them all, so you rely on goodness-of-fit to check quality.
Algorithms don't lend themselves to auto-run tools.
Identified relationships may not actually exist; they may be an artifact of a particular dataset.
You need experienced algorithms people (data scientists) to pick algorithms, build and expand models, and interpret results properly.
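Two of the checks described above, spot-checking a small sample and relying on goodness-of-fit, can be sketched in a few lines. Assuming R² (the coefficient of determination) as the goodness-of-fit measure and hypothetical integer ticket IDs, the Python below computes R² for numeric predictions, draws a reproducible random sample of tickets for manual review, and scores the reviewed sample. R² suits numeric predictions; for a categorization task such as ticket classification, agreement on the reviewed sample is the more natural check.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def spot_check_sample(ticket_ids, k, seed=42):
    """Reproducibly pick k distinct tickets at random for a human to verify."""
    return random.Random(seed).sample(ticket_ids, k)


def sample_accuracy(reviews):
    """Fraction of (algorithm_label, human_label) pairs where the two agree."""
    return sum(1 for algo, human in reviews if algo == human) / len(reviews)


# Hypothetical: 100,000 categorized tickets, 200 of them sent for manual review
to_review = spot_check_sample(list(range(100_000)), k=200)
print(len(to_review))
print(r_squared([3, 5, 7], [2.5, 5.0, 7.5]))
```

A high score on one dataset does not prove a relationship is real; re-checking the model on data it was not fitted to is the usual guard against artifacts.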
IMPLEMENTING BIG DATA
Infrastructure
THE MOST COMMON BIG DATA FAIL
Failure to (1) create Use Cases that are (2) tied to Business Goals.
BUILDING AN INFRASTRUCTURE
Identify Business Goals
Gather Stakeholders
Create Use Cases for the business
Create a Strategy & Roadmap that meets the Use Case requirements
Implement the infrastructure
BIG DATA INFRASTRUCTURE FAILS
Buying from vendors before you have a plan
Building an infrastructure BEFORE you define use cases
Neglecting to engage stakeholders
Not having a well-defined Strategy & Roadmap (S&R) plan
Neglecting to use existing systems
Underestimating storage & processing requirements
Big Data ≠ Hadoop: you don't need Hadoop to implement Big Data
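Storage requirements are easy to underestimate because several factors multiply together. A back-of-the-envelope sketch, with every input hypothetical: daily record volume × record size × retention period × number of stored copies. The default of three copies is an assumed redundancy factor, and the estimate ignores compression, indexes, and intermediate processing outputs.

```python
def raw_storage_tb(records_per_day, bytes_per_record, retention_days, copies=3):
    """Back-of-the-envelope raw storage in terabytes (10**12 bytes).

    copies=3 is a hypothetical redundancy factor; compression, indexes,
    and intermediate processing outputs are ignored.
    """
    return records_per_day * bytes_per_record * retention_days * copies / 10**12


# Hypothetical: 50 million 1 KB records per day, kept for one year
print(raw_storage_tb(50_000_000, 1_000, 365))  # 54.75
```

Even this modest example lands in the tens of terabytes, and processing typically adds temporary working data on top of that.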
IMPLEMENTING BIG DATA
Data Analytics
BIG DATA IS A TEAM SPORT
Business Analysts: Gather requirements and create use cases.
Data Engineers: Design, build, and maintain the Big Data infrastructure.
Data Scientists: Select algorithms; build and verify models.
Data Curators: Acquire and preserve datasets; handle data governance and quality issues.
Data Visualizers: Create data products from the information gleaned from the data.
BIG DATA IS A PROGRAMMATIC APPROACH
The cycle: Identify Business Goals → Create Use Cases → Big Data Analytics → Insights and Data Products → Implement into business processes → Measure and Evaluate → repeat until results are achieved.
Why a programmatic approach? Not every use case will produce the desired results.
BIG DATA IS A PROGRAMMATIC APPROACH
[Figure: the same cycle (Identify Business Goals, Create Use Cases, Data Analytics, Insights, Data Products, Implement into business processes, Measure and Evaluate), annotated "Most breakdowns occur."]
DATA ANALYTICS FAILS
Failing to assemble a team
Creating random data products and trying to feed them back into the business
Poor Use Cases & business goals (yes, again!)
Failing to implement recommendations
Failing to integrate data products into business processes: what are people supposed to do with these things?
Failing to measure the impact on the business: then you can't justify why you're doing this
Failing to continually implement and improve until results are achieved
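Measuring impact can begin with something as simple as comparing a KPI before and after a change. A minimal sketch, assuming hypothetical before and after values: a plain before/after comparison cannot separate the change's effect from seasonality or from other changes made at the same time, so a comparison against an untouched control group is more convincing.

```python
def percent_change(baseline, after):
    """Relative change of a KPI versus its pre-rollout baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (after - baseline) / baseline * 100


# Hypothetical KPI: weekly walk-aways before and after a new ordering process
print(f"{percent_change(40, 34):.1f}%")  # -15.0%
```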
APPLICATIONS & USE CASES
BIG DATA GENERAL USE CASES
Data Exploration: Find, visualize, and exploit hidden information within your data.
Data Warehousing: Integrate Big Data and data warehousing.
Predictive Analytics: Predict equipment and part failures and consumer purchasing habits.
Data Visualization: Create specific, information-rich visualizations that enable management decisions; create a 360° view of your organization; build business-specific dashboards. Example: combine weather data with on-site service schedules. Which sites are under threat of cancellation?
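The weather example is, at its core, a join of two datasets on a shared key. A minimal sketch, assuming a hypothetical schedule of service visits and a hypothetical forecast feed keyed by (site, day) that gives a probability of severe weather; the site IDs, technician IDs, and the 0.6 threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    site: str
    day: str  # ISO date string, e.g. "2024-06-03"
    technician: str


def visits_at_risk(visits, severe_weather_prob, threshold=0.6):
    """Visits whose (site, day) severe-weather probability meets the threshold.

    severe_weather_prob maps (site, day) -> probability in [0, 1];
    a visit with no forecast entry is treated as not at risk.
    """
    return [
        v for v in visits
        if severe_weather_prob.get((v.site, v.day), 0.0) >= threshold
    ]


# Hypothetical schedule and forecast feed
visits = [
    Visit("SITE-001", "2024-06-03", "tech-17"),
    Visit("SITE-002", "2024-06-03", "tech-04"),
    Visit("SITE-001", "2024-06-04", "tech-17"),
]
forecast = {("SITE-001", "2024-06-03"): 0.8, ("SITE-002", "2024-06-03"): 0.1}
for v in visits_at_risk(visits, forecast):
    print(v)
```

Treating a missing forecast as "not at risk" is a design choice; flagging such visits for follow-up would be the more cautious option.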
LARGE ONLINE RETAILER
Business Goal: Increase revenue.
Use Case: Get customers to buy more. Offer you toothpaste when you place a toothbrush in your cart? Everyone knows that! Instead, use Big Data to compare every basket ever purchased and every item in every basket: which items are purchased together?
Downside: There might only be a 5% chance you'll buy a given item (compared with perhaps a one-in-a-million chance you'll buy any random item). Big Data allows you to find those 5%-probability items.
Upside: Offer 20 such items and, on average, the customer will buy one of them.
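Comparing every basket comes down to counting how often items appear together. The sketch below, using hypothetical baskets, estimates the chance that a basket containing item A also contains item B (called "confidence" in association-rule mining) and returns the top candidates to offer, 20 by default. It keeps every count in memory, which is exactly what stops working at the scale of "every basket ever purchased"; it illustrates the calculation, not a production design.

```python
from collections import Counter
from itertools import combinations


def co_purchase_rates(baskets):
    """Estimate P(B in basket | A in basket) for every ordered item pair (A, B)."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # count each item once per basket
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}


def recommend(item, rates, top_n=20):
    """Items most often bought together with `item`, highest rate first."""
    scored = [(b, rate) for (a, b), rate in rates.items() if a == item]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical purchase history
baskets = [
    ["toothbrush", "toothpaste"],
    ["toothbrush", "toothpaste", "floss"],
    ["toothbrush", "mouthwash"],
    ["toothbrush", "toothpaste"],
    ["shampoo", "toothpaste"],
]
rates = co_purchase_rates(baskets)
print(recommend("toothbrush", rates))
```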
AUTO DEALER
Business Goal: Reduce walk-aways.
Use Case: Make sure the right car is available. Predict which models will be out of stock (Predictive Analytics): combine past sales data with other external data to predict model demand, then order in advance so the right cars are always in stock.
Business Goal: Increase used-car revenue.
Use Case: Create a model to determine optimal used-car buy prices. What features are most useful for OUR business? Reduce the time it takes to evaluate a car's value.
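The slide's forecast combines past sales with external data. As a deliberately simple stand-in, the sketch below forecasts next month's demand for each model as a three-month moving average of hypothetical internal sales, then flags models whose forecast exceeds the units on the lot. The model names, window, and numbers are all hypothetical; a real forecast would add the external drivers and seasonality.

```python
def moving_average_forecast(monthly_sales, window=3):
    """Forecast next month's demand as the mean of the last `window` months."""
    recent = monthly_sales[-window:]
    return sum(recent) / len(recent)


def models_to_reorder(sales_by_model, stock_by_model, window=3):
    """Map each model whose forecast demand exceeds stock to its expected shortfall."""
    shortfalls = {}
    for model, sales in sales_by_model.items():
        gap = moving_average_forecast(sales, window) - stock_by_model.get(model, 0)
        if gap > 0:
            shortfalls[model] = gap
    return shortfalls


# Hypothetical monthly unit sales and current lot inventory
sales = {"Sedan X": [10, 12, 14, 16], "SUV Y": [8, 8, 8, 8]}
stock = {"Sedan X": 9, "SUV Y": 12}
print(models_to_reorder(sales, stock))  # {'Sedan X': 5.0}
```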
LARGE CUSTOMER CONTACT CENTER
Business Goal: An effortless customer experience.
Use Case: Increase customer satisfaction. Look at text data that's not ordinarily reviewed: apply Text Analytics to agent notes to uncover the true voice of the customer. Topic Modeling provides a high-level summary of the topics.
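Full topic modeling (for example, Latent Dirichlet Allocation) is beyond a short example, but its first step, turning free-text notes into word counts, is easy to show. The technique below is much cruder than topic modeling and is swapped in purely for illustration: it counts the most frequent non-stopword terms across hypothetical agent notes. The stopword list is a small hypothetical one; real text pipelines use larger lists plus stemming or lemmatization.

```python
import re
from collections import Counter

# Hypothetical stopword list; "customer" is dropped because it appears in most notes
STOPWORDS = {"the", "a", "an", "and", "to", "of", "on", "for", "in", "is", "was",
             "this", "customer"}


def top_terms(notes, n=5):
    """Return the n most frequent non-stopword terms across free-text notes."""
    counts = Counter()
    for note in notes:
        words = re.findall(r"[a-z']+", note.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical agent notes
notes = [
    "Customer upset about billing error on last invoice",
    "Billing charge disputed, customer wants refund",
    "Router outage, customer asked for refund",
    "Second billing error this month",
]
print(top_terms(notes, 3))
```

In these made-up notes, "billing" dominates, with "error" and "refund" close behind, hinting at a billing-dispute theme worth a closer look.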
QUESTIONS?