The New Frontier: Big Data and Investigative Analytics
A Publication of Infobright
Table of Contents
- Introduction
- Chapter 1: What Is Investigative Analytics?
- Chapter 2: Top Five Requirements for Investigative Analytics
- Chapter 3: Case Studies: Investigative Analytics for Big Data
- Summary
Introduction: Big Data and Investigative Analytics

There's no question that big data represents both a challenge and an opportunity. As big data volumes continue to explode, businesses will face challenges in quickly extracting rich insight from the mountain of machine-generated data streaming in from devices, sensors, smart meters, operational equipment and other sources.

Traditional analytic tools are often not up to the job of allowing users to interrogate highly diverse types of big data. As data connections and dependencies grow exponentially, it's no longer possible to capture actionable information in a rigid set of KPIs and canned reports. To effectively manage big data, companies need to explore options for performing richer, real-time data analysis with far fewer resources.

One approach for doing that is investigative analytics, where users ask a series of quickly changing, iterative questions to figure out why something did or did not happen and how to optimize a particular outcome in the future. Compared to traditional analytics, which lack flexibility, investigative analytics yields insight into questions that haven't even been dreamed up yet.

In this ebook, we will delve into the role of investigative analysis as it relates to big data, the technology requirements for putting investigative analytics into action, and case studies.
CHAPTER ONE: WHAT IS INVESTIGATIVE ANALYTICS?
Emerging Data Analytics Stack: Days of One-Size-Fits-All Are Gone

"Yesterday's BI-ETL-EDW stack is wrong-sided for tomorrow's needs, and quickly becoming irrelevant." - Gigamon

In today's big data world, the one-size-fits-all approach no longer works. The data management stack has transformed into multiples, while the analytic stack has had to respond with individualized tools to get at the appropriate data and function, be it operational analytics, investigative analytics or predictive analytics. Big data has created pockets of specialization, where some databases are great for warehousing (e.g. Hadoop), while others excel at analytics.

Companies are also challenged by an evolving infrastructure and the proliferation of data centers, data warehouses and data marts. Not only is the infrastructure used to deliver information changing; the data coming in from a myriad of new devices is also changing dramatically in terms of speed, type and volume.

With the overwhelming influx of machine-generated data begging to be analyzed, business users such as data scientists need real-time, interactive visualization of their data and flexible query creation. Today, with the right mix of solutions, businesses are able to analyze months' worth of data with sub-second response time and realize extraordinary business value from performing deep analysis with queries created on the fly.
Big Data & The Internet of Things

Today's Analytic Environment: The Internet of Things is a Multiplier for EVERYTHING

- A jet airliner generates 20TB of diagnostic data per hour of flight.
- The average oil platform has 40,000 sensors, generating data 24/7.
- 80% of all households in Germany (32 million) will need to be equipped with smart meters by 2020, in accordance with the European Union market guidelines.

These examples alone represent a staggering amount of data that must be captured, analyzed and acted upon.
More things are now connected to the Internet than people, a phenomenon dubbed "The Internet of Things." Fueled by machine-to-machine (M2M) data, the Internet of Things promises to make our lives easier and better, from more efficient energy delivery and consumption to mobile health innovations where doctors can monitor patients from afar.

However, the resulting tidal wave of data streaming in from smart devices, sensors, monitors, meters, etc., is testing the capabilities of traditional database technologies. They simply can't keep up or, when challenged to scale, become cost prohibitive.

Just ten years ago, the largest data warehouse in the world was 30TB; today, petabyte-sized data warehouses are common, and the volumes continue to grow. According to a 2012 Information Difference survey, most of the 209 customers surveyed said they were experiencing data growth of 20-50% annually.
Investigative Analytics: Move from "What Happened?" to "Why?"

Traditional analytic tools are often not up to the job of allowing users to interrogate the fast-moving, highly diverse types of high-volume big data. As data connections and dependencies grow exponentially, it's no longer possible to capture actionable information in a rigid set of KPIs and canned reports. To effectively manage big data, companies should explore options for performing richer, real-time data analysis. One effective approach is investigative analytics.

In the recent TDWI ebook, Investigative Analytics: The New BI Frontier (June 2013), analyst Stephen Swoyer describes the bookends of the analytic continuum as traditional analytics and predictive analytics:

- Traditional analytics puts questions into historical context, includes common BI activities (e.g. reports, dashboards, scorecards), and is mostly SQL-driven.
- Predictive analytics, on the other hand, uses data mining or statistical algorithms to score data with models and forecasts.

Both of these approaches answer the question of what: What happened? What will happen? With a more open-ended process, investigative analytics, in comparison, answers the why: Why did it happen?
A Connected Analytics Landscape

- Operational Analytics: What happened? Alerts, KPIs, standard reports.
- Investigative Analytics: What has happened and why? Iterative, quickly changing queries (usually ad hoc).
- Predictive Analytics: What is going to happen? Automatic calculations during live transactions.

Swoyer describes investigative analytics as an open-ended activity that looks for patterns, anomalies, and clusters (i.e., for clues) that can be used to formulate questions or which can be correlated with events, conditions, or phenomena. With investigative analytics, users can ask a series of quickly changing, iterative questions to figure out why something did or did not happen and how to optimize a particular outcome in the future, resulting in deeper and richer insight.
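To make the idea of iterative, quickly changing questions concrete, here is a minimal Python sketch. It is an illustration only: it uses the standard-library sqlite3 module as a stand-in database (not Infobright), and the events table, its columns and its rows are invented for the example. The first question asks what happened (when did errors occur?); each follow-up regroups the same data to move toward why.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of made-up device events (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (hour INTEGER, region TEXT, device TEXT, status TEXT)"
    )
    rows = [
        (9, "east", "meter-a", "ok"),
        (9, "west", "meter-b", "ok"),
        (10, "east", "meter-a", "error"),
        (10, "east", "meter-a", "error"),
        (10, "east", "meter-c", "ok"),
        (10, "west", "meter-b", "ok"),
        (11, "east", "meter-a", "error"),
        (11, "west", "meter-b", "ok"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn

def errors_by(conn, column):
    """Count error events grouped by one column, largest group first."""
    allowed = {"hour", "region", "device"}  # whitelist: column names cannot be bound as parameters
    if column not in allowed:
        raise ValueError(f"unsupported column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) FROM events "
        f"WHERE status = 'error' GROUP BY {column} "
        f"ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = build_demo_db()
    # Each answer prompts the next ad-hoc question.
    for question in ("hour", "region", "device"):
        print(question, errors_by(conn, question))
```

In this toy data set, the first query shows when the errors cluster, and the follow-ups narrow them to a single region and then a single device. None of the three queries had to be planned in advance.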
CHAPTER TWO: TOP FIVE REQUIREMENTS FOR INVESTIGATIVE ANALYTICS
Number 1: Low Touch

Low-touch: minimal DBA requirements with a self-tuning system.

The extensive effort needed to fine-tune with indexing, partitioning and sharding can all get in the way of effective, efficient analytics. In a time of still-constrained budgets, data analysis needs to be affordable, as well as easy to use and implement, in order to justify the investment. This demands low-touch solutions that are optimized to deliver fast analysis of large volumes of data, with minimal hardware, administrative effort or customization needed to set up or change query and reporting parameters.

"The cool thing is that it can produce a new report which produces a new ad-hoc query and I don't have to worry about performance because Infobright takes care of all that for me." - Bob Hammond, CTO, Jumptap
Number 2: Ad-Hoc Performance

Frictionless inquiry: move from question to answer, quickly.

In fast-paced business and operational environments (smart grids are a great example), intelligence needs change quickly, so analytic tools can't be constrained by data schemas that limit the number and type of queries that can be performed. Traditional data solutions like standard, row-based relational databases fall short here, as they were designed to handle single-record, structured data. Big data analysis requires a flexible solution that allows for unplanned, ad-hoc querying, and that doesn't require a lot of tinkering or time-consuming manual configuration, such as indexing and managing data partitions, to create and change analytic queries.

Enter frictionless inquiry, where the path between question and answer is free of rigid structure: when users reach the "aha!" moment, they'll have all the information needed to ask the next question or dig deeper into data, without having to call IT or the help desk to create a new query.
Number 3: Dynamic Scalability

Scalability: inherently respond to increased load along any of these axes: query performance, number of users, number of records/size of data.

As demand for investigative analysis of big data increases, businesses need highly scalable solutions that can handle current and future data growth. At some point, traditional, hardware-based infrastructure will run out of headroom in terms of storage and processing capabilities. However, additional data centers, servers and disk storage subsystems are expensive to buy and maintain, creating a situation where costs begin to outweigh the benefits.
Number 4: Load Speeds

Machine-generated data is loaded very, very quickly and often needs to be investigated within a short period of time. One example is a mobile carrier that wants to automate location-based smartphone offers based on incoming GPS data. If it takes too long to process and analyze this kind of data, the resulting intelligence will fail to be useful. Businesses can't afford for data to get stale. Solutions must be able to load, dynamically query, analyze and communicate information quickly enough to provide for whatever real-time query processing or alerting is required.

Within 60 seconds of data hitting Infobright customer HasOffers' tracking platform, customers are able to run ad-hoc queries and get results that they can use to make better business decisions in real time.
Number 5: Compression

Economical storage of big data requires very efficient data compression within a network node, smart device or even a massive data center cluster. Efficient compression lowers TCO, allowing for less storage capacity and minimized networking and hardware investments.

In addition, efficient data compression increases the accuracy of query results by enabling tighter data sampling increments and longer historical data sets (e.g. accommodating situations like seasonality in retail). By capturing more data at finer levels of granularity (e.g. one second vs. one hour), businesses will be able to identify patterns that exist at those finer levels and which may previously have been missed due to storage constraints.
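The trade-off between sampling granularity and storage can be worked through with simple arithmetic. The Python sketch below is illustrative only: the one-second and one-hour intervals come from the text above, while the 16-byte reading size and the 10:1 compression ratio are assumptions chosen for the example, not measured figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def readings_per_day(interval_seconds):
    """Number of readings one sensor produces per day at a fixed sampling interval."""
    return SECONDS_PER_DAY // interval_seconds

def stored_bytes(readings, bytes_per_reading, compression_ratio):
    """Bytes on disk after compression (a ratio of 10 means 10:1)."""
    return readings * bytes_per_reading / compression_ratio

if __name__ == "__main__":
    hourly = readings_per_day(3600)   # one reading per hour
    per_second = readings_per_day(1)  # one reading per second
    print("readings/day, hourly sampling:     ", hourly)
    print("readings/day, one-second sampling: ", per_second)
    print("growth factor:                     ", per_second // hourly)

    # Assumed values for illustration: 16 bytes per reading, 10:1 compression.
    raw = stored_bytes(per_second, 16, 1)
    compressed = stored_bytes(per_second, 16, 10)
    print("bytes/day per sensor, uncompressed:", raw)
    print("bytes/day per sensor, compressed:  ", compressed)
```

Moving from hourly to one-second sampling multiplies the number of readings by 3,600, which is why the compression ratio largely determines whether the finer granularity is affordable to keep.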
CHAPTER THREE: BIG DATA, INVESTIGATIVE ANALYTICS CASE STUDIES
Mavenir Overview

Mavenir's Converged Messaging Solution

Mavenir Systems provides innovative mobile convergence solutions that enable mobile operators to offer subscribers new and enhanced services and applications.
Mavenir Challenges

Mavenir's goal was to drive more revenue by offering a solution to mobile operators that allows them to retrieve detailed SMS records for customer service and regulatory compliance. They needed an analytics solution to:

- Quickly load and store large volumes of detailed data
  - Capacity in excess of 3 billion messages per day
  - Peak periods like Chinese New Year can generate over 70 million messages in an hour
- Make that data available for analysis within minutes
- Store 90 days' worth of data with a small hardware footprint
- Handle a projected 70% growth rate in mobile messaging
- Have low TCO, including low storage and license costs

"Data storage is a big issue for mobile operators, and it's only going to get more challenging as the use of messaging continues to explode." - Payam Maveddat, VP of product management at Mavenir Systems
Mavenir Solution: Infobright Enterprise Edition (IEE)

Data Compression & History
- Keep 90 days of data stored in a smaller hardware footprint thanks to drastic compression

Getting Data In and Out Quickly
- 20k records per second at peak capacity in the initial release
- Current iteration handles 100k records per second at peak
- Projected 70% growth plan
- Load from event/log files every 5 minutes, making data available in near-real time

Reducing Capex & Opex
- No indexes, data partitioning or manual tuning
- No need for DBA resources to manage the database on an ongoing basis
- Low licensing costs
- TCO only 20% of the cost of competitive solutions

Mavenir has won major wireless carriers such as MetroPCS, Telstra and Viettel based on this solution.
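The figures in the Mavenir case study are easier to compare once they are converted to a common unit. The Python sketch below does that arithmetic. It uses only numbers quoted above (3 billion messages per day, 70 million messages in a peak hour, 70% projected growth, a 5-minute load interval); the helper name and the choice of records per second as the common unit are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

def rate_per_second(count, period_seconds):
    """Average number of records per second over the given period."""
    return count / period_seconds

# Figures quoted in the case study.
daily_capacity_rate = rate_per_second(3_000_000_000, SECONDS_PER_DAY)
peak_hour_rate = rate_per_second(70_000_000, SECONDS_PER_HOUR)

# Projected 70% growth applied to the daily capacity figure.
grown_daily_rate = daily_capacity_rate * 1.70

# Records arriving in one 5-minute load window at the peak-hour rate.
records_per_five_minute_load = peak_hour_rate * 5 * 60

if __name__ == "__main__":
    print(f"3 billion/day      = {daily_capacity_rate:,.0f} records/s on average")
    print(f"70 million/hour    = {peak_hour_rate:,.0f} records/s")
    print(f"after 70% growth   = {grown_daily_rate:,.0f} records/s on average")
    print(f"per 5-minute load  = {records_per_five_minute_load:,.0f} records")
```

Read this way, the 70-million-message peak hour works out to a little under 20,000 records per second, which lines up with the 20k figure given for the initial release, while sustaining the 3-billion-per-day capacity target requires roughly 35,000 records per second on average.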
LiveRail Overview

LiveRail is a multi-platform, real-time video advertising ecosystem providing:

- Real-time bidding
- Yield optimization
- Ad serving analytics
- Private exchanges

LiveRail is the leading publisher monetization platform for video, delivering over three billion impressions (25% of all online video ads) each month.
LiveRail Challenges

With a growing roster of customers including PBS, MLB.com and CBS Interactive, LiveRail was faced with managing increasingly large data volumes and a need to provide clients with near real-time access to this information for reporting and ad-hoc analysis.

- 10 billion monthly video ad opportunities
- 2 billion data points each day
- Dozens of engagement metrics, including:
  - Percentages viewed/completed
  - Pause/resume
  - Muting

Publishers needed the ability to drill down with near real-time access to determine optimal video length, as well as to determine whether there is a correlation between completion rates and ad frequency.

"Infobright gives our customers the ability to do fast, ad-hoc analysis against the extensive video advertising data." - Andrei Dunca, CTO of LiveRail
LiveRail Solution: Infobright IEE + Hadoop

Data Compression & History
- 25X space reduction
- Or 25X more history online

Analyzing Data Quickly
- 20,000 ad-hoc/real-time reports per day run by customers
- Reports that used to take two to three minutes now take seconds

Reducing Capex & Opex
- No indexing or tuning required
- Fewer servers and storage disks required
- Lower licensing costs than alternatives
- Low-touch, simple administration

LiveRail was recognized with a Computerworld Data+ Award.
In Summary: Big Data and Investigative Analytics

Big data demands a big change in thinking. Companies that maintain their status quo of analytics technologies and processes will find themselves spending progressively more money on servers, storage and DBAs, an approach that's difficult to sustain and still presents the risk of not getting the needed answers.

Gone are the days of simply seeking the "what" from an analytics solution. Today, companies can and need to know "why." Investigative analytics are the key to revealing patterns of behavior or insights that companies can act on immediately, and then either capitalize on or prevent in the future.

To extract rich, real-time insight from the onslaught of machine-generated data, companies require a technology foundation characterized by five requirements:

- Low-touch administration
- Flexible, ad-hoc querying
- Dynamic scalability
- Fast, reliable performance
- Efficient compression

When there's more and more data to mine, investigative analytics cut through the clutter with precision, ensuring accurate, immediate results, even as machine-generated data grows to the petabyte scale and beyond. By maximizing insight into data, companies can make better decisions at the speed of business, thereby reducing costs, identifying new revenue streams, and gaining a competitive edge.
See how JDSU and others are using Infobright to meet their investigative analytics needs and drive business value.

HAVE QUESTIONS?
Find us on the web: www.infobright.com
Contact us: 877-596-2483 / info@infobright.com