MWD Advisors

Navigating Big Data business analytics

Helena Schwenk

A special report prepared for Actuate, May 2013

This report is the third in a series and focuses principally on explaining what's needed in the business analytics layer of a Big Data platform. For more information about how this layer relates to the others in a Big Data platform, please refer to the corresponding papers in this series: Navigating the Big Data infrastructure layer and Turning Big Data into Big Insights. Finally, for more information about the opportunities and challenges posed by Big Data for organisations today, please refer to the first paper in the series, Unlocking the potential of Big Data.

This is a special report prepared independently for Actuate. For further information about MWD Advisors' research and advisory services please visit www.mwdadvisors.com.

MWD Advisors is a specialist advisory firm which provides practical, independent industry insights to business analytics, process improvement and digital collaboration professionals working to drive change with the help of technology. Our approach combines flexible, pragmatic mentoring and advisory services, built on a deep industry best practice and technology research foundation.
Summary

Understand the business need of Big Data
To really get value from your Big Data you first need to understand how this new world of varied and voluminous data sources can potentially solve problems or create opportunities within your business. It requires you not only to make sense of your data by analysing it and deriving meaningful insights from it, but also to apply those insights in a business context in a timely and impactful way.

The promise of a Big Data platform is that it takes data in its rawest form and converts it into consumable, actionable information
The concept of a Big Data platform provides a technology framework for taking data in its rawest form, transforming it and putting it in a format where it can be consumed and acted upon by decision makers. Three core layers are required to support these capabilities: the lowest layer is responsible for the storage and organisation of data; the middle layer is where the analysis of that data occurs; and the upper layer is where insights are discovered and consumed. This report focuses on the second: the business analytics layer.

Business analytics brings potential to Big Data
Business analytic tools help bring understanding and meaning to your Big Data. Technologies such as predictive analytics, for instance, can analyse and model Big Data to help make predictions about future events, whereas visual analytics tools can help identify trends or patterns in large volumes of data more easily, and text mining and natural language processing can be used to understand sentiment and extract meaning from textual data.

Tool choices are dependent on a range of factors
Which tool you choose will ultimately depend on the problem your business is trying to solve. Equally, it will need to take into account technical factors such as the type of data being analysed (whether it's structured or multi-structured, for example) and the scope of analysis being performed (such as whether it involves real-time, exploratory or advanced analytic techniques). Being able to understand and map both business and IT requirements to your business analytic tool choices remains an important part of any Big Data initiative.
Technology cost and sophistication driving the Big Data train

As outlined in the first report in this series, Unlocking the potential of Big Data, in spite of all the headlines and vendor rhetoric, the ability to manage growing volumes of data is not a new phenomenon for organisations today. In fact, many early adopters of Business Intelligence (BI) and data warehousing technology (especially in the retail, telecoms and financial services industries) have long been accustomed to capturing and managing large volumes of data. Yet in spite of this we still see the rise and rise of Big Data as a seemingly new concept. So what has changed?

Through their own technology innovations, web and social data-driven businesses such as Google and LinkedIn have shown us how to process Big Data sets (in their case web searches) on massively scalable storage and computing platforms using commodity hardware. Their technology expertise and success are the inspiration behind open source Big Data technologies such as Apache Hadoop and its ecosystem of tools (which we introduce in more detail in the second report of this series, Navigating the Big Data infrastructure layer).

The challenge of processing certain kinds of Big Data has also driven other technology innovations related to massively parallel processing architectures, in-memory analytics, columnar databases and complex event processing platforms. All of these pieces bring more choices to organisations that want to advance their use and management of Big Data. Similarly, enhancements in predictive analytics, text mining and advanced visualisation tools make the exploitation of Big Data more straightforward by making it easier to discover hidden or interesting patterns and insights that, in turn, can be used to enhance productivity, drive efficiencies and growth, and create a sustainable competitive advantage.
Figure 1: Drivers of broader Big Data adoption (Source: MWD Advisors)

But it's not only technology developments that are spurring the advancement of Big Data; as figure 1 shows, the deployment economics of these technologies are equally important. In particular, the decreasing cost of storage and memory, the scalability of cloud computing platforms and appliances, and the growing influence of open source tools together bring the promise of more affordable Big Data platforms. The opportunities of Big Data are opening up to a wider audience as it becomes more economically feasible to exploit, manage and leverage Big Data, especially for those organisations that may previously have been priced out of this activity.
A Big Data platform has three layers

Most of the commentary around Big Data has focused on the type of data under management: whether structured or multi-structured (defined as data stored and organised in a multitude of formats, including text, video, documents, web pages, email messages, audio or social media posts, and so on), or real-time or in-motion data. However, before any decision can be made about what kind of information and technology capabilities are required to support this, there needs to be agreement and buy-in about what you want to achieve from your Big Data initiative. At the very least it needs to be framed by a clear strategy that outlines how data and analytics can be tied to a particular business challenge or opportunity that needs addressing.

This in turn provides the starting point from which organisations can assess the technical implications of their Big Data effort, for example by examining how data can be transformed from its raw state to a point where it can be consumed and acted upon. To support this, a Big Data platform needs to provide capabilities for:

• Capturing, processing and storing data
• Exploring and applying advanced analytics techniques
• Discovering and consuming insights.

Today these activities are supported by a multitude of technology components; some of them are relatively new, while others are based on existing technologies and architectures. In figure 2 we bring these concepts together as part of an overall Big Data platform with three layers. The lowest layer is concerned with organising and storing data; the middle layer is where the analysis of that data occurs; and the upper layer is where insights are discovered and consumed.

Figure 2: Capabilities of the Big Data platform layers (Source: MWD Advisors)

Although these capabilities aren't necessarily new to BI and data warehousing practitioners, it's become apparent that the old models for storing and analysing data don't necessarily apply to all Big Data assets. Not only is the amount of data vast and potentially more time-sensitive in nature, but the variety of data to be managed can be far greater, and this is markedly changing the requirements of the technology needed.

This report focuses principally on explaining what's needed in the analytics layer of a Big Data platform. Please refer to the other papers in this series for an explanation of the other two layers.
Getting to grips with Big Data business analytics

Within the Big Data analytics layer, technologies extract value from data by exploring, modelling and analysing it. Assuming that your company has been successful in organising and storing its Big Data assets, it's at this point that the data comes to life and organisations have the potential to unlock valuable insights within it.

However, before any decision is made about what technology to use, any organisation embarking on a Big Data initiative needs to be clear about the business challenge or opportunity it is trying to address, whether that is devising a more profitable pricing strategy, offering more sophisticated product recommendations, improving fraud detection or applying more granular customer segmentation to your data. Once this has been established you can then look at how business analytic technology can help support these aims and objectives.

What technology you use to support the analysis of Big Data, however, depends on two key factors: the type of data that is required for analysis (such as whether it's structured or multi-structured), and the use cases driving the need. To help build a picture of what technology fits where in a Big Data analytic environment, it's worth classifying and grouping the different types of analysis that can be performed with these technologies. Our research suggests that three broad categories are prevalent:

• The first category is a practice focused on applying sophisticated algorithms (such as machine learning, predictive modelling or natural language processing algorithms) to Big Data, either structured or multi-structured, to solve a particular business problem or maximise an opportunity. It can be performed by line-of-business users, IT users or both, and it requires a specific goal to be identified before the analytics process can begin, such as predicting churn, identifying a customer's propensity to respond or understanding consumer sentiment.

• Real-time analysis is focused on using technology enablers such as in-memory or event stream processing engines to facilitate the rapid ingestion and/or analysis of data. The results are either served up in real time to a user (as an online product recommendation, for example) or served up to business users in dashboards, where the information is used to drive decision-making.

• Exploratory analysis differs from traditional BI query and reporting as it centres on exploring a complete set of less well understood data (rather than a sample) to determine what data has value, and where the hidden patterns and trends lie within it, without any constraints as to what those patterns or trends may imply. Exploratory analysis may be performed in an academic or research setting and hence requires a different mindset, one where an analyst or data scientist can be more creative in their analysis and where they don't always have a clear understanding of the questions they want to ask of the data.

Table 1 below provides an overview of the key technologies you should consider as part of your Big Data analytics layer. As you can see from the table, Big Data encompasses a whole range of technologies and tools. Some, such as predictive analytics or SQL tools, are well established, whereas others, especially where the analysis of multi-structured data is required, shine the spotlight on a newer breed of Big Data technologies such as Hadoop Hive or text analytics.
Table 1: Big Data analytics options

Big Data Analysis technology: Predictive and advanced analytics
Key Facts: The main goal of predictive analytics is to develop a model, using a combination of sophisticated analytic algorithms, statistical models and mathematical calculations, that analyses current and historical facts to make predictions about future events. Some database vendors support the execution of advanced analytics within the database (typically within SQL-based MPP databases) to take advantage of the parallel processing capabilities of the source database and speed up query processing times. Today an increasing number of analytic applications are also being built on Hadoop, over data held in HDFS, using the MapReduce paradigm in languages such as R or by utilising Apache Mahout, an open source project providing a library of scalable machine learning and data mining algorithms.

Big Data Analysis technology: In-memory visual analytic tools
Key Facts: Underpinned by an in-memory database, these tools support advanced users in the interactive, on-the-fly exploration and analysis of large, complex structured data sets to help pinpoint trends, segment the data set, and identify outliers and hidden patterns far more easily and often in real time.

Big Data Analysis technology: Text analytics
Key Facts: Text analytics applies linguistic rules and statistical methods to automatically assess, analyse and find patterns within large quantities of electronic text, such as social media posts, emails and call centre notes. The process of analysing text usually involves parsing and filtering the text, then understanding and extracting its meaning in a structured form for use and analysis in a data store such as a data warehouse. Sentiment analysis that utilises Natural Language Processing (NLP) techniques is a growing branch of text analytics used to extract subjective information about opinions, attitudes, emotions and perspectives from text.

Big Data Analysis technology: SQL
Key Facts: SQL is the primary query language used by most BI and analytics tools as well as a lot of business analysts. While it is primarily used to query structured data, today many vendors are increasing support for querying Hadoop directly using SQL, for example by supporting a Hive interface which allows SQL to be converted to a MapReduce program and processed within Hadoop.

Big Data Analysis technology: Event stream processing
Key Facts: This technology detects events or patterns of events as data streams through transactional systems, networks or communications buses, before correlating and analysing the data so that an appropriate action can be taken to minimise risk or maximise an opportunity, for example. Analysis occurs when the data is in motion, i.e. usually before the data is stored in a database or file system, and is often used in conjunction with other technologies such as business rules, predictive analytics and optimisation techniques to help organisations automate and guide decision-making processes, for instance around detecting fraud, managing risk, optimising pricing and strategic process improvements.

Mapping Big Data technologies to analytic use cases

To help explain how these analytic use cases map to your Big Data analytic technology choices, the following table takes a look at some sample Big Data applications and details what makes each technology option particularly suitable for this form of analysis. As always, this should only be used as a guide, as it does not take into account other factors such as interoperability with existing tools and infrastructure, budget, and skill levels that will also naturally dictate technology choices. For a more detailed explanation of each storage component mentioned please refer to the other paper in this series, Navigating the Big Data infrastructure layer.
Table 2: Big Data applications and supporting technologies

Example application area: Customer churn analysis
Example data type: Structured
Example technology option: Predictive data mining models that analyse transactional, behaviour, demographic and social interaction data can take advantage of the in-database analytics and parallel processing capabilities of the SQL MPP database to run the models and score customers, identifying those that are at risk of churning.

Example application area: Marketing campaign analysis
Example data type: Structured
Example technology option: In-memory visual analytic tools can be used to analyse revenue by market, campaign or other attributes to help improve campaigns and market segmentation, as well as to identify segments in the customer base that can be used to tailor marketing messages to particular groups or markets.

Example application area: Click stream analytics
Example data type: Multi-structured and structured
Example technology option: Hadoop MapReduce programs written in R can support the parallel processing of large amounts of web log files, where insights into navigation behaviour are extracted and combined with existing customer data from the data warehouse to support activities such as website optimisation and conversion rate analysis.

Example application area: Product affinity analysis
Example data type: Multi-structured and structured
Example technology option: Statistical methods are used to determine the relationship between different products and/or product features based on customer purchasing patterns, interaction and transaction data. This data can then be analysed using visualisation tools to identify opportunities for cross-selling and up-selling, for example.

Example application area: Real-time sentiment analysis
Usage scenario: Real-time
Example data type: Structured and multi-structured
Example technology option: Event stream processing technology that combines sophisticated analytics and natural language processing technologies can be utilised to enable real-time opinion mining on millions of public tweets to gain a view into brand performance that, in turn, can help organisations understand target audiences and shape decision-making.

Example application area: Real-time offer management
Usage scenario: Real-time
Example data type: Structured and multi-structured
Example technology option: In-memory technology and advanced analytics tools can be used to calculate loyalty card points in real time so that when a customer enters the store, they are provided with real-time offers based on loyalty status and specific store inventory.

Example application area: On-line recommendation engine
Example data type: Multi-structured
Example technology option: HDFS can be used to store huge volumes of online behaviour data, which can then be processed using Mahout's library of machine learning algorithms (which operates on top of Hadoop) and the Pig language to recommend complementary products, based on predictive analysis, for cross-selling.

Example application area: Customer segmentation
Usage scenario: Real-time
Example data type: Structured
Example technology option: In-memory visual analytic tools can query and analyse large amounts of structured data, providing a fast and interactive way to segment customers based on behaviour or attributes of customer data to help quickly identify potential growth or profitable customer segments.

Example application area: Drug research
Usage scenario: Exploratory
Example data type: Multi-structured
Example technology option: Hadoop MapReduce can support the processing and interpretation of large amounts of research data. The ability to easily and economically store data in its rawest form without the need for rigid formatting means analysts can focus their efforts on building hypotheses and exploring what questions could be asked of that data.
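The click stream analytics entry in Table 2 describes MapReduce programs processing web log files in parallel. As a minimal sketch of that paradigm, and not a Hadoop job, the following runs the map, shuffle and reduce steps in a single process to count page views. It is written in Python rather than the R mentioned in the table, and it assumes an invented two-field log format. The names SAMPLE_LOG, map_phase, shuffle_phase, reduce_phase and page_views are illustrative only and are not part of Hadoop or any other product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical web log lines in the form "<visitor_id> <page_path>".
# This layout is an assumption for the sketch, not a real server log format.
SAMPLE_LOG = [
    "v1 /home",
    "v2 /home",
    "v1 /products",
    "v3 /home",
    "v2 /checkout",
    "v1 /checkout",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (page, 1) pair for every well-formed log line."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # silently skip malformed lines
            _visitor, page = parts
            yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as a framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values for each key to get views per page."""
    return {page: sum(values) for page, values in grouped.items()}


def page_views(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in sequence over an iterable of log lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for page, views in sorted(page_views(SAMPLE_LOG).items()):
        print(page, views)
```

In a real deployment the map and reduce steps would run on many nodes over files held in HDFS, and the framework would perform the shuffle; the per-page counts could then be joined with customer data from the data warehouse, as the table entry describes.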
In many ways the problems a business is trying to solve will dictate the kind of architectures and business analytic technologies employed. As the table above demonstrates, it's possible to use a range of technologies and tools to satisfy your needs. Some needs can be supported through traditional analytic tools, whereas others will require the introduction of new analytic practices and tools, especially where the scalability, performance and capabilities of existing analytic tools have run out of steam.

Tapping into the potential of Big Data business analytics

Although the breadth and variety of Big Data analytics options available to organisations is not in question, technology choices should only form part of the equation when it comes to assessing how you move forward with a Big Data project. To really get to grips with Big Data you first need to understand exactly how you can get value from the large volumes of data, very complicated data, or very fast-moving data (or a combination of any of these) prevalent across the organisation. It's an effort that requires organisations to improve their data literacy by finding ways of understanding how this new world of Big Data can potentially solve problems or create opportunities in their business.

What it boils down to is the need not only to make sense of data and derive meaningful insights from it, but also to apply those insights in a business context. As we will see in the next report, Turning Big Data into Big Insights, this is an evolving area and one in which we expect both enterprise practice and vendor support to develop over time.
Key considerations when planning your Big Data business analytics investment

• Big Data encompasses a whole range of technologies and tools. Some, such as predictive analytics and visual analytics, are well established, whereas others, especially where the analysis of multi-structured data is required, shine a spotlight on a newer breed of emerging Big Data technologies such as Hadoop MapReduce, R or Mahout. Today no single technology platform can support the entire range of Big Data use cases, so expect to extend your existing BI and data warehousing environment to incorporate these newer analytic components, an effort that will increase demands on data and application integration capabilities across a more diverse analytic environment.

• The options available for applying sophisticated advanced and specialised analytics to Big Data are growing as support increases for running predictive analytics and machine learning algorithms either in-database or in-Hadoop (for example by using Mahout, KNIME or R). Be aware, however, that this will require you to step up your analytical practices and the type of skills employed within your analytics team.

• Processing and analysing text, such as conducting sentiment analysis on social media, promises to open up new sources of intelligence for many organisations. It uses techniques such as natural language processing (NLP) to understand the opinions, attitudes and intent within text and is often used to understand the voice of the customer. However, no tool can fully automate this type of analysis; it still needs a human touch, one that blends the power of machines with human intelligence and looks to build, train and evolve the tools' language and linguistic capabilities over time.

• The unconstrained nature and scalability of the Hadoop environment and its associated technologies provide an ideal platform for iterative and exploratory analysis. For example, it can be used to support analysts and data scientists in their quest to uncover non-obvious relationships in the data, detect hidden patterns and generate new theories, hypotheses and experiments based on a full set of data rather than just a selected sample.

• Event stream processing software is a valuable technology for continuously analysing data as it is received and hence is often used for mission-critical and decision management applications such as real-time fraud detection, sentiment analysis and risk management. However, while this technology supports streaming and analysing data in motion, consideration also needs to be given to the speed of the feedback loop, that is, the ability of a user or organisation to act on the information within an appropriate timescale; otherwise its value could be lost.

• Above all, before you embark on your Big Data analytic journey, consideration also needs to be given to the readiness of your organisation to deal with the data deluge. This, amongst other things, will involve developing the necessary skills or data 'literacy' across your organisation to be able to understand how to value data, its quality or validity, and how it can be utilised to make more effective, accurate and informed business decisions.
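The event stream processing consideration in the list above, analysing data as it is received, can be illustrated with a sliding-window check of the kind used in simple fraud rules. This is a minimal sketch under stated assumptions, not how any particular product works: events are assumed to arrive in time order as (timestamp in seconds, card identifier) pairs, and WINDOW_SECONDS, MAX_EVENTS and detect_bursts are hypothetical names and thresholds chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

WINDOW_SECONDS = 60  # assumed window length, for illustration only
MAX_EVENTS = 3       # assumed threshold, for illustration only


def detect_bursts(events: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, card_id) whenever a card has more than MAX_EVENTS
    events inside the trailing WINDOW_SECONDS window.

    Events must be supplied in non-decreasing timestamp order.
    """
    recent: Dict[str, Deque[int]] = {}
    for timestamp, card_id in events:
        window = recent.setdefault(card_id, deque())
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_EVENTS:
            yield timestamp, card_id


if __name__ == "__main__":
    stream = [(0, "a"), (10, "a"), (20, "a"), (30, "a"), (35, "b"), (200, "a")]
    for alert in detect_bursts(stream):
        print(alert)
```

Because each alert is produced as the event arrives, the time a person or downstream system takes to act on it, rather than the detection itself, tends to determine the speed of the feedback loop.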