BIG DATA: IT MAY BE BIG BUT IS IT SMART?
A Manifesto for Smart Data
Colin Strong
01 INTRODUCTION

In a relatively short space of time computing has become ubiquitous, leaving behind it an indelible trail of our activities. From the electronic devices we use every day (smartphones, e-book readers, laptops, etc.) to the records of our interactions with businesses, from the items we purchase to the people we communicate with, the potential to paint a detailed picture of our lives has never been greater.

Big Data is the term loosely used to describe this exponential increase in data volumes, alongside the growth in our ability to transfer, store and analyse it. It seemingly promises to deliver invaluable insight that would not otherwise have been possible by conventional means. The commercial benefits of using Big Data have, of course, been widely debated, with one key study indicating that organisations applying analytics to Big Data are over twice as likely to substantially outperform their industry peers (1). Furthermore, a recent global survey found that business executives typically believed Big Data would improve organisational performance by 41% over the next three years (2).

Yet despite the high expectations and obvious benefits delivered by Big Data, we consider that, in its current form, it is limited in the extent to which it can be used to understand the consumer effectively. As such, the vast potential value of Big Data will not be realised. In this document a case is made for a new perspective on the area that addresses this: a manifesto, in fact, for Smart Data. But first, let's look at the benefits and opportunities that Big Data has delivered.

The potential to use Big Data approaches in traditionally non-digital spheres is increasingly accepted. The retail experience demonstrates this: whilst optimising the retail proposition and experience is achievable online using the huge amount of data available, it has been more difficult to achieve in the bricks-and-mortar store.
Here, however, innovations such as image analysis of in-store camera footage to monitor traffic patterns, tracking shoppers' positions from mobile phone signals, shopping-cart transponders and RFID increasingly enable us to generate Big Data.
02 THE OPPORTUNITIES DELIVERED BY BIG DATA

The arrival of the Big Data era has generally been considered positive, creating many new opportunities for businesses to profit from the insight it can deliver. So what are some of these benefits?

Identifying the otherwise unknown
Big Data has been applied to good effect across many commercial and scientific fields, exploring vast volumes of data to identify relationships that might otherwise be elusive. American Express, for example, has used Big Data to look for behaviours that may predict default. It found that people who ran up large bills on their American Express card and then registered a forwarding address in Florida were likely to declare bankruptcy. This reflects the liberal bankruptcy laws in the State of Florida; spotting the correlation in the data allows companies such as American Express to take action early (3).

Looking at an example from science, many argue that the discovery of the Higgs boson (a pretty elusive target, it's fair to say) was facilitated by our ability to handle Big Data, with the four main detectors at the Large Hadron Collider producing 13 petabytes of data in a year (4).

Big Data is perceived to reduce uncertainty more than traditional research methods, its sheer volume lending credibility to the robustness of findings. Indeed, results from large datasets are likely to be more convincing than research conducted with only a few thousand surveys, even if the sample is statistically representative of the universe.

Visualisation of data is often a key element of Big Data, allowing patterns to be observed more easily. For example, researchers using Google Earth were able to observe that two-thirds of cows around the world align their bodies with magnetic north, a finding that is unlikely to have been made without the visualisation element (5).
Profiling and targeting
Perhaps the biggest opportunity for Big Data is the profiling and subsequent targeting of consumers. In one famous example, the US retailer Target became highly efficient at identifying pregnant women in their second trimester, a time when expectant mothers tend to make a lot of pregnancy- and baby-related purchases. Target was able to spot these customers by looking at subtle changes in their pattern of purchases, with about 25 products, such as unscented body lotion and various supplements, proving good predictors of second-trimester pregnancy. This was so successful that a father in Minneapolis complained to Target that it was sending his daughter offers for baby clothes, which he considered inappropriate as she was still at high school and, he assumed, not pregnant. When the manager called to apologise, it was the previously angry father who apologised: he had since discovered that his daughter was due to give birth. So one could argue that Big Data is so effective it can identify these sorts of life events even before close family members recognise them (6).

Forecasting the present
There is an increasing realisation that in certain contexts Big Data can provide a good account of the present in close to real time. A good example is Google Flu Trends, where tracking flu-related search trends enabled flu outbreaks to be predicted at least a week or two before official reports. Hal Varian, Google's chief economist, considers that such data can also be used to predict future economic trends and indeed help a company to outperform the market, since "To make money, you've got to predict two things: what's going to happen and what people think is going to happen. You only make money by beating that spread" (7). Nowcasting, as Google calls it, is likely to become an increasingly important tool for strategic planning.
Improving the consumer experience
Big Data is helping to design more effective consumer experiences by enabling more immersive consumer interaction. Many online retailers now use the Big Data relating to their customers' journeys through their online store to better shape their online experiences. This is typically achieved by running iterative studies with experimental and control conditions to test new product or pricing strategies (8). For example, Progressive Insurance and Capital One are known for conducting experiments to segment their customers systematically and effectively, tailoring product offers accordingly (9).
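The test-and-control logic described above can be sketched as a two-proportion z-test on conversion rates. This is a minimal illustration in Python, assuming hypothetical visitor and conversion counts and a large-sample normal approximation; the function name is invented here, and the retailers mentioned may well use different designs and statistics.

```python
import math

def two_proportion_z_test(conv_control, n_control, conv_test, n_test):
    """Compare conversion rates between a control and a test condition.

    Returns (lift, z, two_sided_p) using the pooled-variance normal
    approximation, which assumes reasonably large samples in both groups.
    """
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    # Pooled conversion rate under the null hypothesis of no difference
    p_pooled = (conv_control + conv_test) / (n_control + n_test)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_test - p_control, z, p_value

# Hypothetical experiment: 1,000 visitors see the current price, 1,000 see a new one
lift, z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"lift={lift:.3f} z={z:.2f} p={p:.4f}")
```

Randomly assigning visitors to the two conditions is what lets the difference be read as an effect of the change rather than a mere association.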
03 HOW MUCH CAN BIG DATA REALLY DELIVER ON ITS OWN?

The excitement about the possibilities of Big Data has led some to ask whether it is revolutionising the way we conduct science. This perspective is possibly best captured by a now-famous article in Wired magazine by its editor-in-chief, Chris Anderson (10). He argues that in the face of Big Data, the traditional scientific method has effectively lost its value, and throws down the gauntlet to those in data analytics: "We don't need new models, we can simply sift through to find meaningful correlation ... it's time to ask: What can science learn from Google?"

So on this basis, we no longer need to understand consumers or hold theories of human behaviour; we can simply use large computers to uncover the important patterns and trends. Indeed, in his polemic he suggests that some sciences have drifted into arid, speculative theorising, with the implication that Big Data is meanwhile breaking new ground. Whilst clearly a controversial statement, it has nevertheless resonated through the Big Data community, shaping an assumption in many quarters that if we can see patterns in the data and there is value in those patterns, it does not matter whether we understand why they occur. If it is delivering value, then why not indeed? Target's revenues grew from US$44 billion in 2002 (when Andrew Pole, the analyst responsible for its enhanced profiling, was hired) to US$67 billion in 2010 (11).
So what's the problem, you may ask? Is this not what market research has aimed to do, but with an at times less convincing track record? And amidst all this, what is the role, if any, for market research? The answer is perhaps that Big Data is not the universal panacea it is sometimes presented as. Rather, it has its own challenges that need to be addressed.

The heritage of commercial Big Data is firmly rooted in highly numeric disciplines including statistics, computer science, applied mathematics and economics (12). Whilst to some minds this may sound multidisciplinary, there is a distinct lack of social science representation. This is more than a quibble; it perhaps represents a limitation in the Big Data mindset, to the extent that we need to fundamentally reconsider our approach to Big Data and make the case for a Smart Data agenda. Let's explore this in a little more detail.

What versus why
A key issue with the mainstream approach to Big Data is its apparent reliance on analysing associations. These range from straightforward correlations to more sophisticated approaches such as segmentation, regression, pattern recognition, signal processing and spatial analysis (13). The problem is that association does not always get it right. Illustrative of this is a dated but well-known and amusing example from a Wall Street Journal article titled "My TiVo thinks I am gay" (14). It described a TiVo customer who realised that his TiVo recommendation system had identified him as gay (he was not), as it kept recommending gay-themed movies. Clearly, the sophistication of these systems is evolving, but the example does open the debate as to whether computer systems alone can ever reflect the subtlety
of human choices. This leads us to the wider point that association does not equal causality or, in other words, what is not the same as why. Big Data analytical approaches, whilst often highly descriptive and highly useful, are perhaps limited in what they can explain. As Danah Boyd, a senior researcher at Microsoft Research, says in one of her influential blog posts on the topic, "We must continue to ask why questions that cannot be answered through traces alone" (15). This is a key point in our Smart Data manifesto. A Smart Data approach does more than simply describe human behaviour. Rather, it generates an understanding of an underlying set of principles concerning consumer behaviour that can provide guidance across a range of scenarios.

Understanding the context
The idea that Big Data can deliver a truth uncontaminated by the problems that beset other data collection approaches raises concerns. First is the notion that data can be analysed without reference to its context, with every data point treated equally. For example, when looking at the relationships between people in a network, it is easy to assume that frequency of contact is equivalent to strength of relationship. We know this is not so; otherwise, our strongest relationships would always be with our work colleagues. Also, as Jeff Jonas, chief scientist of the IBM Software Group, has highlighted, all data sources are prone to some level of error (16). A sales database will typically not include all sales, whilst a social media extract might include only the first 25% of transactions. Clearly, if you don't understand the limits of the data you have, there is a big danger of misinterpreting it. A Smart Data approach relies heavily on the accepted social science practice of understanding the source and representativeness of data sets.

Making judgements
Another area that Big Data needs to address is the potential for apophenia: seeing patterns in data where in fact none exist. As humans, we appear to be predisposed to pull patterns out of random data. By simply looking for patterns without guiding principles, there is clearly the potential to draw the wrong conclusions (17).

Furthermore, underneath any type of data analysis sits a multitude of judgements: which parts of the data set do you look at, how do you aggregate different pieces of data, which metrics do you create? These decisions are not made in a vacuum but are influenced by the mindsets of the team pulling together the Big Data analysis. So, to return to Microsoft's Danah Boyd: "Interpretation is at the centre of data analysis. Regardless of the size of the data set, there is always potential for limitation and bias. Without these biases and limitations being understood and outlined, misinterpretation is the result" (18).

The Smart Data manifesto does not provide a panacea for this issue. Instead, it recognises that any finding is fundamentally a working understanding rather than a final truth. This seemingly subtle distinction creates a very different set of working practices, more akin to the social sciences: hypotheses are tested and every conclusion is heavily scrutinised before it is accepted.

Finding the right people
And last but by no means least, are the right people available to do the analysis? In an interesting study titled "Good Data Won't Guarantee Good Decisions", published in Harvard Business Review (19), the Corporate Executive Board explored the types of people it considered most suitable for analysing data sets.
They found three broad categories of employee (based on an evaluation of 5,000 employees across 22 companies): unquestioning empiricists, who trust analysis over judgement; visceral decision makers, who tend to go instinctively with their gut; and informed sceptics, who balance judgement and analysis and, as such, possess good analytical skills. All very well, but the Board found that only 38% of employees and 50% of senior managers fell into this latter category. This led it to identify four issues that prevent companies from getting a better return on their Big Data investment:
- Analysis skills are concentrated in too few employees, with only a handful of skilled analysts equipped to undertake Big Data analysis.
- There is too little focus by the IT function on managing the information side of the equation. Consequently, it is difficult to interpret the often poorly articulated requests and diverse data demands from across the business.
- The proliferation of channels has led to an explosion of data, but almost half of employees say they don't know where to find the data they require for their job.
- There is a lack of investment by senior management in managing information within the company.

All of these point to the need for the right sort of people to interpret Big Data properly. The mechanisms to analyse and understand Big Data don't run themselves. Individuals are needed to identify the appropriate inputs, determine the relevant processes and analysis tools, and intelligently interpret the outputs and implications. Greg Meggs, Strategic Insight Director of Sky IQ, highlighted this in a recent WARC paper, identifying the need for "Leadership, coaching and a library of insight" as areas of investment in addition to technology (20). The work of the Corporate Executive Board implicitly supports a Smart Data approach, as informed sceptics are clearly using the right sort of analytical approach to leverage the value of a brand's data assets effectively.

The advent of Big Data has brought with it the ability to analyse the world from a unique standpoint. However, the benefits of Big Data should not blind us to the fact that it has its limitations; the danger is that it is viewed as the elixir of the age, and that survey data comes to be considered somehow inferior to the CRM data held internally by companies. Indeed, on this very point, the paper by Brynjolfsson, Hitt and Kim indicated that Big Data makes "customer behaviour and customer-firm interactions visible without having to resort to costly or ad-hoc focus groups or customer behaviour studies" (21).
04 WHAT ARE THE OPPORTUNITIES FOR MARKET RESEARCH?

Big Data cannot be left to the mathematicians and computer scientists alone. The threats and opportunities Big Data presents to the market research community have been debated, and currently the prevailing view appears to centre on the ability of Big Data to dispense with fiddly, time-consuming and expensive market research. So let us turn to the opportunities that Big Data creates for the market research community.

Providing the right skills pool
There has been plenty of speculation about the demand for a new breed of data scientist to make full use of Big Data. For example, McKinsey estimates that by 2018 the US alone could face a shortfall of 140,000 to 190,000 people with analytical expertise, as well as 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data (22).

The skills associated with market research are very much in the right category for managing Big Data. The ability to understand and manipulate large data sets, strong analytical skills, and the ability to interpret findings and apply them to the business are all familiar territory and form the basis of a Smart Data approach. Some research agencies are now in a position where they handle Big Data on a daily basis. As increasing volumes of digital data are captured, agencies are moving into the role of aggregating their clients' Big Data. So the ability to handle these sorts of large data sets is increasingly available within the market research community.
Intelligent Integration: adding the why to the what
The need to understand the context and meaning of the behaviours observed in Big Data will rapidly become critical. Indeed, meaning will not be derived from looking at data sets in isolation; instead, they need to be brought together to generate maximum value. For example, overlaying share-of-shelf data with point-of-sale data creates insights that can help drive the retailer-manufacturer relationship in-store, potentially increasing profitability for both parties. Similarly, combining geo-location data (e.g. where in London a consumer is) with that consumer's mobile/web behaviour (e.g. what they are searching for online) creates insights with the potential to dramatically change the dynamics of businesses in a given location. An understanding of consumers' needs in a specific area can help drive targeted sales or improve customer experience.

Arguably there is only so much nuance that a purely behavioural analysis of data will provide. The consumer intention driving the activity can only be properly understood by talking to that consumer. Here, as Roy Poynter identifies, lies an opportunity for "a wide range of providers from anthropologists to market researchers, behavioural economists to semioticians, and social scientists to trend hunters" (23). Without understanding the consumer mindset there is a danger that Big Data never becomes Smart Data, and as such its value for marketers is limited. Intelligent integration brings consumer and research knowledge together with Big Data to avoid the risk of chasing the wrong questions and finding false positives. An appreciation of the assumptions underlying the data and of the motivations behind consumer actions is likely to improve the way findings are understood and communicated within a strategic context.
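Overlaying share-of-shelf data with point-of-sale data is, at its simplest, a join on a shared key. The Python sketch below assumes hypothetical record layouts keyed on a store identifier and week number; the field names, the `integrate` function and the figures are all invented for illustration and do not describe any real retailer's systems.

```python
from dataclasses import dataclass

@dataclass
class IntegratedRow:
    store_id: str
    week: int
    share_of_shelf: float  # hypothetical: fraction of category shelf space held by the brand
    units_sold: int        # hypothetical: brand units sold that store-week

def integrate(shelf_records, pos_records):
    """Inner-join shelf audits and point-of-sale records on (store_id, week)."""
    pos_by_key = {(r["store_id"], r["week"]): r["units_sold"] for r in pos_records}
    joined = []
    for s in shelf_records:
        key = (s["store_id"], s["week"])
        if key in pos_by_key:  # keep only store-weeks present in both sources
            joined.append(IntegratedRow(s["store_id"], s["week"],
                                        s["share_of_shelf"], pos_by_key[key]))
    return joined

shelf = [
    {"store_id": "S1", "week": 1, "share_of_shelf": 0.20},
    {"store_id": "S2", "week": 1, "share_of_shelf": 0.35},
    {"store_id": "S3", "week": 1, "share_of_shelf": 0.25},  # no matching sales record
]
pos = [
    {"store_id": "S1", "week": 1, "units_sold": 120},
    {"store_id": "S2", "week": 1, "units_sold": 260},
]

for row in integrate(shelf, pos):
    # Units sold per unit of shelf share: one crude view of shelf productivity
    print(row.store_id, row.share_of_shelf, row.units_sold,
          round(row.units_sold / row.share_of_shelf))
```

Note that store S3 silently drops out of the inner join; understanding which records are missing, and why, is exactly the kind of context the text argues a purely mechanical analysis can overlook.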
The combination of behavioural data tracking and survey work presents huge new opportunities to generate insights that would previously have been unattainable. For example, we are undertaking this type of integrated approach to assess the impact of Olympic sponsorship on brand awareness and disposition. Survey work is used to assess brand positioning before and after the Olympics (as well as to track respondents' offline behaviour during this time), whilst online behavioural tracking captures online exposure to the sponsorship. We can then explore the role that online exposure plays in subsequent measures of brand performance. This approach provides very clear indications of the ROI of sponsorship that would have been difficult to achieve by other means.

Creating a holistic view of the customer
In a recent MRS conference paper, Poynter and Kaylor (24) argue that Big Data reflects a growing focus on a brand's customer base, as companies shift their gaze from acquisition to retention. On this basis, they envision a role for Extended Community Panels, where the customer base in its entirety is seen as the community, with tiers of integration between a brand's Big Data assets, customer interactions with the brand, social media and research tools. Together these can clearly deliver a significant opportunity for consumer insight. There is much to applaud in the proposed approach, which is consistent with the Smart Data perspective of adding the why to the what, but is there not also a wider role for Big Data outside the customer base?

Big Data: the Big Picture
The prevailing view of how Big Data should operate has often been for brands to analyse their own proprietary data sets generated from their own digital assets. This is then used to optimise consumer interaction with the brand. As such, a rationale has emerged for retaining this analysis work in-house.
Yet walled-garden approaches tell only part of the story, failing to address the activity that led the consumer to touch the brand in the first place or, indeed, which other brands the consumer is having conversations with. Research agencies are in a strong position to move to a Smart Data approach, providing a context for brands by:
- providing syndicated services that reflect activity in the market as a whole (achievable from the overview they accrue as aggregators of multiple clients' Big Data sets)
- generating panels of consumers that capture digital activity across multiple devices, making it possible to build a greater understanding of consumer behaviour that covers the entire repertoire of their online lives, not just the activity related to specific brands in a category
- accessing social media data through APIs or other agreements to take a much more detailed look at the nature of online social interactions.

Together these provide the opportunity for a broader understanding of a brand's performance, placing the work done on the brand's own digital assets into a wider market context.

Small Big Data
New data collection tools suddenly allow us to collect vast amounts of digital data from relatively small samples, tracking consumers' movements and their use of objects. Software can track activity on mobile devices, while sensors such as Ninja Blocks, Twine, Knut or Electric Imp can tap into the ever-growing Internet of Things to track how those objects are being used. By integrating this type of Big Data collection from small samples with a rich in-situ qualitative or ethnographic approach, we arrive at a truly Smart Data approach; Digital Ethnography might be a good term for this. Whilst these activities require significant investment to undertake on a large scale, they allow us to develop a hugely (literally) rich dataset that offers a level of granularity into behaviour that would be impossible to obtain by other means.
05 THERE IS A BIGGER PRIZE AT STAKE

There is a huge opportunity for research in generating an understanding of human social interactions from data. Major tech brands, including Microsoft, Facebook and Intel, all announced this year that they have set up labs to do exactly this. This is because the era of Big Data enables a unique, unparalleled exploration of human behaviour, and specifically human social interaction. Facebook's in-house sociologist, Cameron Marlow, is cited as considering that the research his team is undertaking has the potential to transform scientific understanding of human behaviour in the same way that astronomy has transformed our understanding of the cosmos (25). And he is not alone; Microsoft Research's Duncan Watts says, "The present time is a very special time in the history of social science because we are witnessing a dramatic transformation in our ability to observe and understand human behaviour" (26).

A seminal paper by data scientist Scott Golder, with the unlikely call-to-arms title "Scaling Social Science with Hadoop" (27), suggests that the Big Data revolution could signal the end of "normal science" in social science, as we can now explore how entire societies work. The emerging field of computational social science allows us to explore the importance of the social network in driving behaviour. As Paul Ormerod points out in his book Positive Linking (28), "Social networks are often thought about purely in terms of social media such as Facebook. However, we also include in this real-life social networks (family, friends, colleagues) that are even more important in helping us shape our preferences and beliefs, what we like and do not like." Computational sociologists are understandably excited about the opportunities that accessing social media represents for exploring this phenomenon.
We now have detailed data collected naturalistically (the by-product of people's lives) with which to better understand how social linkages work.
And the emerging field of Cyber-Psychology is also creating waves by using digital behavioural data as a research tool in its own right; that is, using the medium to create a better understanding of human psychology. A great example of this is a research project by the aforementioned Scott Golder and Cornell University professor Michael Macy, looking at how diurnal and seasonal mood rhythms vary across different cultures around the globe (29).

This area has the potential to be huge. After all, our existing theories of human behaviour (including human social behaviour) have previously been based on relatively small samples. The tools we use will always shape our understanding, so if our tools suddenly evolve, then so will our understanding; and not just in size and scale but qualitatively, as we have the potential to generate completely new insights into human behaviour. Perhaps this is the real heart of Smart Data, where our knowledge of consumers is embodied within the data itself, generating insights into the human condition that are hard to derive by other means. Creating this type of understanding will affect how brands can leverage the value of their data sets, as they start to integrate models of human and social behaviour effectively.

Research agencies need to invest in order to be credible in this area. At GfK, we have created a unit specifically to explore these issues and have developed close links with universities to develop and expand our offer. Investment is also inevitably required to access data sets at the scale necessary for this type of work.
06 IN CONCLUSION

Once again, in its relatively young history, market research is at a crossroads. The Big Data agenda has the potential to shut out much of the industry, with the focus shifting to the use of behavioural data to facilitate tactical efficiencies. Whilst there will still be opportunities for agencies that have the capabilities to aggregate and generate Big Data, I suspect that if this scenario comes to pass, a high proportion of research budgets will be reallocated in this direction.

Alternatively, there is the opportunity for the industry to seize the agenda, demonstrating that the much sought-after skill set required to execute the Big Data agenda is present within the industry. Indeed, it is hard to think of another industry that combines the ability to handle large data sets with the ability to generate a nuanced understanding of the consumer. In short, the challenge for our industry is to make the case for Smart Data as the most profitable way for brands to unlock the value held in their data assets.

More information
To find out more about Smart Data, you can contact the author directly below:
Colin Strong
UK Managing Director, GfK's Technology division
(+44)
REFERENCES
1. MIT Sloan Management Review and IBM Institute for Business Value (2011) Analytics: The Widening Divide, USA: IBM Global Services
2. Capgemini (2012) The Deciding Factor: Big Data & Decision Making, London: Economist Intelligence Unit
3. Bollier, D. (2010) The Promise and Perils of Big Data, Washington: The Aspen Institute
4. Brumfiel, G. (2011) High-energy physics: Down the petabyte highway, Nature, 469, pp.
5. Maugh, T.H. (2008) Cows have magnetic sense, Google Earth images indicate, Los Angeles Times, August
6. Duhigg, C. (2012) How Companies Learn Your Secrets, New York Times, February
7. Bollier, D. (2010) The Promise and Perils of Big Data, Washington: The Aspen Institute
8. Brynjolfsson, E., Hitt, L. and Kim, H. (2011) Strength in Numbers: How Does Data-Driven Decision-making Affect Firm Performance?, US: Social Science Electronic Publishing
9. McKinsey & Company (2011) Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May
10. Anderson, C. (2008) The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, Wired, June
11. Duhigg, C. (2012) How Companies Learn Your Secrets, New York Times, February
12. McKinsey & Company (2011) Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May
13. McKinsey & Company (2011) Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May
14. Zaslow, J. If TiVo Thinks You Are Gay, Here's How to Set It Straight, Wall Street Journal, November
15. Boyd, D. (2010) Big Data: Opportunities for Computational and Social Sciences, [Online], Available: big-data-opportunities-for-computational-and-social-sciences.html [accessed September 2012]
16. Bollier, D. (2010) The Promise and Perils of Big Data, Washington: The Aspen Institute
17. Falk, R. and Konold, C. (1997) Making Sense of Randomness: Implicit Encoding as a Basis for Judgment, Psychological Review, 104, 2, pp.
18. Boyd, D. (2011) Six Provocations for Big Data, [Online], Available: scribd.com/doc/ /six-provocations-for-big-data-danah-boyd-kate-Crawford [accessed September 2012]
19. Shah, S., Horne, A. and Capella, J. (2012) Good Data Won't Guarantee Good Decisions, Harvard Business Review, April
20. Meggs, G. (2011) From Big Data to Big Insight: How to answer the challenge when opportunity knocks, WARC Datacentric conference, December
21. Brynjolfsson, E., Hitt, L. and Kim, H. (2011) Strength in Numbers: How Does Data-Driven Decision-making Affect Firm Performance?, US: Social Science Electronic Publishing
22. McKinsey & Company (2011) Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May
23. Poynter, R. (2012) Is Market Research Ready for Big Data?, [Online], Available: [accessed September 2012]
24. Poynter, R. and Taylor, K. (2012) Communities in 2017: A prediction of where research communities will be in five years, [Online], Available: visioncritical.com/wp-content/uploads/2012/04/mrsconferencepaper_Communities-in-2017.pdf [accessed September 2012]
25. Simonite, T. (2012) Facebook's Telescope on Human Behavior, [Online], Available: [accessed September 2012]
26. Simonite, T. (2012) Microsoft's New Lab Hunts for Value in User Data, [Online], Available: [accessed September 2012]
27. Golder, S. and Albanese, E. (2010) Scaling Social Science with Hadoop, [Online], Available: [accessed September 2012]
28. Ormerod, P. (2012) Positive Linking, London: Faber and Faber
29. Golder, S.A. and Macy, M.W. (2011) Diurnal and Seasonal Mood Vary with Work, Sleep and Daylength Across Diverse Cultures, Science, 30, pp.