Crowd-Squared: A New Method for Improving Predictions by Crowd-sourcing Google Trends Keyword Selection

Full Paper (word count: 6058)

Erik Brynjolfsson, Sloan School of Management, Massachusetts Institute of Technology (erikb@mit.edu)
Tomer Geva, Recanati Business School, Tel-Aviv University (tgeva@tau.ac.il)
Shachar Reichman, Sloan School of Management, Massachusetts Institute of Technology (shachar@mit.edu)

Abstract

Advances in information technologies and analytic tools have dramatically increased our ability to obtain accurate data on billions of economic decisions almost the instant that they are made. Services such as Google Trends aggregate billions of search queries and provide information about the search volume of different terms. This information from the crowd has been used successfully to predict a wide variety of events. Nevertheless, a major challenge in using this source of information is generating an appropriate list of search terms that reflect the phenomenon of interest. Current methods for search term generation often rely on proprietary search engine data, employ black-box classifiers, or require extensive computational power. We introduce Crowd-Squared, a new crowd-based method for selecting relevant search terms. We show that this method is successful in various domains and that it performs as well as, or better than, the selection methods used in previous studies.

Introduction

Advances in information technologies and analytic tools over recent years have dramatically increased our ability to obtain accurate data on literally billions of economic decisions almost the instant that they are made. In particular, each time a consumer uses a web search engine during the purchase process, valuable information is revealed about that individual's intention to make an economic transaction and about the intentions of similar users. Services such as Google Trends aggregate billions of search queries and provide information about the relative volume of different search terms. Using this information from the crowd, researchers can make accurate predictions of a wide variety of future events, including product sales, claims for unemployment benefits, and epidemic outbreaks. The ability to rapidly detect current activities and accurately predict future events has considerable business implications, covering almost all aspects of a firm's activities, such as inventory and supply chain management, marketing, and pricing.

While early research has had considerable success, perhaps the most important challenge that has emerged is generating an appropriate list of search terms that reflect the phenomenon of interest. Current techniques for identifying relevant terms have various limitations, such as the need for proprietary search information; the use of an automated, black-box Google category classifier, which is not available for all domains; or reliance on manual term selection based on specialized domain knowledge or trial and error. Typically, the problem is handled as a one-person guessing game, and even when machine learning methods are used to identify terms, the methods are still constrained by their individual design. While researchers can use their own intuition and judgment about which terms web searchers may use when seeking information about a particular phenomenon, these guesses may not match the terms that web searchers actually choose.

In this paper, we offer a new crowd-based method for selecting relevant search terms that correspond to the underlying phenomenon and facilitate accurate early detection or prediction of real-world events using aggregated search data. In effect, we recruit a crowd of a few hundred participants to predict the terms employed by the much larger crowd of millions of search engine users. We show that this iterative use of the crowd is successful in various domains and that it performs as well as, or better than, the search term selection methodologies used in previous studies.

Crowdsourcing has received increasing interest recently and has proven to be a reliable and successful technique in many business and research areas, including capturing new product ideas and innovations, generating accurate image tags, improving image search, and even solving scientific problems. One of the main benefits of crowdsourcing is the ability to harness human intelligence to perform small tasks that are impossible or too expensive for computers to perform well. In our context, the task is finding the terms relevant to a focal item or event. We used a crowdsourcing environment and designed an online word association game to capture people's associations with focal phrases: participants are asked to provide five terms that come to mind when they see a specific word or phrase.

Given a specific phrase, the word association technique provides a relative index of the strength, or accessibility, of related words in memory. We therefore expect this technique to mirror the keyword-generation process a person goes through when using a search engine. A total of 300 participants played the game on the Amazon Mechanical Turk platform. We aggregated the results and collected search trends data for each of the most frequently mentioned terms. We used these search trends to generate predictions in three domains: influenza epidemics, unemployment claims, and housing indexes. We then compared our results with a benchmark model in each domain. We find that using crowd-generated terms as part of the prediction model is highly effective. Our results suggest that integrating crowd-generated search terms with aggregated search engine data performs as well as, or better than, costly or black-box keyword generation methods.

Related Literature

The availability of search data, web activity data, and other sources of information, along with the development of analytic tools, has dramatically increased our ability to obtain accurate data on billions of economic decisions as well as on individuals' intentions to make these transactions (McAfee and Brynjolfsson, 2012). This has led to a new topic of interest: using search engine logs to forecast future events or to provide faster and more accurate means of gauging and identifying current events (referred to as "nowcasting" by Choi and Varian, 2012).

Prediction models using search query volume data

Search engine logs, or search trends, have received significant attention in recent years for their ability to predict and detect a variety of economic outcomes. Search volume data has been shown to provide useful predictions in a wide range of domains, from epidemic outbreaks (Ginsberg et al. 2008), through movie box office sales and music billboard rankings (Goel et al. 2010), to automotive sales (Du and Kamakura, 2012; Choi and Varian, 2012), home sales (Wu and Brynjolfsson, 2009; Choi and Varian, 2012), and claims for unemployment benefits (Choi and Varian, 2012).

An important component in the successful use of search trends data is specifying the set of search queries whose volume best reflects the phenomenon of interest. The literature reports several types of methods for selecting this set of search terms. The first approach relies on entire categories rather than specific keywords. It uses automatic allocation by a black-box classifier developed by Google that assigns each search query to several hundred predefined categories and subcategories. Choi and Varian (2012) used this method to obtain relevant search volume data and demonstrate contemporaneous predictive capabilities in various fields, including sales of motor vehicle parts, initial claims for unemployment benefits, travel, the consumer confidence index, and automotive sales. Vosen and Schmidt (2011) used Google-categorized search data to predict private consumption. For this purpose, they used the 56 Google categories that they considered most relevant; they then applied factor analysis and used the factors with the largest eigenvalues. Wu and Brynjolfsson (2009) used Google search data pertaining to real estate categories to predict future house sales and price indices as well as home appliance sales. Their paper also raises the possibility of using search volume in one domain (such as real estate or real estate agencies) to predict sales in a different domain (home appliances). This suggests that search query selection may be improved by a non-trivial process of finding terms or categories whose influence is only indirect.

Overall, the advantages of using search volume based on Google's predefined categories are its ease of use and the fact that it can encompass multiple relevant search terms. However, the underlying classifier is a black box, so it is not possible to gauge its accuracy or its coverage. More importantly, the predetermined categories cover only a set of popular items and omit many possible items of interest (e.g., there is a category for the Ford automotive brand, but there is no category for the Ford Focus model).¹ In addition, for items such as housing prices or private consumption, multiple Google categories or subcategories may be relevant, and the user must decide which of them to use.

A different approach was taken by Ginsberg et al. (2008) in constructing an early detection system for influenza epidemics. They used Google's internal data on the 50 million most popular search terms, fitted a simple logistic model for each search term with influenza activity as the dependent variable, and then selected the top n terms according to their mean correlation with actual influenza data across nine regions. This methodology was highly successful. However, it cannot be reproduced with the trends data that Google provides to external users (on the Google Trends website), because of strict limits on the number of terms that can be extracted from Google Trends (several hundred per day). In addition, this kind of analysis requires expertise and considerable computational power, both to collect a large portion of all queries performed online and to compute each query's correlation with the phenomenon of interest.

Another study that used proprietary information is Goel et al. (2010), which reports on various methodological aspects of using search trends data. To demonstrate these aspects, the authors predict movie revenues, music billboard rankings, and video game sales. Their methodology relied on identifying search queries for which the Yahoo search engine returned predefined relevant webpages (e.g., IMDB). While the authors (who were affiliated with Yahoo) obtained good results using this methodology, it is virtually impossible to replicate it using publicly available data, as this would require an exhaustive check of all possible search terms that return a set of specific links.

Other studies have used handpicked keywords. For instance, Seebach et al. (2011) tested various combinations of search terms that included vehicle brand names (e.g., "Volkswagen" or "VW"), vehicle model names ("Golf", "Passat"), and various Google categories for predicting automotive sales. They found that simply using brand-level names as search terms under the vehicle shopping category provided the best results in terms of correlation with brand-level sales. D'Amuri and Marcucci (2012) used a single, though highly relevant, search keyword, "jobs", to forecast unemployment. Finally, Du and Kamakura (2012) report on a dynamic factor analysis method for extracting latent dynamic factors from multiple time series. They demonstrate this method using Google Trends data for U.S. automotive sales, starting from an initial set of keywords suggested by the Google AdWords keyword tool, which recommends relevant search terms for advertising purposes. While the Google AdWords tool suggests relevant terms, its selection criteria are likewise not publicly disclosed.

In many of the above-mentioned studies, the keyword models were based on a priori knowledge, a well-defined category, or search terms that were defined as closely related to the predicted variable. A further difficulty in search term selection, which may occur frequently, is that there is no prior knowledge about which queries could be relevant to the predicted event (e.g., the launch of a new product) or which search category best matches its search patterns.

¹ These studies used the search volume for the entire category. However, Google Trends also allows specifying a keyword within a Google category. For example, a search for the term "Argo" under the movie category will return search queries related to movies that specifically include the word "Argo".
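For contrast with the crowd-based approach, the correlation screening attributed above to Ginsberg et al. (2008) can be sketched roughly as follows: score each candidate term by its mean per-region correlation with the target series and keep the top n. This is a simplified illustration, not their implementation: it uses raw Pearson correlations (no logit transform), and the function names, regions, and toy series are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_n_terms(candidates, targets, n):
    """Rank candidate terms by their mean per-region correlation with the target.

    candidates: {term: {region: [weekly search index, ...]}}
    targets:    {region: [weekly target values, e.g. ILI, ...]}
    Returns (the n best terms, the score of every term).
    """
    scores = {
        term: mean(pearson(series[region], targets[region]) for region in targets)
        for term, series in candidates.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], scores


# Toy data: two hypothetical regions and three hypothetical candidate terms.
targets = {"east": [1, 2, 3, 4], "west": [2, 4, 6, 8]}
candidates = {
    "flu symptoms": {"east": [10, 20, 30, 40], "west": [1, 2, 3, 4]},
    "fever": {"east": [1, 3, 2, 4], "west": [2, 4, 6, 7]},
    "weather": {"east": [4, 3, 2, 1], "west": [8, 6, 4, 2]},
}
best, scores = top_n_terms(candidates, targets, n=2)
print(best)  # ['flu symptoms', 'fever']
```

The screening itself is cheap; the cost the text describes comes from running it over tens of millions of candidate queries, which requires the proprietary query data that the crowd-based method avoids.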

Crowdsourcing and word association game

The fundamental idea behind using search query data for prediction is that it reflects cumulative actions performed by people and will therefore capture changes in their behavior over time. Because these data originate in the crowd, it is reasonable to assume that we can use the crowd to better understand the keyword generation process that leads to search queries. Specifically, since search behavior can reveal consumers' intentions (Moe and Fader, 2004), this understanding should improve the classification of search patterns associated with different consumption activities.

Crowdsourcing is the act of harnessing a distributed network of individuals to solve a problem or perform a function that was once performed by employees (Brabham 2008; Howe 2006). In recent years, the use of crowdsourcing has been accelerating in many fields, including capturing new product ideas and innovations (Bayus 2013), generating accurate image tags (Von Ahn 2006), improving image search (Yan et al. 2010), and even solving scientific problems (Lakhani et al. 2007). The benefits of crowdsourcing stem from its scale and diversity, which provide a variety of user backgrounds, levels of expertise, and other demographics at low cost. We follow this stream of research and leverage the crowd to generate relevant keywords for prediction and early detection of events with search volume data.

One of the challenges of crowdsourcing is how to engage the crowd in a meaningful and productive way (Boudreau et al. 2013). As noted by Von Ahn (2006), an online game environment is an effective technique for capturing crowd knowledge and may provide reliable information without any supplementary verification of the users' answers. In addition, as shown by Snow et al. (2008), aggregating results for the same task from multiple non-expert individuals can generate results at the same level as those created by experts. In this paper, we used a crowdsourcing game environment and designed a word association game to capture people's associations with focal phrases. We aggregated the associated terms, collected search data for each of the most frequently mentioned terms, and included them in the prediction model. To the best of our knowledge, our work is the first to study how combining word association with search data can improve prediction and early detection accuracy.

Methodology and Evaluation

We studied how a crowd-based word association game can improve the generation of useful search terms, thereby improving trend predictions. We employed the Amazon Mechanical Turk platform, an online marketplace for tasks that require human intelligence (that is, tasks that a human can easily answer but that would be computationally expensive to solve algorithmically). Workers (known as Turkers) are paid small amounts of money to complete small tasks (called HITs, Human Intelligence Tasks). The platform allows randomized assignment of tasks to multiple Turkers and provides control over task completion.

Word association

We introduced a technique that uses human workers to help find relevant keywords in a game-like environment. Specifically, we implemented a word association game (also known as free association) in which workers are asked to submit related phrases. Word association is a task that requires participants to spontaneously provide a word or phrase that is related to a presented word (known as the cue). Word association taps into one's lexical knowledge, which is based on real-world experience (Nelson et al. 2004), and has been shown to be important in predicting cued recall (Nelson et al. 1998). The task is also used in everyday activities as a means of collecting thoughts (Nelson et al. 2000). Word association provides an index of the probability that words are related to the cue term, and this information has been found to be consistent across different people in the same culture (Nelson et al. 1998). In the context of web search, word association may therefore help identify effective search queries. Because the associated terms are represented consistently across people, they may reflect broader search patterns and therefore help in measuring current events and predicting future activities.

Another benefit of the word association technique is that it yields a power-law distribution of term associations: most associations relate to proximal terms, and a few connect to more distant terms. This allows us to capture terms that are more spread out and less correlated with each other; thus, they may have more explanatory power when combined with search data.

Keyword association game design

We built an online word association website specifically for this study. The website provides a single page with short instructions and one phrase (the cue term). Five text boxes were shown for participants to fill in with their associated terms (an illustration of this game is presented in Figure 1). The website was designed to resemble a common game environment, and participants were told neither the purpose of the game nor how their terms would be used afterward. Each Turker (participant) was shown a single phrase and was asked to provide five terms or phrases that come to mind when seeing this phrase. Each Turker was paid 5 cents ($0.05) for completing the game. The average duration of a game instance was 46 seconds (including answering three demographic questions).

[Figure 1: game page showing the prompt "Please write 5 terms (one word or more) that come to mind when you see the word", the cue "Flu", and five text boxes]

Figure 1. An illustration of the online word association game. The word "Flu" is the focal phrase, and Turkers were asked to write five terms or phrases that come to mind when seeing the focal phrase.

We aggregated the game results and generated a list of the top 10 terms associated with each cue phrase (Appendix A lists the top 10 terms by cue). We used this set of terms as the list of relevant query terms, on the premise that they reflect actual search queries. For each term, we collected its search query volume over time and included the search data in the forecasting model.

Evaluation

To validate our methodology, we applied it to the data and prediction tasks reported in three different domains, replicating the tasks in three well-known studies: Ginsberg et al. (2008) for influenza outbreak detection, Wu and Brynjolfsson (2009) for real estate market predictions, and Choi and Varian (2012) for predictions of unemployment levels. To allow an impartial comparison, we intentionally limited ourselves to the same performance measures, data sets, and time periods that these studies used. We compared our models with the prediction models reported in each of those papers and with a baseline model when one was used in the original comparison. It is important to note that we are not suggesting a new forecasting method but rather introducing a new technique for generating relevant input variables for any forecasting model that uses search query data. If our methodology is valid, we expect it to achieve predictive accuracy at least as good as that reported in these studies.

Influenza epidemics

The first data set that we used to validate our methodology is the flu outbreak data from the CDC. This type of data was used by Ginsberg et al. (2008) to construct an early detection system for influenza epidemics. Specifically, the dependent variable in their study was the weekly ILI (influenza-like illness) percentage reported by the U.S. Centers for Disease Control and Prevention (CDC). To select the search terms to include in the prediction model, they used Google's internal data on the 50 million most popular search terms, from which they selected the top n terms by calculating each term's correlation with the dependent variable. They then used the selected terms to fit a linear model that generates the predictions. Their method was highly successful for this application, reaching an out-of-sample mean correlation of 0.97 across U.S. regions. Nevertheless, it is impossible to use a similar methodology without access to Google's proprietary data, since Google does not allow external access to search trends data for more than several hundred search terms a day.

In this study, we used U.S.-level data between January 2004 and the week commencing March 11, 2007.² We validated our model using out-of-sample data from March 18, 2007 to May 11, 2008, the same out-of-sample validation period used by Ginsberg et al.

Using the word association setup described above, we asked 100 Turkers (62% female, average age 31.8) to play the online game with the task description "please write 5 terms that come to mind when seeing the word Flu" (see Appendix A for the top 10 associated words generated by the Turkers). The resulting set of distinct associated phrases was very large. However, any single phrase may reflect only one individual's idiosyncratic thinking rather than the search patterns of others. As shown by Snow et al. (2008), aggregating results from multiple individuals can produce high-quality results. We therefore restricted the analysis to the 10 most popular associated phrases. For each phrase, we collected the weekly search index from Google Trends. This search index is the share of searches at time t (typically a week or a month) relative to the total search volume across the time period. We limited our results to queries in the United States to match the predicted variable, flu outbreaks in the U.S.

² We excluded data from 2003 because Google Trends provides data only from 2004 onward.
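The aggregation step described above, which reduces a large set of free-text answers to the 10 most frequently mentioned phrases, can be sketched as a simple frequency count. This is an illustrative sketch, not the authors' code; the normalization (lower-casing, collapsing whitespace), counting each participant at most once per term, and alphabetical tie-breaking are assumptions made here, and the responses are hypothetical.

```python
from collections import Counter


def normalize(term):
    """Lower-case a free-text answer and collapse internal whitespace."""
    return " ".join(term.lower().split())


def top_associations(responses, k=10):
    """Aggregate word-association answers into the k most frequently mentioned terms.

    responses: one list of (up to five) free-text answers per participant.
    Each participant counts at most once per normalized term; ties are broken
    alphabetically so that the output is deterministic.
    """
    counts = Counter()
    for answers in responses:
        counts.update({normalize(a) for a in answers if a.strip()})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Three hypothetical participants answering the cue "Flu".
responses = [
    ["Fever", "cough", "flu shot", "sick", "cold"],
    ["fever", "Flu  Shot", "vaccine", "cough", "bed"],
    ["sick", "fever", "tissues", "cold", "cough"],
]
print(top_associations(responses, k=3))
# [('cough', 3), ('fever', 3), ('cold', 2)]
```

With 100 participants per cue, the same call with `k=10` would yield a top-10 list of the kind reported in Appendix A; each selected term is then queried in Google Trends.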

Specifically, we used the following prediction model:

Influenza epidemics model:

$$ILI_t = \beta_0 + \sum_{i=1}^{10} \beta_i \, AssociatedTerm_{i,t} + \varepsilon_t \qquad (1)$$

where $ILI_t$ is the percentage of influenza-like illness at time t as reported by the Centers for Disease Control and Prevention (CDC), and $AssociatedTerm_{i,t}$ is the search trends value at time t for association-based term i (i = 1..10) in the aggregated results of the word association game for influenza.

We first compared the results of our model with those of Ginsberg et al. for the same time period reported in their paper. The training set included 167 weeks from 2004 to 2007, and we validated the model on held-out data from March 18, 2007 to May 11, 2008. Our predictions achieved a similar out-of-sample correlation with the reported ILI (0.973, compared with 0.97 in Ginsberg et al.). Although the results are similar, the amount of data used to build each model differs enormously. Ginsberg et al. used 50 million different searches and 450 million different models to arrive at a final model with 45 queries, a computation that employed hundreds of machines in a distributed computing framework. Our method is based on 100 online users, each of whom played the game for less than a minute, and our final model included only the top 10 search terms and a single model.

For robustness, we extended our predictions and validated our model on the most recent available influenza data, from the first week of April 2012 to the last week of March 2013. We compared our results, based on a prediction model whose latest training data is from 2007, with the flu early detection data provided by the Google Flu Trends website.³ This website provides flu outbreak detection on an ongoing basis using the methodology suggested by Ginsberg et al. Here, our model achieved a higher correlation than Google Flu Trends: 0.962, compared with 0.951. Figure 2 compares our model's predictions with the actual ILI data reported by the CDC over the two time periods described above. In the 2012-2013 period, and specifically from December 2012 to February 2013, our model's predictions matched the actual duration of the influenza outbreak more closely than the Google Flu Trends model did.

To summarize, these results suggest that with considerably less computational power and a smaller set of initial candidate search terms, association-based search terms produce results equivalent to or better than those of the brute-force technique reported in previous work.

³ http://www.google.org/flutrends/us/data.txt
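Assuming equation (1) is estimated by ordinary least squares, the evaluation above amounts to fitting the coefficients on the training weeks and reporting the Pearson correlation between predicted and reported ILI on the held-out weeks. The dependency-free sketch below illustrates that fit-then-correlate procedure. It is not the authors' code: it assumes a plain linear specification (no logit transform), and it uses synthetic, noise-free data with two hypothetical term indices standing in for the ten association terms so that the result can be checked exactly.

```python
from math import sqrt


def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum(bi * xi).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of predictors.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def predict(beta, X):
    """Apply fitted coefficients to each row of predictors."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic example: two hypothetical term indices per week, noise-free target.
def make_rows(weeks):
    return [[w, (w * w) % 7] for w in weeks]


def target(rows):
    return [0.5 + 2.0 * a - 1.0 * b for a, b in rows]


X_train = make_rows(range(12))      # "training weeks"
X_test = make_rows(range(12, 16))   # "held-out weeks"
beta = ols_fit(X_train, target(X_train))
r = pearson(predict(beta, X_test), target(X_test))
print([round(b, 6) for b in beta], round(r, 6))  # [0.5, 2.0, -1.0] 1.0
```

With real data the correlation would of course be below 1; the point of the sketch is the split between estimation on the training window and correlation on the held-out window.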

Figure 2. A comparison of the Crowd-Squared model predictions with actual reported ILI and with Ginsberg et al. (2008)/Google Flu Trends, over two separate periods: 2007-2008 and 2012-2013.

Housing indicators

The real estate market is traditionally used as an indicator of a country's economy. Housing activity both reflects individuals' financial situations and influences the country's economic growth by generating or eliminating real estate jobs and services. Hence, predictions of real estate indexes have become a common and important tool for policy makers and for industries that rely on these activities. This type of data was used by Wu and Brynjolfsson (2009) to predict the real estate market and its complementary businesses (i.e., home appliances). The main predicted variable they used was the volume of housing sales in the U.S.⁴ from the 4th quarter of 2007 to the 2nd quarter of 2009. Instead of selecting search terms to include in the prediction model, they used two predefined search categories from Google's black-box category classifier, "Real Estate" and "Real Estate Agent", and used their search trends indices in the prediction model. Wu and Brynjolfsson used a seasonal autoregressive model, performed an in-sample evaluation using adjusted R², and compared their model with the baseline model presented in equation (2).

⁴ Provided by the National Association of Realtors: http://www.realtor.org/research-andstatistics/housing-statistics

Real estate indicator models:

$$\text{HomeSales}_{j,t} = \beta_0 + \beta_1 \text{HomeSales}_{j,t-1} + \beta_2 \text{HPI}_{j,t-1} + S_j + T_t + \varepsilon_{j,t} \tag{2}$$

$$\text{HomeSales}_{j,t} = \beta_0 + \beta_1 \text{HomeSales}_{j,t-1} + \beta_2 \text{HPI}_{j,t-1} + \sum_{i=1}^{10} \gamma_i \text{AssociatedTerm}_{i,t} + S_j + T_t + \varepsilon_{j,t} \tag{3}$$

where HomeSales_{j,t} is the volume of home sales in state j at time t, as reported by the National Association of Realtors; HPI_{j,t-1} is the house price index of state j at time t-1, as reported by the Federal Housing Finance Agency; AssociatedTerm_{i,t} is the search trends value at time t for association-based term i (i = 1..10) in the aggregated results of the word association game for real estate; S_j is a state-level fixed effect; and T_t is a quarter dummy variable. We followed their forecasting method and used the autoregressive model presented in equation (3). As with the influenza epidemic predictions, we asked 100 participants (53% female, average age 30.6) to play the word association game, with the task description "Please write 5 terms that come to mind when seeing the phrase Buying a House" (see Appendix A for a list of the top 10 terms associated by participants). The baseline model reported by Wu and Brynjolfsson (equation (2)) displayed a good fit, with an adjusted R² of 0.973. Our model achieved an adjusted R² of 0.9882, higher than the best result reported for their prediction models (0.984).
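A sketch of how equation (3) could be estimated as a pooled regression with dummy-variable fixed effects, using NumPy least squares. The estimation details here are our assumptions. The sketch treats T_t as quarter-of-year dummies and drops one state and Q1 to avoid perfect collinearity with the intercept. It uses levels rather than logs. It is not necessarily the seasonal autoregressive estimator Wu and Brynjolfsson used. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def build_design_matrix(prev_sales, prev_hpi, terms, state_ids, quarters):
    """Stack the regressors of equation (3) for a state-quarter panel (hypothetical helper).

    Columns: intercept, HomeSales_{j,t-1}, HPI_{j,t-1}, one column per
    association-based term, state dummies (first state dropped) and
    quarter dummies (Q1 dropped) to avoid collinearity with the intercept.
    """
    n = len(prev_sales)
    cols = [np.ones(n), np.asarray(prev_sales, float), np.asarray(prev_hpi, float)]
    cols.extend(np.asarray(terms, float).T)  # terms has shape (n, k)
    states = sorted(set(state_ids))
    for s in states[1:]:  # S_j: state fixed effects
        cols.append(np.array([1.0 if sid == s else 0.0 for sid in state_ids]))
    for q in (2, 3, 4):  # T_t: quarter-of-year dummies
        cols.append(np.array([1.0 if qq == q else 0.0 for qq in quarters]))
    return np.column_stack(cols)


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return beta


# Synthetic, noise-free panel (hypothetical values) to check the estimator
rng = np.random.default_rng(0)
n, k = 200, 10
state_ids = rng.choice(["CA", "NY", "TX"], size=n)
quarters = rng.integers(1, 5, size=n)
prev_sales = rng.normal(100, 10, n)
prev_hpi = rng.normal(200, 20, n)
terms = rng.normal(50, 5, (n, k))
X = build_design_matrix(prev_sales, prev_hpi, terms, state_ids, quarters)
true_beta = rng.normal(0, 1, X.shape[1])
y = X @ true_beta  # no noise, so OLS should recover true_beta
beta_hat = fit_ols(X, y)
```

Dropping one category per dummy group keeps the design matrix full rank. The remaining dummy coefficients are then read relative to the dropped state and quarter.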

Initial claims for unemployment benefits

The third dataset involves early estimation of the volume of initial claims for unemployment benefits. This economic index is published by the U.S. Department of Labor each Thursday for the previous (Sunday to Saturday) week and is considered an important measure of the state of the U.S. economy [5]. Choi and Varian (2012) reported early estimation of initial claims for unemployment using search trends data. However, they also report that a simple baseline model, presented in equation (4), performs so well that the linear regression estimates seem to indicate random walk (with drift) behavior:

$$\text{UIC}_t = \beta_0 + \beta_1 \text{UIC}_{t-1} + \varepsilon_t \tag{4}$$

where UIC_t is the logarithm of the seasonally adjusted volume of initial claims for unemployment for week t. Choi and Varian created a prediction model that incorporated both baseline information (seasonally adjusted initial claims for the previous week) and (seasonally adjusted) search trends for the current week. The search trends were based on Google's predefined categories "Jobs" and "Welfare...Unemployment", as identified by Google's automated category classifier. They evaluated this model out of sample using a one-week-ahead rolling prediction over the period from January 2004 to July 2011. That is, they used the data up to week t-1 to train a model and measured its performance on week t.

[5] Historical data are available at http://www.ows.doleta.gov/unemploy/claims.asp.
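The one-week-ahead rolling evaluation can be sketched as follows for the baseline in equation (4). For each week t, the coefficients are re-estimated using only weeks up to t-1 and then used to predict week t. This is a minimal illustration under our own assumptions: ordinary least squares and a 52-week minimum training window. The function name and the simulated random-walk series are hypothetical.

```python
import numpy as np


def rolling_one_step_ar1(log_claims, min_train=52):
    """One-week-ahead rolling forecasts for UIC_t = b0 + b1 * UIC_{t-1} + e_t.

    For each week t >= min_train, the model is fit by OLS on the pairs
    (UIC_{s-1}, UIC_s) for s = 1..t-1, i.e. using data up to week t-1 only,
    and then used to predict week t. Returns {t: prediction}.
    """
    y = np.asarray(log_claims, float)
    if min_train < 3:
        raise ValueError("min_train must be at least 3 to fit two coefficients")
    preds = {}
    for t in range(min_train, len(y)):
        x_train, y_train = y[: t - 1], y[1:t]
        X = np.column_stack([np.ones_like(x_train), x_train])
        (b0, b1), *_ = np.linalg.lstsq(X, y_train, rcond=None)
        preds[t] = b0 + b1 * y[t - 1]
    return preds


# Hypothetical log-claims series: a random walk with drift (illustrative only)
rng = np.random.default_rng(42)
log_claims = 12.5 + np.cumsum(0.001 + 0.02 * rng.standard_normal(300))
preds = rolling_one_step_ar1(log_claims)
mae_pct = 100 * np.mean([abs(preds[t] - log_claims[t]) for t in preds])
print(f"one-week-ahead MAE on the log scale x100: {mae_pct:.2f}")
```

If the series behaves like a random walk with drift, the fitted b1 stays close to 1. The forecast is then essentially last week's value plus a small drift, which is why this baseline is hard to beat.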

While their model generated relatively accurate predictions of economic turning points, its overall out-of-sample Mean Absolute Error (MAE) was 3.68%, whereas the MAE of the strong baseline model was 3.37%. This result suggests that the search trends data based on the predefined categories largely overlapped with the information already contained in the previous week's claims data. They also appear to have added noise that may have harmed out-of-sample predictive accuracy. We asked 100 participants (54% female, average age 32.8) to play the word association game, with the task description "Please write 5 terms that come to mind when seeing the phrase Unemployment" (see Appendix A for a list of the top 10 terms associated by participants). We used this list of the top 10 associated terms and reran a simple linear regression model, detailed in equation (5):

$$\text{UIC}_t = \beta_0 + \beta_1 \text{UIC}_{t-1} + \sum_{i=1}^{10} \gamma_i \text{AssociatedTerm}_{i,t} + \varepsilon_t \tag{5}$$

where UIC_t is the logarithm of the seasonally adjusted volume of initial claims for unemployment for week t, and AssociatedTerm_{i,t} is the search trends value for association-based term i (i = 1..10) [6]. We applied this model using a similar one-step-ahead prediction procedure over a similar time period as Choi and Varian (2012) (see Figure 3 for a comparison of the model's predictions with the actual unemployment claims data). Our prediction model obtained an out-of-sample MAE of 3.42%. Although this is not as good as the MAE of the strong baseline model (3.37%), our predictive accuracy was better than that reported by Choi and Varian (2012) (3.68%). This suggests that the association-based search terms contained less noise than the search volume identified by Google's automated classifier.

Figure 3. A comparison of the association-based model predictions with the actual claims for unemployment published by the U.S. Department of Labor.

[6] The associated terms were seasonally adjusted by subtracting the week average value for each term.
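Equation (5), the seasonal adjustment described in the footnote, and the MAE metric can be combined in a similar rolling sketch. We read "subtracting the week average value" as subtracting each term's mean for the same week of the year, which is an assumption. We also assume that MAE is computed on the log series and multiplied by 100, so that it approximates a percentage error when errors are small. All function names are hypothetical.

```python
import numpy as np


def seasonally_adjust(values, week_of_year):
    """Subtract each week-of-year's mean from the values in that week (assumed reading)."""
    values = np.asarray(values, float)
    weeks = np.asarray(week_of_year)
    adjusted = values.copy()
    for w in np.unique(weeks):
        mask = weeks == w
        adjusted[mask] -= values[mask].mean()
    return adjusted


def mae_percent(actual_log, predicted_log):
    """Mean absolute error of log values, times 100 (approximately a percentage error)."""
    diff = np.asarray(actual_log, float) - np.asarray(predicted_log, float)
    return 100 * np.mean(np.abs(diff))


def rolling_eq5(log_claims, terms, min_train=52):
    """One-week-ahead forecasts for equation (5).

    terms has shape (T, k): column i holds AssociatedTerm_{i,t}, already
    seasonally adjusted. For each week t, OLS is fit on weeks 1..t-1 and the
    current-week search values terms[t] are used to predict UIC_t.
    """
    y = np.asarray(log_claims, float)
    z = np.asarray(terms, float)
    preds = {}
    for t in range(min_train, len(y)):
        X = np.column_stack([np.ones(t - 1), y[: t - 1], z[1:t]])
        beta, *_ = np.linalg.lstsq(X, y[1:t], rcond=None)
        x_new = np.concatenate([[1.0, y[t - 1]], z[t]])
        preds[t] = float(x_new @ beta)
    return preds
```

Given forecasts `preds`, the reported metric would be `mae_percent([y[t] for t in preds], list(preds.values()))`. One design caveat applies to `seasonally_adjust`: applying it to the full sample uses week averages that include future weeks. A stricter out-of-sample evaluation would compute those averages from the training window only.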

Discussion

Remarkably accurate predictions can be made by analyzing the aggregate search activity of the crowd. However, this prediction methodology has been hampered by the lack of an effective method for selecting query terms that are accurately associated with the predicted event. In this paper, we present a new crowd-based method for selecting relevant search terms that correspond to the underlying phenomenon. These terms facilitate accurate early detection or prediction of real-world phenomena using aggregated search data. We study how to improve the keyword selection process by using a crowd-based online game. In particular, we used a word association game design to collect participants' associations with a focal phrase, a process that imitates the choice of search terms when using search engines. Thus, we use one crowd to select the terms that a larger crowd will search for when seeking information about the phenomenon we seek to predict. We ran three online games on three different topics, asking participants to provide terms that come to mind when they see a specific phrase. We find that a word association game can effectively generate a set of relevant keywords. When combined with search volume data, these keywords produce better predictions, or equal predictions at lower cost. These results show that our methodology can be successfully applied to several domains, which illustrates its robustness.
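The aggregation step of the word association game can be sketched as follows. It uses the definition of association strength given in Appendix A: the percentage of participants providing a word. Several choices here are our assumptions: each term counts at most once per participant, terms are lowercased and stripped, and ties are broken alphabetically. The function name is hypothetical. The appendix lists "Sad" capitalized, so the original aggregation may not have normalized terms this way.

```python
from collections import Counter


def association_strength(responses, top_n=10):
    """Rank cue-word associations by the share of participants who gave them.

    responses: one list of free-text terms (e.g., 5) per participant.
    Returns [(term, share)] for the top_n terms, where share is the fraction
    of participants providing the term (multiply by 100 for a percentage).
    """
    n_participants = len(responses)
    counts = Counter()
    for terms in responses:
        # A set ensures each term counts at most once per participant
        counts.update({t.strip().lower() for t in terms if t.strip()})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, c / n_participants) for term, c in ranked[:top_n]]


# Hypothetical responses to the cue "Influenza"
responses = [["sick", "fever", "cold"], ["Sick", "cough"], ["fever", "sick"], ["germs"]]
print(association_strength(responses, top_n=3))
```

The top-ranked terms would then serve as the candidate Google Trends queries for the corresponding prediction model.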

Managerial implications

Accurate measurement of current events and prediction of future activities are among the key challenges for managers and policy makers. Search data have been shown to provide reliable estimates; however, their application to business has been hindered by the limitations of current term-selection methods. We argue that our new method extends the use of search data for predictions, especially when the exact relevant keywords are unknown. Even when some prior knowledge exists, our method can generate new related terms that can potentially improve predictive accuracy. Further, because the method is simple and low cost, forecasts can be updated periodically to support strategic decisions. The method can be used for both short-term and long-term decisions. For example, in the short term, better measurements of current demand trends can assist in shipment routing and in planning marketing activities. It may also assist in the early detection of problems in current products or services, as phrases describing those problems will appear in the word association results. With respect to long-term decisions, word association and search trends may assist in production planning. More interestingly, they may reveal consumers' needs for changes in a product or for new products. Overall, in an era of increasing volumes of big data, our approach allows simple, low-cost filtering of relevant information that can be used to measure and predict business activities.

Limitations and future research

While search volume data have been shown to improve prediction models, it is important to note that people who perform online searches do not necessarily constitute a representative sample of the population. For instance, elderly people and people with low income tend to use the Internet less often, which could lead to inaccurate predictions in some domains. In addition, due to privacy constraints, Google makes search volume data available only when the number of searches for a specific term reaches a threshold that prevents the identity of searchers from being discerned from the aggregated data. As a result, small-scale phenomena, or events that occur in areas with low population density, will not be reported by these search tools. Similarly, crowd-based tools do not produce a representative sample of the population and may not be suitable for areas with low population or a low level of technology adoption.

Appendix A

List of Aggregated Associated Terms

Table 1. Top 10 Associated Terms by Cue Term

Influenza                   Housing Sales               Unemployment
Term           Strength*    Term           Strength*    Term           Strength*
sick           53%          mortgage       50%          poor           20%
fever          47%          expensive      18%          money          20%
cold           19%          realtor        18%          jobless        16%
cough          18%          location       16%          depression     16%
contagious     15%          money          14%          broke          12%
germs          11%          loan           12%          homeless       12%
shot           10%          agent          8%           bills          10%
vaccine        10%          interest rate  8%           no money       10%
influenza      9%           real estate    8%           Sad            10%

* Association strength is the percentage of participants providing this word.

References

Bayus, B. L. 2013. "Crowdsourcing New Product Ideas over Time: An Analysis of the Dell IdeaStorm Community," Management Science (59:1), pp 226-244.
Boudreau, K. J., and Lakhani, K. R. 2013. "Using the Crowd as an Innovation Partner," Harvard Business Review (91:4), pp 60-69.
Brabham, D. C. 2008. "Crowdsourcing as a Model for Problem Solving: An Introduction and Cases," Convergence: The International Journal of Research into New Media Technologies (14:1), pp 75-90.
Choi, H., and Varian, H. 2012. "Predicting the Present with Google Trends," Economic Record (88:s1), pp 2-9.
D'Amuri, F., and Marcucci, J. 2012. "The Predictive Power of Google Searches in Forecasting Unemployment," Bank of Italy Temi di Discussione (Working Paper) No. 891.
Du, R. Y., and Kamakura, W. A. 2012. "Quantitative Trendspotting," Journal of Marketing Research (49:4), pp 514-536.
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., and Brilliant, L. 2008. "Detecting Influenza Epidemics Using Search Engine Query Data," Nature (457:7232), pp 1012-1014.
Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M., and Watts, D. J. 2010. "Predicting Consumer Behavior with Web Search," Proceedings of the National Academy of Sciences (107:41), pp 17486-17490.
Howe, J. 2006. "The Rise of Crowdsourcing," Wired Magazine (14:6), pp 1-4.
Lakhani, K. R., Jeppesen, L. B., Lohse, P. A., and Panetta, J. A. 2007. The Value of Openness in Scientific Problem Solving, Division of Research, Harvard Business School.
McAfee, A., and Brynjolfsson, E. 2012. "Big Data: The Management Revolution," Harvard Business Review, October 2012, pp 2-9.
Nelson, D. L., McEvoy, C. L., and Dennis, S. 2000. "What Is Free Association and What Does It Measure?," Memory & Cognition (28:6), pp 887-899.
Nelson, D. L., McEvoy, C. L., and Schreiber, T. A. 2004. "The University of South Florida Free Association, Rhyme, and Word Fragment Norms," Behavior Research Methods, Instruments, & Computers (36:3), pp 402-407.
Nelson, D. L., McKinney, V. M., Gee, N. R., and Janczura, G. A. 1998. "Interpreting the Influence of Implicitly Activated Memories on Recall and Recognition," Psychological Review (105:2), p 299.
Seebach, C., Pahlke, I., and Beck, R. 2011. "Tracking the Digital Footprints of Customers: How Firms Can Improve Their Sensing Abilities to Achieve Business Agility," ECIS 2011 Proceedings.
Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. Y. 2008. "Cheap and Fast, but Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks," Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp 254-263.
Von Ahn, L. 2006. "Games with a Purpose," Computer (39:6), pp 92-94.
Vosen, S., and Schmidt, T. 2011. "Forecasting Private Consumption: Survey-Based Indicators vs. Google Trends," Journal of Forecasting (30:6), pp 565-578.
Wu, L., and Brynjolfsson, E. 2009. "The Future of Prediction: How Google Searches Foreshadow Housing Prices and Quantities," Proceedings of the 30th International Conference on Information Systems, paper 147, Phoenix, Arizona.
Yan, T., Kumar, V., and Ganesan, D. 2010. "CrowdSearch: Exploiting Crowds for Accurate Real-Time Image Search on Mobile Phones," Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, ACM, pp 77-90.