A Hybrid Text Regression Model for Predicting Online Review Helpfulness


Research-in-Progress

Thomas L. Ngo-Ye
School of Business, Dalton State College
tngoye@daltonstate.edu

Atish P. Sinha
Lubar School of Business, University of Wisconsin-Milwaukee
sinha@uwm.edu

Abstract

Business intelligence and analytics are playing an increasingly prominent role in many organizations. User-generated content and online social media open up new opportunities for businesses that can exploit and innovate with this new source of Web 2.0 data. In this paper, we concentrate on one important application: predicting the helpfulness of online customer reviews. We frame it as a regression problem and apply text mining techniques. We propose a hybrid feature selection approach, which combines a filter with a wrapper, for a BI text regression problem. Based on online review data collected from Amazon.com, we demonstrate empirically that the hybrid approach produces the best prediction results among all the models examined. This study is the first to develop and validate a hybrid feature selection methodology for text regression in a BI context.

Keywords: online reviews, text regression, feature selection, filter, wrapper

Introduction

Organizations are becoming increasingly aware of the need to leverage enterprise data through business intelligence (BI) and analytics. Traditional BI applications process structured data consolidated in data warehouses to generate reports, perform OLAP, and conduct predictive analytics (Watson and Wixom 2007). In the age of Web 2.0, online social media present new challenges and opportunities for organizations to make use of this new type of voluminous, high-velocity data. Knowledge derived from BI 2.0 applications can drive business innovation and benefit businesses through better decision-making, enhanced performance, improved company strategy, and sustained competitive advantage (Turban et al. 2010).
In this research, we focus on one important BI 2.0 application: investigating the helpfulness of online customer reviews. To predict online review helpfulness, we propose and test an innovative and effective framework: a hybrid feature selection approach for text regression.

Online Customer Reviews and Review Helpfulness

In recent years, online social media have become an integral part of contemporary life. Online customer reviews, blogs, Facebook updates and comments, and tweets dominate user-generated content on the Web. Such user-generated content is a gold mine for product manufacturers, service providers, platform integrators, and third parties. Businesses are interested in harvesting market intelligence to better understand their customers' perspectives, so that they can better position their products, improve service offerings, and enhance brand image and customer loyalty.

Among all the different types of user-generated content, online customer reviews have the most straightforward and focused impact on businesses. Potential consumers may base their purchase decisions partly on the informal online reviews posted by their peers. Online customer reviews serve an important function by meeting consumers' information-seeking needs. Both consumers and businesses are keen to learn useful information from this new resource.

For popular products on major review websites, there are often hundreds or even thousands of customer reviews. The sheer number of reviews poses a serious information overload problem. If the reviews are left unmanaged, users will either be overwhelmed or simply explore only the first few available ones. To support users' information-seeking needs, one commonly used criterion for presenting online reviews is to list them by posting date, with the most recent one first. However, the most recent reviews are not necessarily the most useful ones. Another frequently used criterion is the number of readers' votes on the question "Was this review helpful? Yes or No." Based on the number of readers' votes, the customer reviews are ranked on the helpfulness dimension. However, many reviews do not attract much attention because they are relatively new and have not had enough time to accumulate readers' votes (Kim et al. 2006). Therefore, it is desirable to develop a mechanism to automatically predict the helpfulness of relatively new reviews.

BI Congress 3: Driving Innovation through Big Data Analytics, Orlando, FL, December 2012. Research Track: DW/BI/Analytics from an application perspective.

Review helpfulness offers crucial information about readers' aggregated judgment of a review's overall value. Businesses should be sensitive to review helpfulness. Ignoring helpful reviews and failing to identify them may put businesses at a disadvantage by missing a valuable learning opportunity. Previous studies have acknowledged the important practical business implications of studying review helpfulness (Liu 2010; Pang and Lee 2008).

In recent years, online customer review helpfulness has attracted some attention from researchers in different disciplines (Duan et al. 2010; Kim et al. 2006; Mudambi and Schuff 2010). The construct of review helpfulness is also sometimes framed as review usefulness (Ghose and Ipeirotis 2007; Zhang 2008), review quality (Liu et al. 2007; Pang and Lee 2008), or review utility (Liu 2010; Zhang and Varadarajan 2006). Online customer reviews are not equally helpful or useful. Unhelpful reviews can be screened out and excluded from review summaries (Liu et al. 2007).
Another potential use of review helpfulness is to treat it as a weight in calculating the overall valuation of a product, which is the weighted average of the polarity of each individual review (Zhang 2008; Zhang and Varadarajan 2006).

Consistent with the literature, we use the number of helpful votes from readers as the target variable to build a learning model of review helpfulness. The text mining models we construct in this study are text regression models. While review helpfulness has been framed as a text regression problem, to the best of our knowledge, our study is the first to apply a hybrid feature selection approach with both a filter and a wrapper to generate an optimal subset of features. We also show that our proposed hybrid model outperforms the filter-only model reported in the literature.

Text Regression and Feature Selection

The Bag-of-Words (BOW) model has been widely used in the text mining field (Sebastiani 2002). In the BOW representation, a text is reduced to a set of unique words and their corresponding counts or weights in the document. A word's position in the text, its part of speech, and other high-level grammatical information are not included in the model. Although the BOW representation seems rudimentary, it is surprisingly resilient and useful in real-world applications. Moreover, the BOW model can be enhanced in various ways, such as removing stop-words and stemming. The BOW model has its theoretical roots in the information retrieval literature, and it can capture the basic content of a text to a certain extent (Turney and Pantel 2010). Compared with more sophisticated representations, the BOW model remains a viable choice for text mining.

For the online review helpfulness regression problem, the features represent the weights of the words appearing in the reviews. The BOW model can be used to characterize the main topic, concept, and sentiment of review content.
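As a minimal sketch of the BOW representation described above, the following Python code turns a small collection of texts into word-weight vectors, once with raw counts and once with a presence/absence coding. The tiny stop-word list, the function names, and the two example reviews are assumptions made for illustration; stemming is omitted, and this is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def bow_vectors(docs):
    """Return (vocabulary, term-count vectors, binary vectors) for the documents."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))          # one feature per unique word
    term = [[c[w] for w in vocab] for c in counts]               # raw word counts
    binary = [[1 if c[w] else 0 for w in vocab] for c in counts]  # presence/absence
    return vocab, term, binary

# Two made-up review snippets.
reviews = ["Great book, great plot.", "This is a boring book."]
vocab, term, binary = bow_vectors(reviews)
```

Word order and grammar are discarded: each review becomes one row whose columns are the vocabulary words, which is why the number of features grows with the number of unique words in the collection.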
The BOW model has been demonstrated to be useful for estimating the helpfulness of online reviews (Kim et al. 2006; Ngo-Ye and Sinha 2012).

A common challenge in text mining is high dimensionality, because a document collection tends to have thousands of unique words. Such high dimensionality leads not only to high computation costs but also to overfitting, which is why dimension reduction can be very useful (Sebastiani 2002). Some important ideas on dimension reduction, such as ranking and projection-based methods, have been summarized by Abbasi and Chen (2008).

Filter vs. Wrapper Approach

One useful way to classify feature selection techniques is based on whether they employ the filter or the wrapper approach (Hall and Holmes 2003). For filter-based feature selection methods, some type of relevance measure, such as information gain, is applied to evaluate the importance of features. The relevance measures are independent of the learning algorithm. For wrapper-based methods, in contrast, the learning algorithm itself is used to evaluate the importance of the features. The rationale of the wrapper approach is that considering how the algorithm and the training set interact will help achieve the best possible performance on a particular training set with a particular algorithm (Kohavi and John 1997).

While filter-based methods have low computational complexity, wrapper-based methods have very high computational complexity. For large datasets with high dimensionality, the wrapper approach is too computationally expensive to be feasible (Hall and Holmes 2003). On the other hand, wrapper-based methods tend to perform better because the same learning algorithm is used for feature evaluation. Moreover, wrapper-based methods can automatically determine the best subset of features, while filter-based methods need heuristics to determine how many features to select (Chou et al. 2010).

Regressional ReliefF (RReliefF)

Ngo-Ye and Sinha (2012) have elaborated on why many traditional feature selection techniques, including those investigated in opinion classification (Abbasi et al. 2008), are not applicable to text regression. Among the suitable choices of feature selection techniques for text regression, RReliefF has recently been shown to be a very competitive method for predicting online review helpfulness (Ngo-Ye and Sinha 2012). RReliefF is a relatively new extension of the original Relief, a feature ranking method. The principle behind the Relief family of algorithms is that the ideal feature is one that has a different realized value for instances belonging to different classes and the same value for instances belonging to the same class (Robnik-Sikonja and Kononenko 2003).
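To make the Relief principle concrete for a numeric target, here is a simplified Python sketch in the spirit of RReliefF. It follows the general scheme of accumulating, over nearest neighbors, how much the target differs, how much each feature differs, and how much both differ together. The uniform neighbor weights, the Manhattan distance, the function name, and the synthetic data are assumptions made for brevity, so this is a teaching sketch rather than the exact published algorithm or the implementation used in the study.

```python
import random

def rrelieff(X, y, m=None, k=5, seed=0):
    """Simplified RReliefF-style feature weights (uniform neighbor weights)."""
    n, p = len(X), len(X[0])
    m = m or n
    rng = random.Random(seed)
    # Value ranges, used to scale every difference into [0, 1].
    f_rng = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(p)]
    y_rng = (max(y) - min(y)) or 1.0

    def diff_a(a, i, j):
        return abs(X[i][a] - X[j][a]) / f_rng[a]

    n_dc = 0.0               # accumulated "different target" evidence
    n_da = [0.0] * p         # accumulated "different feature value" evidence
    n_dcda = [0.0] * p       # accumulated evidence that both differ together
    for _ in range(m):
        i = rng.randrange(n)
        # k nearest neighbors of instance i (Manhattan distance on scaled features).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sum(diff_a(a, i, j) for a in range(p)))
        for j in others[:k]:
            d = 1.0 / k      # assumption: uniform instead of rank-based weights
            dc = abs(y[i] - y[j]) / y_rng
            n_dc += dc * d
            for a in range(p):
                da = diff_a(a, i, j)
                n_da[a] += da * d
                n_dcda[a] += dc * da * d
    if n_dc == 0 or n_dc == m:
        return [0.0] * p     # degenerate target: no ranking information
    return [n_dcda[a] / n_dc - (n_da[a] - n_dcda[a]) / (m - n_dc) for a in range(p)]

# Synthetic data: feature 0 determines the target, features 1 and 2 are noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [row[0] for row in X]
weights = rrelieff(X, y)
```

A feature scores highly when neighboring instances that differ on it also differ in the target, so ranking features by these weights yields the filter step.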
As an instance-based method, Relief is strongly contextual and can therefore handle conditional dependencies between attributes. As a non-myopic attribute quality estimator, it can exploit local information to obtain a global view. Thus Relief performs robustly in domains with strong conditional dependencies between attributes. In the specific domain of online reviews, the features (review words) may have strong interactions and dependencies.

Proposed Hybrid Approach

Both the wrapper and filter approaches are well known in the feature selection community. However, the hybrid approach of combining a filter and a wrapper for classification problems (Chou et al. 2010) is relatively new. In the hybrid approach, the full feature set is first passed through a filter to be ranked. Then a proper subset of top-ranked features is selected and passed to the wrapper. The wrapper selects the best subset of features to build the final model. The hybrid methodology possesses the advantages of both the filter and wrapper approaches. The computation cost is moderate and the performance is better than that of the filter approach. Moreover, the optimal subset of features is automatically determined (Chou et al. 2010).

Although the hybrid approach has been applied to text classification, it has not been reported for text regression in the literature. In this study, we explore the efficacy of the hybrid approach when applied to text regression for predicting online review helpfulness. Our goal is to examine whether the hybrid model improves the prediction accuracy of review helpfulness over the filter-only model. We expect the hybrid model to perform better than the filter-only model, because the learning algorithm is wrapped into the feature evaluation and selection process. Therefore, the features chosen are better tuned to the specific algorithm used in the final learning stage. We next present the proposed conceptual framework in Figure 1.
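The filter-then-wrapper idea can be sketched in Python as follows. To keep the example self-contained, two stand-ins are used and should be read as assumptions: absolute Pearson correlation replaces RReliefF as the filter score, and a small k-nearest-neighbor regressor replaces SVR as the wrapped learner. The function names, the fold assignment, and the synthetic data are likewise hypothetical.

```python
import math
import random

def abs_pearson(col, y):
    """Filter score: absolute Pearson correlation between one feature and the target."""
    n = len(y)
    mc, my = sum(col) / n, sum(y) / n
    cov = sum((c - mc) * (t - my) for c, t in zip(col, y))
    var = sum((c - mc) ** 2 for c in col) * sum((t - my) ** 2 for t in y)
    return abs(cov) / math.sqrt(var) if var > 0 else 0.0

def knn_predict(train, row, feats, k=3):
    """Average target of the k training rows closest to `row` on the chosen features."""
    nearest = sorted(train, key=lambda r: sum((r[0][a] - row[a]) ** 2 for a in feats))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def cv_mae(X, y, feats, folds=5):
    """Mean absolute error of the stand-in learner under simple k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        train = [(X[i], y[i]) for i in range(len(X)) if i % folds != f]
        for i in range(f, len(X), folds):      # held-out rows of fold f
            total += abs(knn_predict(train, X[i], feats) - y[i])
    return total / len(X)

def hybrid_select(X, y, top_n):
    """Filter: keep the top_n ranked features. Wrapper: greedy forward search over them."""
    p = len(X[0])
    ranked = sorted(range(p), key=lambda a: abs_pearson([r[a] for r in X], y), reverse=True)
    candidates = ranked[:top_n]
    selected, best = [], float("inf")
    while True:
        trial = [(cv_mae(X, y, selected + [a]), a) for a in candidates if a not in selected]
        if not trial or min(trial)[0] >= best:
            break  # no remaining feature improves the cross-validated error
        best, chosen = min(trial)
        selected.append(chosen)
    return candidates, selected, best

# Synthetic data: the target depends mostly on feature 0 and slightly on feature 1.
rng = random.Random(0)
X = [[rng.random() for _ in range(6)] for _ in range(60)]
y = [5 * row[0] + row[1] for row in X]
candidates, selected, best = hybrid_select(X, y, top_n=3)
```

The filter keeps the search space small, so the expensive wrapper evaluations run only over the shortlisted features, and the wrapper decides on its own how many of them to keep.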
In Figure 1, the models are introduced in sequence. First, we have the baseline ZeroR model. It always uses the average value of the target variable in the training dataset as the predicted value for the target variable in the test dataset (Witten et al. 2011). The ZeroR model captures prior knowledge of the target variable, represented by its average value in the training dataset.

To expand the ZeroR model, we next consider BOW-based models. In contrast to the rudimentary ZeroR model, elements of the review content are now captured and modeled. In the first conceptual type of BOW-based model, we retain all the review words and create the BOWFull model. We next apply the RReliefF dimension reduction method to produce the filter-only model. We keep the 300 top-ranked review words as features, consistent with the study by Ngo-Ye and Sinha (2012).

Finally, we apply the wrapper method with the text regression algorithm, support vector regression (SVR), to the filter-only model, which contains the 300 top-ranked review words. The subset of features selected by the wrapper enters the final hybrid model.

[Figure 1 shows the models in sequence: Baseline ZeroR Model; BOW Full Model (use all review words as features for text regression); select features with filter (Regressional ReliefF); Filter Only Model; select features with wrapper (support vector regression); Hybrid (Filter + Wrapper) Model.]

Figure 1. Hybrid Feature Selection Approach for Text Regression

Empirical Study and Results

For our empirical study, we use 2718 online book reviews collected from Amazon.com. We develop text regression models for predicting review helpfulness and empirically evaluate their performance. Each observation in the datasets is a unique review.

Preprocessing of Review Text for BOW-based Datasets

To generate datasets for the BOWFull model, we go through the following process. First, we tokenize the collected review text by removing all non-alphabetic characters. Next, we apply the standard English stop-word filter to eliminate common stop-words, which do not carry substantial meaning. Then we cast all the terms to lower case. Finally, we employ the popular Porter Stemmer algorithm to reduce terms to their basic form. The stemmed words that appear in our review text collection are used as independent variables for the BOWFull model.

We use two types of index weighting schemes for BOW-based datasets. In the binary occurrence (BinaOccu) scheme, a word that appears in a review text is coded as 1, and a word that is absent from the review text is coded as 0. In the term occurrence (TermOccu) scheme, the raw word count in a document is used as the term weight. In Table 1, we present the models, instantiated datasets, and the number of variables.

Table 1. Instantiated Models and Number of Variables (including target class)

Index Weighting   Conceptual Model            Dataset                     Number of Variables
BinaOccu          BOWFull                     A2718BinaOccu               7543
BinaOccu          Filter Only                 A2718BinaOccuRel            [value missing]
BinaOccu          Hybrid (Filter + Wrapper)   A2718BinaOccuRel300WRPGRF   98
TermOccu          BOWFull                     A2718TermOccu               7543
TermOccu          Filter Only                 A2718TermOccuRel            [value missing]
TermOccu          Hybrid (Filter + Wrapper)   A2718TermOccuRel300WRPGRF   83

Five Regression Performance Measures

To gauge the regression performance of different models, we consider five measures (see Table 2). Four of them are error-based measures, where a smaller realized value implies better performance. The fifth measure is the correlation coefficient, for which a larger realized value indicates better performance.

Table 2. Five Regression Performance Measures

MAE    Mean absolute error
RMSE   Root mean squared error
RAE    Relative absolute error
RRSE   Root relative squared error
CORR   Correlation coefficient

Regression Algorithm and Experiment Configuration

For the BOW-based models, we adopt the state-of-the-art support vector regression (SVR) algorithm, LibSVM's epsilon-SVR (ε-SVR) with a linear kernel, for our text regression problem. For the wrapper method, we use SVR to evaluate the features based on five-fold cross-validation. For the search method of the wrapper, we experiment with Greedy Stepwise.

To achieve a reliable estimate of model performance, we conduct 10 runs, and within each run we perform 10-fold cross-validation (Witten et al. 2011). We report the overall average across 10 runs and 10 folds as the performance measure for a model. To compare two models, we match the corresponding 100 observations (10 runs × 10 folds) and run paired t-tests.

Results of Text Mining Experiments

To examine the efficacy of the proposed hybrid model, we compare it with the filter-only model, the BOWFull model, and the ZeroR model. We set up the following comparison scenarios. First, we have two index weighting schemes (BinaOccu and TermOccu). Second, we have five regression performance measures. Therefore, we have a total of 2 × 5 = 10 unique scenarios. In Table 3, we report the regression performance of all models. We highlight the best one in each scenario in bold.
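The five measures listed in Table 2 can be computed as in the Python sketch below. It uses the common textbook definitions, in which RAE and RRSE compare the model's errors with those of a predictor that always outputs the mean of the actual values; this choice, the function name, and the toy numbers are assumptions, and specific toolkits may differ in detail (for example, by using the training-set mean or reporting percentages).

```python
import math

def regression_measures(actual, predicted):
    """Return MAE, RMSE, RAE, RRSE, and CORR for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the mean-only baseline (requires a non-constant target).
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    sq_dev_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,               # reported as a fraction, not a percentage
        "RRSE": math.sqrt(sq_err / sq_dev),     # reported as a fraction, not a percentage
        "CORR": cov / math.sqrt(sq_dev * sq_dev_p),
    }

# Toy example: four made-up helpful-vote counts and predictions.
measures = regression_measures([1, 2, 3, 4], [1, 2, 3, 5])
```

Under these definitions, a mean-only predictor scores RAE = RRSE = 1, so values below 1 indicate an improvement over that baseline.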
We also conduct pairwise comparisons between the hybrid model and the filter-only model.

Table 3. Regression Performance of Hybrid, Filter, BOWFull, and ZeroR Models

[The numeric values for the ZeroR, BOWFull, Filter Only, and Hybrid (Filter + Wrapper) columns are missing from the source text; only the significance markers survive.]

Performance Measure   Index Weighting   Significance
MAE                   BinaOccu          **
MAE                   TermOccu          **
RMSE                  BinaOccu          **
RMSE                  TermOccu          **
RAE                   BinaOccu          **
RAE                   TermOccu          **
RRSE                  BinaOccu          **
RRSE                  TermOccu          *
CORR                  BinaOccu          **
CORR                  TermOccu          *

From Table 3, we find that the hybrid models clearly outperform the filter-only models, BOWFull models, and ZeroR models. We also conduct paired t-tests between the hybrid model and the filter-only model. In all 10 scenarios, the differences are statistically significant (* denotes p < 0.05 and ** denotes p < 0.001). Such consistent results strongly indicate the influence of the wrapper. In other words, the hybrid model that combines the filter and wrapper approaches is better than the filter-only model. In summary, among all the models examined in this study, the hybrid model is the most accurate one for predicting review helpfulness.

The results also show that both the BOWFull model and the filter-only model outperform the ZeroR model. Moreover, the filter-only model performs better than the BOWFull model. Taken together, the empirical findings suggest that the conceptual framework presented in Figure 1 is a useful methodology for improving review helpfulness prediction incrementally.

Next, we report the run times of the BOW-based models in Table 4.

Table 4. Run Times for Hybrid, Filter, and BOWFull Models (in seconds)

[The numeric values are missing from the source text.]

Index Weighting   BOWFull   Filter Only   Hybrid (Filter + Wrapper)
BinaOccu
TermOccu

Table 4 shows that the run time decreases dramatically from the BOWFull model to the filter-only model, and again from the filter-only model to the hybrid model. In real-world applications, after the features are determined by the feature selection process, the hybrid model has a strong computational advantage over the filter-only model. The reason is that the wrapper process selects a much smaller feature set for the hybrid model (see Table 1 for the number of variables in different BOW-based models). However, we need to point out that the improved accuracy and shorter run time of the final hybrid model come with the additional computation cost of applying a wrapper method to select an optimal subset of features. But the computation cost of applying a wrapper is a one-time investment.
After a subset of features for a review domain is determined, it can be reused over and over again. The significantly higher runtime efficiency of the hybrid approach makes it a very attractive option to employ.

Conclusion and Future Directions

It is worth noting that one of the main objectives of our study, if realized, would potentially provide a major benefit to business websites within a BI 2.0 context. That objective is to facilitate the instant estimation of the helpfulness of new reviews, so that businesses can dynamically adjust the presentation of those reviews. Toward that end, the proposed hybrid approach provides an innovative and practical framework for BI 2.0 applications. The empirical results demonstrate the applicability of the hybrid approach to the domain of predicting online review helpfulness.

In this ongoing study, we are currently undertaking several experiments. In addition to the Greedy Stepwise search method, we are experimenting with Best First and Genetic Search. We employed SVR as the learning algorithm and also as the wrapper. Although SVR represents the state of the art, we are currently examining other competitive algorithms. Since the target variable, the number of helpful votes, is a positive integer and SVR assumes the dependent variable to be a real number, we are considering other algorithms such as Poisson regression, which may be more suitable for the problem of predicting an integer outcome variable. For the dimension reduction techniques, we focused on Regressional ReliefF for the filter-based approach. We are also examining other feature selection techniques, such as correlation-based feature selection, that have been reported to be effective. We empirically tested our proposed framework against book review data collected from Amazon.com. To make the findings more generalizable, we are evaluating other types of online reviews. We expect to have most of these results at the time of the BI Congress.
Our current work differs from that of Chou et al. (2010) in several ways. Their focus was on Internet abuse detection, whereas we focus on an important BI problem, that of predicting online review helpfulness. The major difference is that while they developed a hybrid text classification approach, we develop and test a hybrid text regression approach. We expand upon Ngo-Ye and Sinha's (2012) work by applying the hybrid approach to enhance their filter-only model.

This ongoing study makes important contributions to the literature. To the best of our knowledge, this is the first study to develop a hybrid feature selection approach for predicting online review helpfulness. It is also the first study to develop and apply a hybrid approach in the context of text regression. We build on the work of Chou et al. (2010), who developed a hybrid text classification approach for detecting Internet abuse. The initial findings of our study indicate that the hybrid approach is also attractive for text regression problems. The proposed hybrid framework, which makes use of both the filter method and the wrapper method, is intuitively appealing and conceptually meaningful. The results from the empirical experiments we conducted demonstrate the viability and effectiveness of the proposed approach.

With many interesting and challenging research issues ahead, the field of online social media is fertile ground for research in the information systems discipline. Because our proposed framework is fairly general, it can be applied to different online social media studies. Given the increasing trend of exploiting Facebook and Twitter data for business intelligence, it would be interesting to conduct new empirical studies testing and extending our proposed framework in the context of tweets and Facebook posts and comments. The proposed framework could prove to be an effective and efficient method for businesses to harness market intelligence from online social media.

References

Abbasi, A., and Chen, H. 2008. "CyberGate: A System and Design for Text Analysis of Computer Mediated Communications," MIS Quarterly (32:4), pp.
Abbasi, A., Chen, H., and Salem, A. 2008. "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Transactions on Information Systems (26:3), pp. 12:1-12:34.
Chou, C.-H., Sinha, A. P., and Zhao, H. 2010. "A Hybrid Attribute Selection Approach for Text Classification," Journal of the Association for Information Systems (11:9), pp.
Duan, W., Cao, Q., and Gan, Q. 2010. "Investigating Determinants of Voting for the "Helpfulness" of Online Consumer Reviews: A Text Mining Approach," in Proceedings of the Sixteenth Americas Conference on Information Systems, M. Santana, J. Luftman, A. Vinzé (eds.), Lima, Peru, pp.
Ghose, A., and Ipeirotis, P. G. 2007. "Designing Novel Review Ranking Systems: Predicting Usefulness and Impact of Reviews," in Proceedings of the International Conference on Electronic Commerce (ICEC), Minneapolis, Minnesota, pp.
Hall, M. A., and Holmes, G. 2003. "Benchmarking Attribute Selection Techniques for Discrete Class Data Mining," IEEE Transactions on Knowledge and Data Engineering (15:3), pp.
Kim, S.-M., Pantel, P., Chklovski, T., and Pennacchiotti, M. 2006. "Automatically Assessing Review Helpfulness," in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), Sydney, Australia, pp.
Kohavi, R., and John, G. H. 1997. "Wrappers for Feature Subset Selection," Artificial Intelligence (97:1-2), pp.
Liu, B. 2010. "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing, N. Indurkhya, and F. J. Damerau (eds.), Second Edition, pp.
Liu, J., Cao, Y., Lin, C.-Y., Huang, Y., and Zhou, M. 2007. "Low-quality Product Review Detection in Opinion Summarization," in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, pp.
Mudambi, S. M., and Schuff, D. 2010. "What Makes A Helpful Online Review? A Study Of Customer Reviews On Amazon.com," (C. Saunders, Ed.) MIS Quarterly (34:1), pp.
Ngo-Ye, T. L., and Sinha, A. P. 2012. "Analyzing Online Review Helpfulness Using a Regressional ReliefF-Enhanced Text Mining Method," ACM Transactions on Management Information Systems (3:2), pp. 10:1-10:20.
Pang, B., and Lee, L. 2008. "Opinion Mining and Sentiment Analysis," in Foundations and Trends in Information Retrieval, Vol. 2, pp.
Robnik-Sikonja, M., and Kononenko, I. 2003. "Theoretical and Empirical Analysis of ReliefF and RReliefF," Machine Learning (53:1/2), pp.
Sebastiani, F. 2002. "Machine Learning in Automated Text Categorization," ACM Computing Surveys (34:1), pp.
Turban, E., Sharda, R., Delen, D., and King, D. 2010. Business Intelligence: A Managerial Approach, Second Edition, Upper Saddle River, NJ: Prentice Hall.
Turney, P. D., and Pantel, P. 2010. "From Frequency to Meaning: Vector Space Models of Semantics," Journal of Artificial Intelligence Research (37:January-April), pp.
Watson, H. J., and Wixom, B. H. 2007. "The Current State of Business Intelligence," IEEE Computer (40:9), pp.
Witten, I. H., Frank, E., and Hall, M. A. 2011. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Burlington, MA, USA: Morgan Kaufmann.
Zhang, Z. 2008. "Weighing Stars: Aggregating Online Product Reviews for Intelligent E-commerce Applications," IEEE Intelligent Systems, (September/October), pp.
Zhang, Z., and Varadarajan, B. 2006. "Utility Scoring of Product Reviews," in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), Arlington, Virginia, pp.