Understanding Online Consumer Review Opinions with Sentiment Analysis using Machine Learning



Christopher C. Yang, College of Information Science and Technology, Drexel University, PA, USA
Xuning Tang, College of Information Science and Technology, Drexel University, PA, USA
Y. C. Wong, Digital Library Laboratory, Chinese University of Hong Kong, Hong Kong
Chih-Ping Wei, Department of Information Management, National Taiwan University, Taiwan

Abstract
With the advent of Web 2.0 technologies, the Web has evolved to become a popular channel of communication and interaction between Web users and online consumers. Social media, unlike traditional media, have rich but unorganized content contributed by users, often in fragmented and sparse fashion. Users usually spend a lot of their time filtering useless information and yet are not able to capture the essence. In this study, we focus on user-contributed reviews of products, which many online consumers use to support their purchase decisions by identifying products that best fit their preferences. In recent years, sentiment classification and analysis of online consumer reviews have drawn significant research attention. Most existing techniques rely on natural language processing tools to parse and analyze sentences in a review, yet they offer poor accuracy, because the writing in online reviews tends to be less formal than writing in news or journal articles. Many opinion sentences contain grammatical errors and unknown terms that do not exist in dictionaries. Therefore, this study proposes two supervised learning techniques (class association rules and naïve Bayes classifier) to classify opinion sentences into appropriate product feature classes and produce a summary of consumer reviews. An empirical evaluation that compares the performance of the class association rules technique and the naïve Bayes classifier for sentiment analysis shows that our proposed techniques achieve macro and micro F-measures of more than 70%.

Keywords: Opinion mining, Web mining, Electronic commerce, Machine learning, Sentiment classification, Sentiment analysis, Text mining, Social media analytics

Pacific Asia Journal of the Association for Information Systems Vol. 2 No. 3, pp.73-89 / September 2010

Introduction
The massive volume of user-contributed content on the Web gets updated so frequently that it is almost impossible for search engines to index it and offer real-time searching capability. Yet users continue to hunger for up-to-date information to support their tasks. In particular, the Web is an excellent source for gathering consumer review opinions (Blei and McAuliffe, 2008; Dellarocas, 2003; Ding et al., 2008; Lu and Yu, 2008; Forman et al., 2008; Godes and Mayzlin, 2004). On modern business-to-consumer (B2C) electronic commerce platforms, consumers not only browse online catalogs and make purchases but also search for opinions posted by other consumers to support their purchase decision making. Currently, many Web portals in addition to B2C websites (e.g., amazon.com) provide online consumer review systems that enable users to submit and retrieve consumer reviews. For example, epinion.com, RateItAll.com, and cnet.com are some popular online consumer review websites. They provide a combination of formats for users to submit review opinions, including open text boxes, lists of pros and cons, and ratings on an n-point scale. If a user clicks on a particular product, he or she sees a list of consumer reviews contributed by others. However, it becomes tedious and time consuming to browse through a long list of consumer reviews, especially when comparing several products. A review summary for each product that lists the pros and cons of each feature is thus desirable; a general rating is not sufficient, because potential consumers cannot identify which products best match the product features they care about or prefer. For example, some potential digital camera consumers might worry about the price and image quality, whereas others are more interested in the battery life and lenses. To address such challenges, extant research has investigated two main problems, sentiment classification and sentiment analysis (Liu et al., 2005).
Sentiment classification determines the sentiment orientation, whether positive or negative, of an opinion text. Sentiment analysis, on the other hand, extracts the product features that an opinion text describes. Because most online consumer review websites provide separate input sections for pros, cons, and ratings, the sentiment orientation is explicit, but the described product features remain hidden in the text. Accordingly, sentiment analysis is the focus of this paper. Recent research on sentiment analysis relies on natural language processing and linguistic techniques (Scaffidi et al., 2007; Hu and Liu, 2004a; Liu et al., 2005; Popescu and Etzioni, 2005; Zhang and Varadarajan, 2006), which perform well when the text can be parsed accurately by natural language processing tools. However, text in Web review opinions is generally less rigorous than the wording in formal documents, such as business reports, news documents, or journal articles. The text in Web review opinions often does not conform to linguistic and grammatical rules, and it might not even include complete sentences. In addition, many new terms appear in Web review opinions, including technical terms and named entities such as mp3, 3G, and iPod. In response, we investigate in this study a machine learning approach for classifying the product features described in Web review opinions, even when the language is informal. The remainder of the paper is organized as follows: In Section 2, we review prior related studies. In Section 3, we present the machine learning approach for classifying consumer review opinions, including class association rules and the naïve Bayes classifier. We next describe the design of our empirical evaluations and discuss important experimental results in Section 4. We conclude in Section 5 with a summary and some potential further research ideas.

Related Work
As Web 2.0 techniques become increasingly popular, the number of online consumer reviews, forums, and blogs is expanding rapidly. However, it is difficult for users to digest all this information unless an automatic summary is available.
Social media summarization takes fragmented messages as inputs and produces an overview of the user-generated content. In particular, sentiment classification and analysis attempt to summarize online consumer review opinions and thereby enable users to compare consumer reviews across multiple products by highlighting the pros and cons of their product features. When Web users submit opinions about a particular product, they write positive and negative comments about various product features. To provide a detailed summary and comparison across multiple products, we need to compare the ratio of positive and negative comments about each product feature across multiple products. Potential consumers can then easily identify the product that receives positive comments on the product features in which they are most interested. Sentiment classification determines the orientation of a review opinion, regardless of the product features, whereas sentiment analysis extracts review opinions into specific product feature classes (Liu et al., 2005).

Social Media Summarization
Social media summarization integrates Web opinions through topic modeling to generate a high-quality summary. Its critical component is the extraction of aspects of an object that users rate frequently in a set of online reviews (Blei and McAuliffe, 2008; Mei et al., 2007; Titov and McDonald, 2008; Zhai et al., 2004). For example, Titov and McDonald (2008) employ multi-grain latent Dirichlet allocation (LDA) to extract ratable aspects by clustering important terms according to a local topic. Wang et al. (2008) propose a novel model to create a compressed summary while retaining the main characteristics of the original set of documents. They first calculate sentence-to-sentence similarities using semantic analysis to construct a similarity matrix, then conduct symmetric matrix factorization to group sentences into clusters. Finally, they select the most informative sentences to represent the set of documents.
Although this model performs well with well-structured data, it does not necessarily work effectively with Web 2.0 content, which tends to be less organized. Lu and Zhai (2008) propose a summarization technique based on semi-supervised LDA and probabilistic latent semantic analysis (PLSA) models. They employ a well-written expert review as a template for incorporating other opinions scattered across various sources. Accordingly, this technique generates an opinion summary that consists of expert reviews and supplemental opinions. However, the quality of the summary depends on the selected expert review. Social media summarization generates summaries by topic modeling but does not work at the sentence level to identify the specific product features on which a user comments. That is, topics in social media summarization are not necessarily the product features in online consumer reviews. Nor does this technique aim to classify the orientation of the product features that online consumers review. Its main objective instead is to extract representative and informative sentences that summarize a vast collection of social media content. As mentioned, we focus on online consumer reviews in this study. The two related research problems are sentiment classification and sentiment analysis.

Sentiment Classification
Sentiment classification is a document classification task that involves two classes: positive and negative. Unlike traditional document classification, which considers topics, sentiment classification therefore considers sentiment. Several supervised learning approaches for sentiment classification have been investigated. Wiebe et al. (1999) and Hatzivassiloglou and Wiebe (2000) identify nouns and adjectives that are indicative of positive or negative opinions; Turney (2002) uses mutual information between term phrases and positive and negative words such as "excellent" and "poor" to identify opinions. Wei et al. (2006) employ two comprehensive lists of positive and negative words from the General Inquirer (available at http://www.wjh.harvard.edu/~inquirer/) to facilitate sentiment classification tasks.
To solve the context-dependent and conflicting opinion word problems, Ding et al. (2008) rely on extensive manual effort and depend heavily on natural language processing rules. In general, these sentiment classification techniques rely on sets of positive and negative lexicons to identify other indicative terms that might appear in review opinions. However, positive and negative indicators vary for different kinds of products. The sets of positive and negative indicators are not interchangeable but are tailored to specific products. Recent online consumer review systems also provide user interfaces that separate positive and negative inputs, rather than using only one input field in a free-text format. Thus, sentiment classification becomes less challenging in practice, because users submit their positive and negative opinions separately.

Sentiment Analysis
Sentiment analysis is more complicated than sentiment classification: it classifies review opinions into several product feature classes, which vary in number across different types of products. Several research works pertaining to sentiment analysis use supervised or unsupervised learning. For example, Hu and Liu (2004a; 2004b) and Liu et al. (2005) use the NLProcessor linguistic processor (see http://www.infogistics.com) to parse review opinions and extract nouns and noun phrases. With an association rule mining algorithm, they then discover all frequent product features among the previously extracted noun phrases. Sentences without any discovered product features and opinion words are not classified. However, the accuracy of this unsupervised learning method depends on the parsing performance of NLProcessor, and extensive manual effort is needed to adjust the tagging results of NLProcessor before the review opinions can be classified effectively. Jindal and Liu (2006a; 2006b) propose another technique that integrates pattern discovery and supervised learning approaches to identify comparative sentences in text documents.
Their technique is useful for sentiment analysis, because it can extract product features and opinion words in comparative sentences. However, it also relies on part-of-speech (POS) tagging and keywords, and it has trouble dealing with the informal Web language that is widespread in online reviews and Web blogs. Wei et al. (2010) propose a semantic-based approach that exploits lists of positive and negative adjectives defined in the General Inquirer to recognize opinion words semantically and thereby extract product features. However, this technique requires the availability of lists of positive and negative lexicons. Popescu and Etzioni (2005) use a set of domain-independent extraction patterns, predefined in a Web information extraction system (KnowItAll), to instantiate specific extraction rules for each product feature class. Kobayashi et al. (2004; 2005) also adopt an information extraction approach to extract product features. However, these information-extraction-based techniques require predefined extraction patterns. In this study, we instead propose the use of two supervised learning techniques, class association rules and the naïve Bayes classifier, to assign product features without using lexicon sets, natural language processing, or predefined extraction patterns. Rather, we use only a training data set.

Supervised Learning for Classifying Consumer Review Opinions
Supervised learning simulates the way humans learn from their past experiences to acquire knowledge and thus perform practical tasks in decision making and classification. Supervised learning has been widely used for document classification. Specifically, it takes a set of preclassified training documents to develop a classification model, which can then classify any new document into one or more predefined classes. However, in sentiment analysis of online review opinions, the unit of analysis is a sentence rather than a document. A consumer review contributed by a Web user consists of multiple sentences, each of which may refer to one or more product features.
For example, the sentence "The digital camera takes good pictures but it is not expensive at all" describes two product features: image quality and price. It is also possible that a sentence does not describe any product feature. For instance, a reviewer may tell a story of using a product while on vacation: "I took the Canon G7 with me on my summer vacation at Yosemite." This sentence only describes the event and says nothing about the reviewer's sentiment on any product feature. Therefore, the number of product features in an opinion sentence can range from 0 to n, where n is the number of possible product features of the focal product. In our preliminary study, we found that more than 50% of consumer review sentences do not describe any product feature, whereas documents in document classification are always classified into one or more topic classes. This large amount of noise makes sentiment analysis nontrivial. In addition, identifying the product feature(s) in a sentence is more challenging than identifying a document topic, because a document is much longer and contains more term features to support classification. Furthermore, the class of a document can be determined according to multiple representative term features in the document, whereas an opinion sentence may contain only one term feature that describes a product feature. Accurately extracting this term feature from a short sentence is also nontrivial.

We model the problem of conducting a sentiment analysis of consumer review opinions as a supervised learning task. For a product P, there is a set of consumer reviews $R = \{r_1, r_2, \ldots, r_{|R|}\}$. For each review r, there is a set of opinion sentences $S = \{s_1, s_2, \ldots, s_{|S|}\}$. For each product P, there is also a set of product feature classes $F = \{f_1, f_2, \ldots, f_{|F|}\}$. Some term features, $T_j = \{t_{j1}, t_{j2}, \ldots\}$, are associated with each $f_j$. For example, "AA" and "lithium" are term features associated with the product feature battery for a digital camera, whereas "GB" and "Compact Flash" are term features associated with the product feature memory.
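The formal model above can be encoded with a small data structure. The sketch below is a hypothetical encoding (the class `OpinionSentence` and helper `share_without_features` are illustrative names, not from the paper) in which each sentence carries a possibly empty set of labels drawn from F, so the frequent "no product feature" case is simply an empty set rather than a special record.

```python
from dataclasses import dataclass, field

# Product feature classes F identified in the paper's digital-camera data set.
F = {"battery", "flash", "image quality", "lens", "memory",
     "price", "usability", "video"}

# Illustrative term features T_j for two classes, taken from the examples above.
TERM_FEATURES = {
    "battery": {"aa", "lithium"},
    "memory": {"gb", "compact flash"},
}


@dataclass(frozen=True)
class OpinionSentence:
    """One opinion sentence s with zero or more product feature labels."""
    text: str
    features: frozenset = field(default_factory=frozenset)


def share_without_features(sentences):
    """Fraction of sentences labeled with no product feature (the 'none' case)."""
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if not s.features) / len(sentences)


sentences = [
    OpinionSentence(
        "The digital camera takes good pictures but it is not expensive at all",
        frozenset({"image quality", "price"}),
    ),
    OpinionSentence("I took the Canon G7 with me on my summer vacation at Yosemite"),
]
print(share_without_features(sentences))  # 0.5
```

Keeping labels as a set lets one sentence belong to several classes, which the class association rule and naïve Bayes procedures both need.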
In the following, we detail the two supervised learning techniques that we use for classifying consumer review opinions: class association rules and the naïve Bayes classifier.

Class Association Rules
The goal of class association rule mining is to extract frequently co-occurring associations between term features in consumer review opinions and the product features of a particular product. A set of preclassified opinion sentences provides training examples for determining the class association rules. Each opinion sentence can be labeled with one or more product features $f_j$, or with no product feature, that is, none. Class association rule mining extracts all ruleitems with support equal to or higher than a prespecified minimum support threshold and confidence equal to or higher than a prespecified minimum confidence threshold. We define a ruleitem as $(t_{jk}, f_j)$, where $t_{jk} \in T_j$ and $f_j \in F$, and we establish the class association rule as $t_{jk} \rightarrow f_j$, where $t_{jk} \in T_j$ and $f_j \in F$. Given a labeled opinion sentence, we first remove stop words that do not bear any semantics. All unigrams and bigrams are then extracted from the sentence. In this study, we do not consider n-grams longer than two terms, because the frequencies of most longer n-grams are too low for them to be included in class association rules. Moreover, in most cases, these n-grams are not valid words but combinations of broken words. Using the unigrams and bigrams extracted from an opinion sentence, we generate a set of candidate ruleitems by associating each unigram or bigram with every product feature labeled for the opinion sentence. The candidate ruleitems are then stored and accumulated as we process all labeled opinion sentences. After all of the candidate ruleitems have been generated from the set of training opinion sentences, class association rule mining extracts the rules using two parameters, the minimum support threshold and the minimum confidence threshold.
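The candidate-generation and thresholding procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it uses whitespace tokenization, a tiny illustrative stop-word list (the paper does not publish its list), and a hypothetical decision step in which a test sentence receives every feature whose rule fires. It also applies the bigram-over-unigram conflict policy described below. All function names are illustrative.

```python
from collections import Counter

# Assumed, deliberately tiny stop-word list; the paper does not publish its list.
STOP_WORDS = {"the", "a", "an", "is", "it", "but", "and", "at", "all",
              "of", "with", "my", "i", "on", "this"}


def term_features(sentence):
    """Return the unigrams and bigrams of a sentence after stop-word removal."""
    tokens = [w.strip(".,!?").lower() for w in sentence.split()]
    tokens = [w for w in tokens if w and w not in STOP_WORDS]
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return set(tokens) | bigrams


def mine_rules(labeled, min_support, min_confidence):
    """Mine rules t -> f from (sentence, feature_set) pairs.

    Support(t, f) = count(t, f) / N and Conf(t, f) = count(t, f) / count(t).
    Sentences with an empty feature set count toward N and count(t) only.
    """
    n = len(labeled)
    count_t, count_tf = Counter(), Counter()
    for sentence, features in labeled:
        terms = term_features(sentence)
        count_t.update(terms)
        # Candidate ruleitems: every term paired with every labeled feature.
        count_tf.update((t, f) for t in terms for f in features)
    rules = {}
    for (t, f), c in count_tf.items():
        if c / n >= min_support and c / count_t[t] >= min_confidence:
            rules.setdefault(t, set()).add(f)
    return rules


def classify(sentence, rules):
    """Assign every feature whose rule fires (assumed decision step).

    A fired bigram rule overrides the rules of the unigrams it contains;
    an empty result means the sentence is labeled 'none'.
    """
    terms = term_features(sentence)
    fired_bigrams = [t for t in terms if " " in t and t in rules]
    covered = {w for b in fired_bigrams for w in b.split()}
    predicted = set()
    for t in terms:
        if t in rules and (" " in t or t not in covered):
            predicted |= rules[t]
    return predicted


training = [
    ("The flash card holds many pictures", {"memory"}),
    ("Bought a 2 GB flash card", {"memory"}),
    ("The flash is too weak indoors", {"flash"}),
    ("The flash fires too late", {"flash"}),
    ("I took it on vacation", set()),
]
rules = mine_rules(training, min_support=0.4, min_confidence=0.5)
print(classify("My new flash card is full", rules))  # {'memory'}
```

In this toy run the unigram "flash" maps to both Flash and Memory, but because the bigram "flash card" fires, only Memory is assigned.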
Formally, a class association rule $t \rightarrow f$ is deduced if both of the following hold:

$$\mathrm{Support}(t, f) = \frac{\mathrm{count}(t, f)}{N} \geq \text{minimum support threshold}$$

$$\mathrm{Conf}(t, f) = \frac{\mathrm{count}(t, f)}{\mathrm{count}(t)} \geq \text{minimum confidence threshold}$$

where count(t, f) is the number of opinion sentences that contain the term feature (unigram or bigram) t and are labeled f, count(t) is the number of opinion sentences that contain the term feature t, and N is the total number of opinion sentences for training.

As discussed, the class association rules we extract are not limited to associations between a single term (i.e., a unigram) and a product feature. In some cases, phrases (i.e., bigrams) appear in opinion sentences to describe product features, such as "flash card" in association with the product feature Memory. The class association rules extracted from these unigrams and bigrams may conflict, though. Let $t_1$ be a unigram and $t_2$ a bigram that contains $t_1$. In this case, $t_1 \rightarrow f_a$ conflicts with $t_2 \rightarrow f_b$ when $a \neq b$. For example, $\text{flash} \rightarrow \text{Flash}$ and $\text{flash card} \rightarrow \text{Memory}$ conflict: the term feature "flash" is associated with the product feature Flash, but the term feature "flash card" is associated with Memory. When such a conflict occurs, the rule extracted from the bigram (i.e., the phrase) overrides the rule extracted from the single term, rather than comparing the support or confidence levels of the two conflicting rules, because a phrase describes a product feature more specifically than a single term does.

Naïve Bayes Classifier
The naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem. It assumes class-conditional independence: for a given product feature, the chance that a term appears in an opinion sentence is independent of the chances that other terms appear in the same opinion sentence. It treats each opinion sentence as a bag of terms, so the probability of a term appearing is also independent of its position in the opinion sentence.
Accordingly, the naïve Bayes classifier estimates the posterior probability of each product feature f, given the target opinion sentence os, according to Bayes' rule:

$$P(f \mid os) = \frac{P(f)\,P(os \mid f)}{P(os)}$$

where $P(f)$ is the probability that the product feature f appears, $P(os \mid f)$ is the conditional probability that the opinion sentence os occurs given f, and $P(os)$ is the probability that the opinion sentence os occurs. We can ignore $P(os)$ when estimating $P(f \mid os)$, because it is identical for all product features; the relative values of $P(f \mid os)$ then determine the product feature(s) in the opinion sentence os:

$$P(f \mid os) \propto P(f)\,P(os \mid f)$$

The prior probability of a product feature f, $P(f)$, is computed as:

$$P(f) = \frac{n(f)}{n}$$

where n represents the total number of opinion sentences in the training data set and $n(f)$ denotes the number of opinion sentences in the training data set that are labeled f. Assume that the target opinion sentence os has a set of terms T and that each term $t_k$ in T is independent of all other terms in T. We can then calculate the conditional probability that the opinion sentence os occurs given f, $P(os \mid f)$, as:

$$P(os \mid f) = \prod_{t_k \in T} P(t_k \mid f)$$

Furthermore, the probability of the occurrence of a term $t_k$ given a product feature f, $P(t_k \mid f)$, can be computed as the number of opinion sentences labeled f that include $t_k$, divided by the number of opinion sentences labeled f in the training data set. That is,

$$P(t_k \mid f) = \frac{n(t_k, f)}{n(f)}$$

Accordingly, we can derive the posterior probability of the target opinion sentence os being assigned to a product feature f, $P(f \mid os)$, as follows:

$$P(f \mid os) \propto P(f)\,P(os \mid f) = \frac{n(f)}{n} \prod_{t_k \in T} \frac{n(t_k, f)}{n(f)}$$
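The counting estimate above translates directly into code. The sketch below is illustrative rather than the authors' implementation: it restricts the product to a given `vocabulary` of selected term features, treats sentences with no product feature as a "none" class, and exposes an optional add-alpha smoothing parameter `alpha`, which is an assumption not present in the paper's formula. With the default `alpha=0.0`, any selected term never seen with a class drives that class's score to zero, exactly as the unsmoothed estimate does.

```python
from collections import Counter


def train_nb(labeled, vocabulary):
    """Collect n, n(f), and n(t_k, f) from (term_set, feature_set) pairs.

    Only terms in `vocabulary` (the selected term features) are counted.
    A sentence with several labels is counted once under each label.
    """
    n_f, n_tf = Counter(), Counter()
    for terms, features in labeled:
        selected = set(terms) & vocabulary
        for f in features:
            n_f[f] += 1
            for t in selected:
                n_tf[(t, f)] += 1
    return len(labeled), n_f, n_tf


def posterior_scores(terms, model, vocabulary, alpha=0.0):
    """Unnormalized P(f | os) = n(f)/n * product of n(t_k, f)/n(f).

    alpha > 0 applies add-alpha smoothing over the two outcomes
    (term present / absent); this is an assumption, not in the paper.
    """
    n, n_f, n_tf = model
    selected = set(terms) & vocabulary
    scores = {}
    for f, nf in n_f.items():
        score = nf / n
        for t in selected:
            score *= (n_tf[(t, f)] + alpha) / (nf + 2 * alpha)
        scores[f] = score
    return scores


def classify_nb(terms, model, vocabulary, alpha=0.0):
    """Return the feature with the highest posterior (ties: first seen)."""
    scores = posterior_scores(terms, model, vocabulary, alpha)
    return max(scores, key=scores.get)


vocab = {"battery", "aa", "price", "cheap"}
train = [
    ({"battery", "aa", "lasts"}, {"battery"}),
    ({"battery", "dies"}, {"battery"}),
    ({"price", "cheap"}, {"price"}),
    ({"cheap", "deal"}, {"price"}),
    ({"vacation", "yosemite"}, {"none"}),
]
model = train_nb(train, vocab)
print(classify_nb({"battery", "aa"}, model, vocab))  # battery
```

Because sentences are short, plain products are used here; for longer inputs, summing logarithms would be the usual guard against underflow.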

Using this formulation, we classify the target opinion sentence os to the product feature with the highest posterior probability. That is,

$$f^{*} = \arg\max_{f \in F} P(f \mid os)$$

The naïve Bayes classifier is a powerful classification tool; however, its performance depends greatly on the term features $t_k$ selected from T for sentiment analysis, and its efficiency declines without such feature selection. An opinion sentence contains many irrelevant and redundant terms. For example, users may discuss the camera they have purchased and the pictures they have shot. The terms "purchase" and "shoot" appear frequently in opinion sentences about cameras, but they do not contribute to the classification of opinion sentences to product features and thus should be regarded as noise. By filtering out such terms, we are likely to improve classification accuracy and efficiency, as well as the interpretability of the classifier. In this study, we investigate two feature selection metrics, information gain and chi-square ($\chi^2$), to select the representative term features for the naïve Bayes classifier.

Information Gain
The information gain metric assesses the amount of information that a term t provides for class prediction, based on the absence and presence of t in the training data set (Yang and Pedersen, 1997). A term with a high information gain can reduce the information needed to classify the data set; that is, it reduces the impurity or disorder of the data set. The expected information needed to classify a given data set using a term t is called the entropy of t.
To compute the entropy of a term t, we must consider the distribution of all product features for the presence and absence of t, with probabilities $P(t)$ and $P(\bar{t})$, respectively:

$$E(t) = -P(t)\sum_{f \in F} P(f \mid t)\log P(f \mid t) \;-\; P(\bar{t})\sum_{f \in F} P(f \mid \bar{t})\log P(f \mid \bar{t})$$

The original entropy, based on the distribution of the training data set across all product features, is:

$$E(F) = -\sum_{f \in F} P(f)\log P(f)$$

The information gain for a term t is the expected reduction in entropy obtained by considering the term, $G(t) = E(F) - E(t)$:

$$G(t) = -\sum_{f \in F} P(f)\log P(f) \;+\; P(t)\sum_{f \in F} P(f \mid t)\log P(f \mid t) \;+\; P(\bar{t})\sum_{f \in F} P(f \mid \bar{t})\log P(f \mid \bar{t})$$

If the information gain is low, the presence or absence of the term is not important for determining the product feature class. For this study, we select only those terms with information gains equal to or higher than a predefined threshold for the naïve Bayes classifier. All other terms, whose information gains are lower than the threshold, are discarded.

Chi-Square ($\chi^2$)
The $\chi^2$ metric evaluates whether there is a statistically significant difference between proportions for a term and a product feature class. It thus measures whether observations of two variables, expressed in a contingency table, are independent. The $\chi^2$ metric between a term t and a product feature f is calculated as follows:

$$\chi^2(t, f) = \frac{\left(P(t, f)\,P(\bar{t}, \bar{f}) - P(t, \bar{f})\,P(\bar{t}, f)\right)^2}{P(f)\,P(\bar{f})\,P(t)\,P(\bar{t})}$$

For each term and each product feature in the training data set, we compute the $\chi^2$ value. To evaluate the goodness of a term, we aggregate its cross-class $\chi^2$ values, generally using either the maximum $\chi^2$ or the average $\chi^2$ method, as follows:

$$\chi^2_{\max}(t) = \max_{f \in F} \chi^2(t, f)$$

$$\chi^2_{\mathrm{avg}}(t) = \sum_{f \in F} P(f)\,\chi^2(t, f)$$

If a term lacks strong classification power, its $\chi^2$ values across the product feature classes will be close; in other words, the differences between its $\chi^2$ values across product feature classes are small. Such a term is not useful for the naïve Bayes classifier, because its contributions to different product feature classes are approximately the same, and it does not help identify the correct product feature class. In this study, we therefore filter out terms for which the difference between the maximum $\chi^2$ and the average $\chi^2$ is less than a predefined threshold δ (i.e., we remove term t if $\chi^2_{\max}(t) - \chi^2_{\mathrm{avg}}(t) < \delta$).

Empirical Evaluation
We conducted experiments to evaluate empirically the performance of our proposed supervised learning techniques (i.e., class association rules and the naïve Bayes classifier) for sentiment analysis. In the first experiment, we evaluated the impact of the minimum support and minimum confidence thresholds on the effectiveness of the class association rules technique. In the second experiment, we investigated the impact of the two term feature selection metrics (i.e., information gain and $\chi^2$) on the effectiveness of the naïve Bayes classifier. Finally, in the third experiment, we compared the performance of the class association rules technique and the naïve Bayes classifier for sentiment analysis.

Data Set
To prepare a data set for empirical evaluation, we developed a Web crawler to gather online consumer reviews from amazon.com about six digital camera models. For these six models, we collected 214 consumer reviews and 3,000 opinion sentences, with an average of 14.02 sentences per consumer review. The six digital camera models were:
(1) Canon EOS 20D 8.2MP Digital SLR Camera
(2) Nikon D70 Digital SLR Camera Kit
(3) Canon Powershot SD300 4MP Digital Elph Camera with 3x Optical Zoom
(4) Sony Cybershot DSCP200 7.2MP Digital Camera 3x Optical Zoom
(5) Canon Powershot S2 IS 5MP Digital Camera with 12x Optical Image Stabilized Zoom
(6) Canon Powershot A95 5MP Digital Camera with 3x Optical Zoom

After segmenting the collected consumer reviews into sentences, we labeled each sentence as containing zero or more product features. Across the whole data set, we identified eight product features: (1) battery, (2) flash, (3) image quality, (4) lens, (5) memory, (6) price, (7) usability, and (8) video.
Evaluation Metrics

We employed precision, recall, and F-measure as evaluation metrics; they are common in information retrieval and document classification research. Precision is the proportion of the items assigned to a class by a classification technique that are assigned correctly, and recall is the proportion of the items manually assigned to that class in the gold standard that the technique classifies correctly. The F-measure is the harmonic mean of precision and recall. It is a better summary than the arithmetic mean of precision and recall because it stays close to the smaller of the two values, so an extremely high value of one cannot compensate for a low value of the other. For any product feature f, the precision ($p_f$), recall ($r_f$), and F-measure ($F_f$) can be computed as follows:

$$p_f = \frac{TP_f}{TP_f + FP_f} \qquad r_f = \frac{TP_f}{TP_f + FN_f} \qquad F_f = \frac{2\,p_f\,r_f}{p_f + r_f}$$

where $TP_f$ is the number of opinion sentences correctly labeled with f by a classification technique, $FN_f$ is the number of opinion sentences that should be labeled with f according to the gold standard but are instead labeled with other product features by the classification technique, and $FP_f$ is the number of opinion sentences incorrectly labeled with f by the classification technique.

We used micro and macro measurements of precision and recall to evaluate the overall performance. Because the class sizes may not be uniform, these measurements help determine whether a classification technique performs better in particular product feature classes or performs equally well across all of them: micro-averaging pools the counts of all classes, so large classes dominate, whereas macro-averaging gives every class the same weight. Thus,

$$\text{micro-precision} = \frac{\sum_{f \in F} TP_f}{\sum_{f \in F} (TP_f + FP_f)} \qquad \text{micro-recall} = \frac{\sum_{f \in F} TP_f}{\sum_{f \in F} (TP_f + FN_f)}$$

$$\text{micro } F\text{-measure} = \frac{2 \times \text{micro-precision} \times \text{micro-recall}}{\text{micro-precision} + \text{micro-recall}}$$

$$\text{macro-precision} = \frac{\sum_{f \in F} p_f}{|F|} \qquad \text{macro-recall} = \frac{\sum_{f \in F} r_f}{|F|}$$

$$\text{macro } F\text{-measure} = \frac{2 \times \text{macro-precision} \times \text{macro-recall}}{\text{macro-precision} + \text{macro-recall}}$$

Experiment I

We used five-fold cross-validation to evaluate the impact of the minimum support and confidence thresholds on the class association rules technique. Specifically, the data set was partitioned into five subsets of approximately equal size. For each fold, a single subset served as the testing set and the remaining four subsets were combined to form the training set. The overall effectiveness was then estimated by averaging the effectiveness obtained from these five folds.

We selected support values ranging from 0.005 to 0.007 and confidence values from 0.35 to 0.65. Figure 1 shows the macro F-measure of the class association rules technique with these support and confidence values; Figure 2 shows the micro F-measure of this technique with the same values. The optimal macro F-measure was 72.26%, and the best micro F-measure was 70.77%. These optimal values emerged when the minimum support value was 0.006 and the minimum confidence value was 0.5. Furthermore, the micro F-measure decreased quickly when the minimum confidence threshold decreased below 0.5 but only slightly when it increased beyond 0.5. In contrast, the effectiveness of the class association rules technique was less sensitive to the minimum support threshold. This behavior can be explained by two observations. First, the opinion sentences are relatively short, and therefore there are relatively few terms in each opinion sentence.
In addition, there are relatively few labeled opinion sentences for some particular product features; that is, these product features are not frequently discussed in the online consumer reviews. The impact of this unbalanced data appeared in the form of sensitivity to the minimum confidence threshold.

Experiment II

As we discussed in Section 3.2, term feature selection is likely to improve the performance of the naïve Bayes classifier. In this experiment, we again used five-fold cross-validation to investigate the impact of the two term feature selection metrics, information gain and χ², on the effectiveness of the naïve Bayes classifier. Before computing the information gain and χ² values, we filtered out rare terms with sentence frequencies of less than 10, because rare terms lack sufficient support and thus would not be useful for our classification. Figure 3 shows the macro F-measure of the naïve Bayes classifier using information gain for term feature selection, and Figure 4 shows the micro F-measure for the same scenario. The optimal macro F-measure was 70.19%, and the optimal micro F-measure was 67.19%; both occurred when the information gain threshold was 0.01.
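The information gain selection used in Experiment II can be sketched in Python as below: rare terms (sentence frequency below 10) are dropped first, G(t) = E(F) - E(t) is computed for each remaining term, and terms at or above the threshold are kept. This is a minimal sketch under our own assumptions: sentences are (terms, feature labels) pairs, a sentence with several product features contributes one instance per feature (sentences with no feature contribute none) so that P(f) forms a proper distribution, and entropy uses log base 2. The paper specifies neither the log base nor its multi-label handling, so the threshold scale here need not match the 0.01 reported above; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Sentence = Tuple[Set[str], Set[str]]  # (terms, product feature labels)


def entropy(counts: Iterable[int], total: int) -> float:
    """-sum p*log2(p) over class counts; zero counts contribute nothing."""
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(sentences: List[Sentence], min_df: int = 10) -> Dict[str, float]:
    """G(t) = E(F) - E(t) for every term with sentence frequency >= min_df."""
    # Assumption: one (terms, feature) instance per feature label of a sentence.
    instances = [(terms, f) for terms, labels in sentences for f in labels]
    n = len(instances)
    class_counts = Counter(f for _, f in instances)
    e_f = entropy(class_counts.values(), n)  # E(F)

    sent_df = Counter(t for terms, _ in sentences for t in terms)
    gains: Dict[str, float] = {}
    for t, df in sent_df.items():
        if df < min_df:  # rare-term filter applied before scoring
            continue
        with_t = Counter(f for terms, f in instances if t in terms)
        without_t = class_counts - with_t  # Counter subtraction keeps positive counts
        n_t = sum(with_t.values())
        n_not = n - n_t
        e_t = 0.0  # E(t): class entropies weighted by P(t) and P(not t)
        if n_t:
            e_t += n_t / n * entropy(with_t.values(), n_t)
        if n_not:
            e_t += n_not / n * entropy(without_t.values(), n_not)
        gains[t] = e_f - e_t
    return gains


def select_terms(gains: Dict[str, float], threshold: float) -> Set[str]:
    """Keep terms whose information gain is at or above the threshold."""
    return {t for t, g in gains.items() if g >= threshold}


if __name__ == "__main__":
    data = [
        ({"battery", "good"}, {"battery"}),
        ({"battery", "good"}, {"battery"}),
        ({"flash", "good"}, {"flash"}),
        ({"flash", "good"}, {"flash"}),
    ]
    gains = information_gain(data, min_df=1)
    print(sorted(select_terms(gains, threshold=0.5)))  # ['battery', 'flash']
```

Here "good" appears with every feature, so knowing whether it is present removes no uncertainty (gain 0), while "battery" and "flash" each separate the two classes perfectly (gain 1 bit).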

Figure 1 - Macro F-measure with different minimum support and confidence thresholds (axes: minimum confidence 0.35 to 0.65; minimum support 0.005 to 0.007; macro F-score)

Figure 2 - Micro F-measure with different minimum support and confidence thresholds (axes: minimum confidence 0.35 to 0.65; minimum support 0.005 to 0.007; micro F-score)
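Figures 1 and 2 vary the two thresholds that control rule generation in the class association rules technique. The Python below is a minimal sketch of how those thresholds act, under a simplifying assumption of our own: every rule has a single term as its antecedent (t → f), with support = (sentences containing t and labeled f) / N and confidence = (sentences containing t and labeled f) / (sentences containing t). A sentence is then assigned every product feature predicted by a rule whose term it contains, consistent with the paper's account that a sentence is classified whenever one of its terms has sufficient support and confidence. The paper's rules may have multi-term antecedents, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Sentence = Tuple[Set[str], Set[str]]  # (terms, product feature labels)


def mine_rules(
    sentences: List[Sentence], min_support: float, min_confidence: float
) -> Dict[str, Set[str]]:
    """Return {term: {features}} for single-term rules t -> f passing both thresholds."""
    n = len(sentences)
    term_df: Counter = Counter()  # sentences containing each term
    joint: Counter = Counter()    # sentences containing t AND labeled f
    for terms, labels in sentences:
        term_df.update(terms)
        for t in terms:
            for f in labels:
                joint[(t, f)] += 1
    rules: Dict[str, Set[str]] = {}
    for (t, f), count in joint.items():
        support = count / n              # fraction of all sentences with t and f
        confidence = count / term_df[t]  # fraction of t-sentences labeled f
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(t, set()).add(f)
    return rules


def classify(terms: Set[str], rules: Dict[str, Set[str]]) -> Set[str]:
    """Union of the features predicted by every rule whose term occurs in the sentence."""
    predicted: Set[str] = set()
    for t in terms:
        predicted |= rules.get(t, set())
    return predicted


if __name__ == "__main__":
    data = [
        ({"battery", "life"}, {"battery"}),
        ({"battery", "dies"}, {"battery"}),
        ({"battery", "flash"}, {"flash"}),
        ({"flash", "bright"}, {"flash"}),
    ]
    rules = mine_rules(data, min_support=0.5, min_confidence=0.6)
    print(sorted(classify({"battery", "flash", "nice"}, rules)))  # ['battery', 'flash']
```

Raising min_confidence to 0.7 in this toy example removes the battery → battery rule (confidence 2/3), so fewer sentences are labeled at all; this is one way a confidence threshold trades recall for precision.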

Figure 3 - Macro F-score values with different information gain thresholds for the naïve Bayes classifier (curves: macro recall, macro precision, macro F-score; x-axis: information gain value, 0.002 to 0.020)

Figure 4 - Micro F-score values with different information gain thresholds for the naïve Bayes classifier (curves: micro recall, micro precision, micro F-score; x-axis: information gain value, 0.002 to 0.020)
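Figures 3 and 4 report macro- and micro-averaged precision, recall, and F-score. The averaging defined in the Evaluation Metrics subsection can be sketched in Python as follows, assuming the gold-standard and predicted labels of each sentence are given as sets of product features. One simplification is ours: every gold-standard feature the technique fails to assign is counted as a false negative, whether or not the sentence received other labels. The function name micro_macro is hypothetical.

```python
from typing import Dict, List, Set


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def _f_measure(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_macro(
    gold: List[Set[str]], pred: List[Set[str]], features: List[str]
) -> Dict[str, float]:
    """Micro- and macro-averaged precision, recall, and F-measure over product features."""
    tp = {f: 0 for f in features}
    fp = {f: 0 for f in features}
    fn = {f: 0 for f in features}
    for g, p in zip(gold, pred):
        for f in features:
            if f in p and f in g:
                tp[f] += 1
            elif f in p:
                fp[f] += 1
            elif f in g:
                fn[f] += 1
    # Micro: pool counts over all features, so frequent features dominate.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = _ratio(tp_all, tp_all + fp_all)
    micro_r = _ratio(tp_all, tp_all + fn_all)
    # Macro: average per-feature scores, so every feature weighs the same.
    macro_p = sum(_ratio(tp[f], tp[f] + fp[f]) for f in features) / len(features)
    macro_r = sum(_ratio(tp[f], tp[f] + fn[f]) for f in features) / len(features)
    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f": _f_measure(micro_p, micro_r),
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "macro_f": _f_measure(macro_p, macro_r),
    }


if __name__ == "__main__":
    gold = [{"battery"}, {"battery"}, {"flash"}, {"battery"}]
    pred = [{"battery"}, {"flash"}, {"flash"}, {"battery"}]
    scores = micro_macro(gold, pred, ["battery", "flash"])
    print(round(scores["micro_f"], 4), round(scores["macro_f"], 4))  # 0.75 0.7895
```

The two averages diverge here because the rarer "flash" class has perfect recall but only 0.5 precision: macro-averaging lets that class count as much as "battery", while micro-averaging weights it by its smaller number of sentences.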

Figure 5 - Macro F-score values with different χ² thresholds for the naïve Bayes classifier (curves: macro recall, macro precision, macro F-score; x-axis: chi-square value, 100 to 280)

Figure 6 - Micro F-score values with different χ² thresholds for the naïve Bayes classifier (curves: micro recall, micro precision, micro F-score; x-axis: chi-square value, 100 to 280)
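Every point in Figures 1 through 6 is an average over the five folds described in Experiment I. That partitioning can be sketched in Python as below; the random shuffle, the seed, and the round-robin fold assignment are our assumptions (the paper states only that the five subsets were of approximately equal size), and evaluate_fold is a hypothetical callback that trains on one split, tests on the other, and returns a score such as a macro F-measure.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(
    items: Sequence[T], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)       # assumption: random assignment to folds
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = [items[j] for j in folds[i]]
        train = [items[j] for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(
    items: Sequence[T],
    evaluate_fold: Callable[[List[T], List[T]], float],
    k: int = 5,
) -> float:
    """Average the score returned by evaluate_fold over the k folds."""
    scores = [evaluate_fold(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # With 23 items the five test folds hold 5, 5, 5, 4, and 4 items.
    print(cross_validate(list(range(23)), lambda train, test: len(test)))  # 4.6
```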

In turn, Figure 5 shows the macro F-measure of the naïve Bayes classifier that used the χ² metric for term feature selection, and Figure 6 contains the corresponding micro F-measure. The optimal macro F-measure of 72.77% and optimal micro F-measure of 68.92% emerged when the χ² threshold was 220. It is also noteworthy that the optimal macro-precision and macro-recall of the naïve Bayes classifier using χ² were both higher than those of the naïve Bayes classifier using information gain. For the micro measures, the optimal micro-precision using χ² was higher than that using information gain, whereas the optimal micro-recall using χ² was lower. Overall, therefore, the results indicated that the χ² metric for term feature selection yielded better performance in the naïve Bayes classifier.

Experiment III

To compare the class association rules technique and the naïve Bayes classifier with information gain and χ² as term feature selection metrics, we created Table 1 to present their macro-precision, macro-recall, macro F-measure, micro-precision, micro-recall, and micro F-measure. The class association rules technique obtained a macro F-measure comparable to that of the naïve Bayes classifier with χ², with a difference of only 0.51 percentage points. However, the class association rules technique achieved a higher micro F-measure than did the naïve Bayes classifier with χ², by 3.34 percentage points. Meanwhile, the naïve Bayes classifier with χ² obtained substantially higher macro- and micro-precision than the class association rules technique, whereas the latter obtained substantially higher macro- and micro-recall than the naïve Bayes classifier with χ². The differences in macro- and micro-precision were 8.48% and 6.33%, respectively, whereas the differences in macro- and micro-recall were 7.80% and 9.70%, respectively. Thus, the class association rules technique classified more sentences with correct product features but also generated more errors (false positives).
The naïve Bayes classifier instead made more accurate assignments but also missed more sentences in its classifications (false negatives). The class association rules technique obtained higher recall because it classified an opinion sentence into a product feature whenever a term feature appearing in the sentence had sufficient support and confidence for that feature. However, these terms could also appear in other sentences that were not discussing the predicted product feature, which sacrificed precision. The naïve Bayes classifier used the term feature distribution to determine the probability of an opinion sentence being classified into a product feature, such that some term features raised the probability of a product feature while others lowered it. By using the distribution of terms in an opinion sentence, the naïve Bayes classifier achieved higher precision, but it also produced more false negatives by rejecting some opinion sentences that should have been classified into a product feature.

In general, the supervised learning approach using either the class association rules technique or the naïve Bayes classifier achieves satisfactory classification effectiveness. The natural language processing approach commonly employed by prior studies requires excessive manual effort to correct incorrectly parsed sentences, whereas our proposed supervised learning techniques do not require any natural language processing tools or tagging.

Table 1 - Comparison of best results provided by different term feature selection metrics

Measure | Naïve Bayes Classifier (Information Gain) | Naïve Bayes Classifier (χ²) | Class Association Rules
Macro-precision | 73.96% | 76.12% * | 67.64%
Macro-recall | 66.79% | 69.70% | 77.50% *
Macro F-measure | 70.19% | 72.77% * | 72.26%
Micro-precision | 68.30% | 73.80% * | 67.47%
Micro-recall | 66.11% | 64.70% | 74.40% *
Micro F-measure | 67.19% | 68.92% | 72.26% *
(* best result in each row)

Conclusion

Web 2.0 has facilitated user interactions on Internet platforms, on which people share their opinions and contribute their own content. Electronic commerce has thus extended beyond B2C to include consumer-to-consumer marketplaces. Consumers want to compare products before making a purchase decision, and online consumer reviews provide valuable information that enables them to identify specific products that fit their personal preferences. However, the vast volume of online consumer reviews available on the Web makes browsing through them manually and comparing products systematically far from trivial. In response, we develop supervised learning techniques for sentiment analysis of online consumer reviews. We have investigated the class association rules technique and the naïve Bayes classifier as methods to classify the product features that consumers describe in their reviews; the classified sentences can then be used to produce summaries comparing products at the product feature level. Rather than comparing general ratings of products, sentiment analysis allows users to compare and identify products according to their preferred product features. In our empirical evaluation, the class association rules technique and the naïve Bayes classifier with χ² as the term feature selection metric produce comparable results in terms of macro F-measure, though the former performs better on the micro F-measure. However, the two methods achieve different results in terms of precision and recall: the naïve Bayes classifier with χ² for term feature selection performs substantially better in both macro- and micro-precision, whereas the class association rules technique performs substantially better in both macro- and micro-recall.

In the future, we shall investigate the impact of the size of the training set on the effectiveness of these proposed supervised learning techniques. In addition, we shall explore potential unsupervised learning approaches for sentiment analysis, which is challenging because opinion texts tend to be short and sparse, and classifying opinion texts by the similarities of such short and sparse sentences is difficult. Concept mapping represents a potential solution. However, relying on an existing ontology is not an ideal solution, because the terms used in social media are constantly evolving.
Effective ontology learning methods and appropriate concept mapping mechanisms are potential solutions to this problem.


About the Authors

Christopher C. Yang is an associate professor in the College of Information Science and Technology at Drexel University. He has also been an associate professor in the Department of Systems Engineering and Engineering Management and the director of the Digital Library Laboratory at the Chinese University of Hong Kong, an assistant professor in the Department of Computer Science and Information Systems at the University of Hong Kong, and a research scientist in the Department of Management Information Systems at the University of Arizona. His recent research interests include social media analytics, Web 2.0, security informatics, health informatics, Web search and mining, knowledge management, and electronic commerce. He has published over 200 refereed journal and conference papers in Journal of the American Society for Information Science and Technology (JASIST), Decision Support Systems (DSS), IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Image Processing, IEEE Transactions on Robotics and Automation, IEEE Computer, IEEE Intelligent Systems, Information Processing and Management (IPM), Journal of Information Science, Graphical Models and Image Processing, Optical Engineering, Pattern Recognition, International Journal of Electronic Commerce, Applied Artificial Intelligence, ISI, WWW, SIGIR, ICIS, CIKM, and more. He has edited several special issues on multilingual information systems, knowledge management, Web mining, social media, and electronic commerce in JASIST, DSS, IPM, and IEEE Transactions. He has chaired and served on many international conferences and workshops, and he has frequently served as an invited panelist on review panels of the NSF and other government agencies. He can be reached at chris.yang@drexel.edu.

Xuning Tang is a Ph.D. candidate in the College of Information Science and Technology at Drexel University. His research interests are social computing and Web mining. He has published in IEEE Intelligent Systems, Annals of Information Systems, the IEEE International Conference on Intelligence and Security Informatics, and the ACM SIGKDD Workshops on Intelligence and Security Informatics.

Y.C. Wong received an M.Phil. degree from the Chinese University of Hong Kong. She was a research assistant in the Digital Library Laboratory. Her publication has appeared in the Proceedings of the International Conference on Electronic Commerce.

Chih-Ping Wei received a BS in Management Science from National Chiao-Tung University in Taiwan, R.O.C. in 1987 and an MS and a Ph.D. in Management Information Systems from the University of Arizona in 1991 and 1996, respectively. He is currently a professor in the Department of Information Management at National Taiwan University. Prior to joining National Taiwan University in 2010, he was a professor in the Institute of Service Science and the Institute of Technology Management at National Tsing Hua University in Taiwan and a professor in the Department of Information Management at National Sun Yat-sen University in Taiwan. He was also a visiting scholar at the University of Illinois at Urbana-Champaign in Fall 2001 and at the Chinese University of Hong Kong in Summer 2006 and 2007. His papers have appeared in Journal of Management Information Systems (JMIS), European Journal of Information Systems, Decision Support Systems (DSS), IEEE Transactions on Engineering Management, IEEE Software, IEEE Intelligent Systems, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Information Technology in Biomedicine, Journal of the American Society for Information Science and Technology, Information Processing and Management, Journal of Database Management, and Journal of Organizational Computing and Electronic Commerce, among others. His current research interests include information retrieval and text mining, knowledge discovery and data mining, knowledge management, multidatabase management and integration, and data warehouse design. He has edited special issues of Decision Support Systems, International Journal of Electronic Commerce, Electronic Commerce Research and Applications, and Information Processing and Management. He can be reached at the Department of Information Management, National Taiwan University, Taipei, Taiwan, R.O.C.; cpwei@im.ntu.edu.tw.