EMOTION DETECTION IN EMAIL CUSTOMER CARE


Computational Intelligence, Volume 59, Number 000, 2010

NARENDRA GUPTA, MAZIN GILBERT, AND GIUSEPPE DI FABBRIZIO
AT&T Labs - Research, Inc., Florham Park, NJ, USA
{ngupta,mazin,pino}@research.att.com

Prompt and knowledgeable responses to customers' emails are critical in maximizing customer satisfaction. Such email messages often contain complaints about unfair treatment due to negligence, incompetence, rigid protocols, unfriendly systems, and unresponsive personnel. In this paper, we refer to these messages as emotional emails. They provide valuable feedback to improve contact center efficiency and the quality of the overall customer care experience, which in turn results in increased customer retention. We describe a method that uses salient features to identify emotional emails in the customer care domain. Salient features in customer care related emails are expressions of customer frustration, dissatisfaction with the business, and threats to leave, take legal action, and/or report to authorities. Compared to a baseline system using word unigram features, our proposed approach significantly improves emotional email detection performance.

Key words: Boosting; Customer Service; Emotional Emails; Salient Features; Support Vector Machines; Text Classification.

1. INTRODUCTION

Email is becoming the preferred communication channel for customer service. For customers, it is a way to avoid long hold times on call center phone calls and to keep a record of information exchanges with the business. For businesses, it offers an opportunity to best utilize customer service representatives by evenly distributing the work load over time, and for representatives, it allows time to research issues and respond to customers in a manner consistent with business policies. Businesses can further exploit the offline nature of this communication channel by automatically routing emails involving critical issues to specialized representatives. Besides direct concerns related to products and services, businesses can ensure that messages complaining about unfair treatment due to negligence, incompetence, rigid protocols, and unfriendly systems are always handled with care. Such emails, referred to as emotional emails, are critical to reduce churn, i.e., to retain customers who would otherwise have taken their business elsewhere, and, at the same time, they are a valuable source of information for improving business processes together with the quality of the overall customer care experience, which in turn results in increased customer retention.

In service-oriented businesses, a large number of customer email messages may contain routine complaints. While such complaints are important and are routinely addressed by customer service representatives, this work aims to identify a critical subset of such messages called emotional emails. We define emotional emails as customer messages to a provider complaining about specific aspects of a product or service, where the severity of the complaints and the customer dissatisfaction could lead to loss of business or brand value, or to litigation and penalties. Emotional emails may contain abusive and probably emotionally charged language, but we are mainly interested in identifying messages where, in addition to flames (Spertus, 1997), the customer included a description of the problem they experienced with the company providing the service. In the context of customer service, customers express their concerns in many ways.
Sometimes they communicate their negative emotions or opinions using phrases like "disgusted" or "you suck", and using orthographic devices like all-uppercase words and repeated exclamation marks. In other cases, there is minimal emotional involvement and they use factual phrases such as "you overcharged" or "take my business elsewhere". In many cases, emotional, opinion, and factual components are all present. In this work, we have identified eight different ways customers express themselves through phrases in emotional emails. Throughout this paper, these will be referred to as salient features. We have also identified three different orthographic means customers use to express their emotions; these will be referred to as orthographic features. We treat identification of emotional emails as a text classification problem, and show that the salient and orthographic features significantly improve the

identification performance. Compared to a baseline system using only the word unigram features, our proposed approach resulted in a 42% relative improvement in F-measure in detecting emotional emails.

The rest of the paper is organized as follows. In Section 2, we provide a summary of previous work and its relationship to our contribution. In Section 3, we explain why we view identification of emotional emails as a text classification problem. Section 4 describes the rationale behind our choice of classification techniques. In Section 5, we illustrate our method for detecting emotional emails using salient and orthographic features. In Section 6, we present a series of experiments demonstrating improvements in classification performance over a word unigram baseline. In Section 7, we conclude by highlighting the contributions of this work.

2. PREVIOUS WORK

Extensive research has been done on emotion detection. In the context of human-computer dialogs, although richer features including acoustic and intonation features are available, there is a general consensus (Litman and Forbes-Riley, 2004b; Lee and Narayanan, 2005) that lexical evidence can significantly improve the accuracy of emotion detection. Research has also been done on predicting basic emotions (also referred to as affects) within text (Alm et al., 2005; Liu et al., 2003). In Alm et al. (2005), to render speech with a prosodic contour conveying the emotional content of the text, one of six types of human emotions (angry, disgusted, fearful, happy, sad, and surprised) is identified for each sentence in the running text. Deducing such emotions from lexical constructs is a hard problem, as evidenced by little agreement among annotators; a low Kappa value¹ (Cohen, 1960) was reported in Alm et al. (2005). Liu et al. (2003) have argued that the absence of affect-laden surface features (i.e., keywords) from the text does not imply the absence of emotions; therefore, they have relied more on commonsense knowledge. In our work as well, the absence of emotion- or opinion-laden keywords or phrases does not automatically make a message non-emotional: the presence of other salient features could still mark the message as emotional. Furthermore, instead of deducing the types of emotions and opinions in each sentence, we are interested in knowing whether the email as a whole conveys emotional content or not. In addition, we are also interested in the intensity and cause of those emotions and opinions.

There is also a body of work on creating Semantic Orientation dictionaries (Hatzivassiloglou and McKeown, 1997; Turney and Littman, 2003; Esuli and Sebastiani, 2005) and on their use in identifying sentiment²-laden sentences and the polarity of those sentiments, i.e., whether those sentiments are positive or negative (Yu and Hatzivassiloglou, 2003; Kim and Hovy, 2004; Hu and Liu, 2004). While such dictionaries provide a useful starting point, their use alone does not yield satisfactory results. In Wilson et al. (2005), classification of phrases containing positive, negative, or neutral sentiments is discussed. For this problem, they show high agreement among human annotators (Kappa of 0.84). They also show that labeling phrases with positive, negative, or neutral sentiments only on the basis of the presence of keywords from such dictionaries yields a classification accuracy of 48%. An obvious reason for this poor performance is that word semantic orientations are context dependent. In addition to Wilson et al. (2005), Pang et al. (2002) and Dave et al.
(2003) have attempted to mitigate this problem by using supervised classification methods that explore the role of the syntactic and semantic context in which the opinion words are expressed. They report classification results using a number of different sets of features, including word unigrams. Wilson et al. (2005) report an improvement (from 63% to 65.7% accuracy) in performance by adding a host of features extracted from syntactic dependencies. Similarly, Gamon (2004) demonstrates that the use of deep semantic features along with word unigrams improves performance. Pang et al. (2002) and Dave et al. (2003), on the other hand, demonstrated that word unigrams provide the best classification results. This is in line with our experience as well and could be related to the sparseness of the data. We also used supervised methods to detect emotional emails. To train predictive models we used word unigrams and a number of binary features indicating the presence of words/phrases from specific dictionaries.

There is also a body of work on classifying email messages according to their speech acts (Searle, 1969). For example, Lampert et al. (2010) discussed a text classification approach for detecting email messages containing Requests for Action.

¹ Kappa, usually denoted by κ, is a statistical measure of inter-annotator agreement whose computation accounts for chance agreement. κ = 1 when there is complete agreement among the annotators, and κ = 0 when the agreement is no better than chance; Kappa is negative only when the annotators agree less often than expected by chance.
² In the opinion mining literature, the word sentiment is often used to refer to opinions. Such opinions can be expressed directly through phrases like "is not good" or indirectly through phrases like "it is depressing".

Like our work, they also use supervised learning to train classification models, but they report a relatively poor inter-annotator agreement (Kappa of 0.681). For this reason, Lampert et al. (2010) used only 505 unanimously agreed-upon training instances. Similar to our work, they also show that, even with a small data set, highly discriminating features significantly improve text classification performance. While we use salient features, Lampert et al. (2010) use different regions of email messages, called zones, to improve classification performance.

In the context of email-based customer care, Busemann et al. (2000) propose a traditional classification approach that classifies customer messages into one of 47 categories defined for a specific domain. Although they address classification in a customer care setting, their objective is not to detect emotional emails, but to identify the nature of customer requests in order to assist a representative in quickly composing the response. In their work, Busemann et al. (2000) use features extracted from text by shallow parsing and orthographic analysis. They also explored several machine learning algorithms and show that Support Vector Machines (SVMs) (Vapnik, 1998) perform best for their task.

Spertus (1997) presents a system called Smokey which recognizes hostile messages and is quite similar to our work. While Smokey is interested in identifying messages that contain abusive or insulting language, i.e., flames, our research on emotional emails also looks for the reasons behind such flames. Besides word unigrams, Smokey uses rules to derive additional features for classification. These features are intended to capture different manifestations of flames, for example noun appositions like "you bozos", imperatives like "get lost", and insults like "sick" and "idiotic". In Smokey, these features are extracted by applying rules to syntactic parses of the sentences. In our work we also used rules; however, because of inaccurate grammatical structures, typos, and misused punctuation in emails, we used table lookups to derive the salient features of emotional emails.

3. CLASSIFICATION VS. REGRESSION

We have defined emotional emails as customer messages expressing relatively severe complaints and relatively high levels of dissatisfaction. Therefore, identification of emotional emails requires ranking the messages with respect to the severity of the complaints and the level of customer dissatisfaction, and subsequently selecting the top-ranking messages using some optimal thresholding criterion. Ordering by severity may suggest adopting an ordinal regression model trained using emails labeled, for instance, on a severity scale from 1 to 5, where higher ranks indicate greater gravity of complaints and level of customer dissatisfaction. We did not pursue this approach for the following reasons. (1) It is hard to get consistently labeled data to train regression models. As demonstrated in Pang and Lee (2005), although humans can discern more than one grade of positive or negative judgment, labeler agreement degrades when rank differences are small. (2) In our experience with a similar problem in the domain of restaurant rating prediction³ discussed in Gupta et al. (2010a), classification performs as well as or better than regression. We believe the reason for this is that such ranking tasks are inherently classification tasks: labelers first make a classification decision between low and high ranks, and then pick the rank from the lower or upper half of the rank range.
In this process, the annotators' agreement is quite uniform when making the initial classification decision, but diverges around close ranking values.

4. CLASSIFICATION ALGORITHMS

We experimented with boosting (Schapire, 1999) and SVMs. We used BoosTexter (Schapire and Singer, 2000), an implementation of AdaBoost (Schapire and Singer, 1999), and LLAMA, which, among other machine learning algorithms, includes an implementation of SVMs. LLAMA is a highly scalable machine learning toolkit developed at AT&T. Similar to LIBSVM (Chang and Lin, 2001), SVMs in LLAMA are implemented using Platt's Sequential Minimal Optimization algorithm (Platt, 1999). Haffner (2006a) has shown that the classification performances obtained on several applications by using LLAMA are identical to those obtained by using LIBSVM.

³ Like the emotional email problem, where the severity of complaints and the level of dissatisfaction have to be ranked: in the restaurant rating prediction problem, customers' opinions about restaurant features are ranked on a scale of 1 to 5 using the customer-provided textual review.

Nevertheless, LLAMA was preferred because it is AT&T internally approved software for building deployable, production-grade applications. To augment the results presented in Gupta et al. (2010b) using boosting, we experimented with SVMs, since these algorithms represent two broad categories of large margin classifiers (Haffner, 2006b). A detailed discussion of these algorithms can be found in the references cited above. Here we summarize the properties that we found useful for the analysis of our experimental results.

4.1. Boosting

AdaBoost is a feature-based learning method: only a few features are considered necessary to express the classification model. To minimize overfitting, AdaBoost uses the l1-norm for regularization during training. As a consequence, many features are assigned zero weights and are ignored in the learned models. BoosTexter uses word n-grams as features. It is therefore particularly suited for text classification and has been successfully applied to numerous text classification applications at AT&T (Gupta et al., 2005).

Boosting builds a highly accurate classifier by combining many weak base classifiers, each of which may only be moderately accurate. In BoosTexter, weak classifiers are essentially tests on a feature value, e.g., testing for the presence of a word n-gram. Boosting constructs the collection of base classifiers iteratively. On each iteration t, the boosting algorithm supplies the base learner with weighted training data, and the base learner generates a base classifier h_t. A set of non-negative weights w_t encodes how important it is that h_t correctly classify each training example. Generally, training examples that were most often misclassified by the preceding base classifiers are given the most weight, to force the base learner to focus on the hardest examples. BoosTexter uses confidence-rated base classifiers (Schapire and Singer, 1999) that, for every example x (in our case, the customer email message), output a real number h(x) whose sign (-1 or +1) is interpreted as a prediction (+1 indicates an emotional message) and whose magnitude |h(x)| is a measure of confidence. The output of the final classifier is f(x) = \sum_{t=1}^{T} h_t(x), i.e., the sum of the confidences of all base classifiers h_t built iteratively over T iterations. We map the real-valued predictions of the final classifier onto a confidence value between 0 and 1 by a logistic function: conf(x = emotional message) = 1 / (1 + e^{-f(x)}). This provides a convenient mechanism to set the rejection threshold for the detection of emotional emails. To minimize overfitting in BoosTexter, the number of iterations, i.e., the number of base classifiers generated, must be selected using validation data. This is what makes BoosTexter a feature-based learning method: in the final model, only the features used in the base classifiers are considered useful.
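As a minimal illustration (this is not the BoosTexter implementation, and the weak-learner outputs are assumed to be given), the following Python sketch combines the confidence-rated base classifier outputs, maps their sum to a [0, 1] confidence with a logistic function, and applies a rejection threshold.

```python
import math
from typing import List

def combined_score(weak_outputs: List[float]) -> float:
    """f(x) = sum_t h_t(x): sum of the confidence-rated weak classifier outputs."""
    return sum(weak_outputs)

def emotional_confidence(weak_outputs: List[float]) -> float:
    """Map the real-valued score onto [0, 1] with a logistic function."""
    return 1.0 / (1.0 + math.exp(-combined_score(weak_outputs)))

def classify(weak_outputs: List[float], threshold: float = 0.5) -> bool:
    """Flag the email as emotional only if the confidence clears the rejection threshold."""
    return emotional_confidence(weak_outputs) >= threshold

# Example: three hypothetical weak classifiers voting on one email.
print(classify([0.8, -0.1, 0.4], threshold=0.6))
```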
4.2. Support Vector Machines

SVMs (Support Vector Machines) are an example-based learning method in which weights for only a few examples, called support vectors, are considered necessary to express the classification model. SVMs use the l2-norm for regularization of the models during training. As a consequence, all input features are assigned non-zero weights and are present in the final model.

Given a set of M training input vectors X = {x_1, ..., x_M} and their corresponding target labels Y = {y_1, ..., y_M}, where each y_i ∈ {-1, +1}, SVMs, in their simplest form, learn linear classifiers characterized by the hyperplane w^T x = 0 in the input feature space (to include a bias term, the input vector x is extended with an extra element of value 1). A test example is classified as negative or positive depending on the side of this hyperplane on which it lies, i.e., w^T x < 0 or w^T x > 0, respectively. SVMs look for the separating hyperplane (w^T x = 0) that maximizes the separation margin between the two hyperplanes bounding the positive examples, i.e., w^T x = 1, and the negative examples, i.e., w^T x = -1. The margin is the minimum distance between these two hyperplanes in the direction of the weight vector w, and its value is 2 / \sqrt{\sum_{k=1}^{N} w_k^2}. Training examples falling on the positive and negative hyperplanes are called support vectors. So far we have described hard margin SVMs, where classification errors on the training set are not permitted. SVMs are generalized as soft margin SVMs, which allow some errors, i.e., vectors that are inside the margin or on the wrong side of it. In such cases the classification error can be quantified as \max(0, 1 - y_i w^T x_i), and the learning process entails minimizing the following loss function:

L = \sum_{i=1}^{M} \max(0, 1 - y_i w^T x_i) + \frac{1}{C} \sum_{k=1}^{N} w_k^2     (1)

Notice that the second term in this equation, which is responsible for maximizing the margin, is the regularization term; it expresses a preference for simpler models and avoids overtraining. This constrained optimization problem can be solved in its dual form, where the weight vector is expressed as a combination of the training support vectors:

w = \sum_{i=1}^{M} \alpha_i y_i x_i     (2)

The \alpha_i are Lagrange multipliers. They parameterize the weight vector in dual form and can be used to compute the classifier output as w^T x = \sum_{i=1}^{M} \alpha_i y_i x_i^T x. More generally, in the dual representation only computations of inner products between input vectors are needed, without ever requiring an explicit representation of the weight vector w. This property of the dual form of SVMs also makes learning higher order models tractable. It also enables, under some conditions, expansion of the feature space by using relations among the original features. We used this property of SVMs to experiment with polynomial kernels.

Another feature of the SVM implementation in LLAMA, which is important for our work, is the ability to assign different weights to different sets of features. For example, for two sets of features x_1 and x_2, if one assigns weights a_1 and a_2 respectively, then a linear classification model corresponding to the following relationship is learned:

y = \mathrm{sign}( a_1 (w_1^T x_1 / \|x_1\|) + a_2 (w_2^T x_2 / \|x_2\|) )     (3)

Here the individual feature vectors are normalized to fall on the unit sphere. As shown in Herbrich and Graepel (2002), for linear classifiers this results in optimal SVMs with respect to much tighter bounds on the margins. We also used this feature of LLAMA to explore and report on the nature of the feature space. For test examples, SVMs output their distance from the dividing hyperplane. This distance reflects the confidence in the classifier output. In our experiments, we used this confidence value to establish acceptance/rejection thresholds on the classifier output.
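LLAMA is an internal AT&T toolkit, so as a rough stand-in the sketch below uses scikit-learn's LinearSVC to illustrate the idea behind equation (3): each feature group is normalized onto the unit sphere and scaled by its own weight before the groups are concatenated and a linear SVM is trained. The data and the weight values a1 = 2.0 and a2 = 1.0 are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def weight_and_stack(x1: np.ndarray, x2: np.ndarray, a1: float, a2: float) -> np.ndarray:
    """Normalize each feature group onto the unit sphere, scale by its weight, concatenate."""
    def normalize(block: np.ndarray) -> np.ndarray:
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        norms[norms == 0.0] = 1.0          # avoid division by zero for all-zero rows
        return block / norms
    return np.hstack([a1 * normalize(x1), a2 * normalize(x2)])

# Hypothetical toy data: 4 emails, 3 salient features and 5 unigram features each.
salient = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
unigrams = np.random.RandomState(0).rand(4, 5)
labels = np.array([1, -1, 1, -1])          # +1 = emotional email

X = weight_and_stack(salient, unigrams, a1=2.0, a2=1.0)   # salient features get twice the weight
clf = LinearSVC(C=1.0).fit(X, labels)

# The signed distance from the hyperplane serves as the confidence for thresholding.
print(clf.decision_function(weight_and_stack(salient, unigrams, a1=2.0, a2=1.0)))
```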
5. EMOTIONAL EMAIL DETECTION USING SALIENT FEATURES

An emotional email is a reaction to a perceived excessive loss of time and/or money by the customer. Expressions of such reactions in emails are salient features of emotional emails. For emails in the context of customer service we have identified the eight features listed below. While many of these features are of a general nature and can be present in most customer service related emotional emails, in this paper we make no claims about the completeness of this list.

(1) Negative Emotions: Customers express their negative emotions explicitly with phrases like "it upsets me" or "I am frustrated". Customers also express such emotions using the orthographic features listed in Section 5.1.
(2) Negative Opinions: Customers express their negative opinions about the company with evaluative expressions like "dishonest dealings" or "disrespectful". These can also be insulting expressions like "stink", "suck", and "idiots". Sometimes a clear distinction between expressions of Negative Opinions and Negative Emotions is difficult; for example, in the sentence "That is insulting to me", "insulting" can be interpreted both as an opinion and as an emotional expression.
(3) Business Threats: Customers threaten to take their business elsewhere with expressions like "business elsewhere" or "look for another provider". These expressions are neither emotional nor evaluative opinions, but they explicitly refer to cancelling a service or to a major change in the customer's relationship with the business.
(4) Report Threats: Customers threaten to report the company to authorities using phrases like "federal agencies" or "consumer protection". These are domain-dependent names of official agencies; the mere presence of such names implies a customer threat.
(5) Legal Threats: Customers threaten to take legal action by using phrases like "seek retribution" or "lawsuit". These expressions, too, need not be emotional or evaluative in nature.

(6) No Justification: Customers try to argue that the company has no justification for treating them the way it did. A common way of doing this is by saying things like "I am a long time customer" or "a loyal customer". The semantic orientations of most phrases used to express this feature are positive. Therefore, simply looking for negative opinions to label an email message as emotional is not sufficient, unless more sophisticated analysis is carried out to identify the opinion holders and the targets of the opinions (see Wiebe et al. (2005)).
(7) Disassociation: When customers are not happy about a service or think that they have been wronged, they tend to disassociate themselves from the company. They express this disassociation with phrases like "you people" or "your service representative". These are analogous to the rule classes Noun Appositions and Second-Person, respectively, described in Spertus (1997).
(8) Causes: There is always a reason for an emotional message. Customers typically use verb phrases like "grossly overcharged" or "on hold for hours" to describe what was done wrong to them. Many such phrases contain customer opinions. We distinguish them from Negative Opinions using the following criteria: a) generally, Causes are verb phrases while Negative Opinions are adjectives or noun phrases; b) targets of Negative Opinions are objects or concepts of a general nature, while those of Causes are more specific. For example, "dishonest dealings" is an expression of Negative Opinions and "threatening collection calls" of Causes. When such a distinction is not possible, we consider both features to be present.

(1) You are making this very difficult for me. I was assured that my <SERVICE> would remain at <CURRENCY> per month. But you raised it to <CURRENCY> per month. If I had known you were going to go back on your word, I would have looked for another Internet provider. Present bill is <CURRENCY>, including <CURRENCY> for <SERVICE>.

(2) I cannot figure out my current charges. I have called several times to straighten out a problem with my service for <PHONENO1> and <PHONENO2>. I am tired of being put on hold. I cannot get the information from the automated phone service.

FIGURE 1. Email samples: 1) emotional; 2) neutral

While labeling the training data (see Section 6 for details), labelers look for salient features within the email message and evaluate the severity of the complaints and the level of customer dissatisfaction. For example, the first email fragment in Fig. 1 contains several salient features: a) "difficult for me", expressing Negative Emotions (difficulties cause sadness); b) "you raised it", expressing Causes; c) "looked for another Internet provider", expressing Business Threats. It is labeled as emotional because the customer complaint is severe to the point that the customer may cancel the service. The second email fragment in Fig. 1 also contains several salient features: a) "tired", expressing Negative Emotions; b) "cannot get information" and "problem with my service", expressing Causes. It is not emotional because the customer's perceived loss is not severe to the point of service cancellation; this customer would be satisfied if she/he received the requested information in a timely fashion.

In addition to the word unigrams, we also use the eight binary-valued salient features of emotional emails for training/testing the emotional email classifier. To extract the salient features from an email, eight separate lists of the phrases customers use to express each of the salient features were manually created.
If any phrase from a list is present in an email message, then the corresponding salient feature is set to 1; otherwise it is set to 0. These lists were extracted from the training data and can be considered basic rules for identifying emotional emails. In the labeling guide for emotional emails, labelers were instructed to look for salient features in the emails and keep a list of the phrases they encountered. Excerpts of these lists are shown in Table 1. We manually enriched these lists by: a) using general knowledge of English to add variations to existing phrases, and b) searching a large body of email messages (different from those used for testing) for phrases in which seed words from known phrases participated. For example, from the known phrase "lied to" we used the seed word "lied" and found the phrase "blatantly lied". Using these lists, we extracted eight binary salient features for each email message, indicating the presence or absence of phrases from the corresponding list in the email.
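As an illustration, this table lookup could be sketched as follows; the phrase lists here are tiny hypothetical excerpts in the spirit of Table 1, not the full lists described above.

```python
from typing import Dict, List

# Hypothetical excerpts of the manually created phrase lists (see Table 1).
SALIENT_PHRASES: Dict[str, List[str]] = {
    "negative_emotions": ["frustrated", "sick and tired", "outraged"],
    "negative_opinions": ["ridiculous", "worst service", "rude"],
    "business_threats": ["business elsewhere", "close my account"],
    "report_threats": ["better business bureau", "consumer protection"],
    "legal_threats": ["lawsuit", "attorney"],
    "no_justification": ["long time customer", "loyal customer"],
    "disassociation": ["you people", "your service representative"],
    "causes": ["overcharged", "on hold for hours", "lied"],
}

def salient_features(email_text: str) -> Dict[str, int]:
    """Set each of the eight binary features to 1 if any phrase from its list occurs."""
    text = email_text.lower()
    return {name: int(any(phrase in text for phrase in phrases))
            for name, phrases in SALIENT_PHRASES.items()}

print(salient_features("I was grossly overcharged and I am sick and tired of calling you people."))
```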

Salient feature (total phrase count): example phrases

Negative Emotions (61): emotional distress, hassle, hate, powerless, unhappy, upset, sick and tired, pissed off, infuriating, aggravation, ashamed, disgusted, frustrated, feel strongly, drive me nuts, nightmare, outraged, pain, makes me angry, last straw, insulting
Negative Opinions (158): abuse, arrogant, atrocious, bullshit, clueless, not helpful, worst way possible, worst service, worse, wrongful, unreasonable, unhelpful, unfriendly, unbelievable, stupid, stinks, ridiculous, rude, predatory pricing, outrageous, liar, negligent, moron, insulting, not competent, absolutely disgusting, fraudulent
Business Threats (44): another company, business elsewhere, move my account, reconsidering my provider, discontinue, cancelling everything, changing my phone company, leave you, close my account
Report Threats (22): bbb, better business bureau, chamber of commerce, utility board, federal agencies
Legal Threats (13): attorney, illegal, suing, retribution, litigation, lawsuit, lawyer
No Justification (12): long time customer, loyal customers, faithful, good customer, have your service since
Disassociation (14): you folks, your customer service, your service representatives, your representative, your company, your people, your service, you people, you bozo, your kind, your so-called
Causes (77): bait and switch, corporate greed, double talk, duped, empty promises, excuses, extortion, false advertising, false promises, go back on your word, not honor your word, on hold for hours, never showed up, lied, overcharged, playing games, punish, rake me over the coals, rectify, rigamarole, rip us off, run around, screwed up, shortchanged, swindled, threaten, violated, threatening collection calls, paying more

TABLE 1. Example phrases for the different salient features

In Table 2 we show the distribution of each salient feature in email messages labeled as emotional in the training data. As can be seen, although salient features have a higher tendency to occur in emotional emails, they frequently also occur in non-emotional emails.

TABLE 2. Distribution of salient features in emotional emails in the training data: for each salient feature, the number of occurrences and the percentage occurring in emails labeled as emotional.

5.1. Orthographic Features

Besides direct expressions of their emotional state in words, users also use other orthographic means to express their emotions. In this work, we have used three such orthographic features of email messages.

(1) Repeated punctuation: Users tend to use repeated question marks or exclamation marks to express their frustration and anxiety. An example is shown in Fig. 2. In our implementation we consider this feature present only if the email message contains more than one instance of such repeated punctuation.
(2) Uppercase words: The use of some words in upper case in a message is typically interpreted as shouting. An example is shown in Fig. 3. For this work we consider this feature present if the email message contains more than three uppercase words, but uppercase words make up less than 50% of the message.
(3) Email length: In general, we observed that emotional email messages are not short. Users with severe complaints and a higher degree of dissatisfaction tend to write longer messages.
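A minimal sketch of these three checks, assuming a simplified whitespace tokenization, might look like this.

```python
import re
from typing import Dict

def orthographic_features(email_text: str) -> Dict[str, int]:
    """Extract the three orthographic features described above."""
    words = email_text.split()

    # (1) More than one run of repeated '?' or '!' characters.
    repeated_runs = re.findall(r"[?!]{2,}", email_text)
    repeated_punctuation = int(len(repeated_runs) > 1)

    # (2) More than three all-uppercase words, but fewer than 50% of all words.
    tokens = [w.strip(".,;:!?\"'()") for w in words]
    upper = [t for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()]
    uppercase_words = int(3 < len(upper) < 0.5 * max(len(words), 1))

    # (3) Email length in words (emotional emails tend to be longer).
    return {"repeated_punctuation": repeated_punctuation,
            "uppercase_words": uppercase_words,
            "length": len(words)}

print(orthographic_features("WHY is my service STILL DISCONNECTED????? I have called FIVE TIMES!!!!"))
```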

I was just wondering how many emails and telephone calls does it take to actually get my voicemail activated????? Do I have to switch to <COMPANY1> or <COMPANY2> to get this resolved???? I have spent about 3 hours of my time either on hold, emailing, or explaining to my associates why they can't leave a message... Just wondering and VERY frustrated!!!!!

FIGURE 2. Email sample: using repeated punctuation to express emotions

Sir: Today, I woke up to turn on the TV and there was NOTHING of the channels I had chosen AVAILABLE. My home phone has been DISCONNECTED! My correspondence to the receipt of a Disconnection Notice was NOT ADDRESSED by you AT ALL. So, a few minutes ago I made payment of <CURRENCY> ONLY. I am REQUESTING that my <SERVICE> REMAIN DISCONNECTED as I don't need it. I'm ALSO REQUESTING that the <SERVICE> BE TERMINATED, the box be removed from my home, and the antenna be removed and replaced with my <COMPANY> ANTENNA. This was handled badly and I SURELY DON'T WANT IT REPEATED. Thanks for nothing

FIGURE 3. Email sample: using uppercase words to express emotions

In our data we did not find many emoticons; therefore, we did not use them as features. This is not surprising, because intuitively people are not inclined to put funny symbols in highly charged, argumentative messages.⁶

6. EXPERIMENTS AND EVALUATION

We performed several experiments to compare the performance of our emotional email classifier. For these experiments, we labeled 620 email messages as training examples and 457 email messages as test examples. These examples were randomly selected from a corpus of over two million customer emails. Training examples were labeled independently by two different labelers⁷ with a relatively high degree of agreement between them; the observed Kappa value was considerably higher than the Kappa values reported for emotion labeling tasks (Alm and Sproat, 2005; Litman and Forbes-Riley, 2004a). Because of the relatively high agreement between these labelers, we did not feel the need to check the agreement among more than two labelers. Table 3 shows that emotional email messages make up about 12-13% of the population. Due to the limited size of the training data, we used a leave-one-out cross-validation technique on the test set to evaluate the outcomes of the different experiments. In this round-robin approach, each example from the test set is classified using a model trained on all remaining 1,076 (620 plus 456) examples. Test results over all 457 test examples are averaged.

⁶ Noted by one of the anonymous reviewers of this paper.
⁷ One of the labelers was one of the authors of this paper; the other also had a linguistics background.
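A sketch of this leave-one-out protocol, together with the precision/recall/F-measure computation used to score the experiments in this section, might look as follows; train_fn is a stand-in for a BoosTexter or LLAMA training call (neither is publicly available), and predict_one is a hypothetical single-example prediction method on the returned model.

```python
from typing import Callable, List, Sequence, Tuple

def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Balanced F1 for the positive (emotional email) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def leave_one_out(train_X: List, train_y: List[int], test_X: List, test_y: List[int],
                  train_fn: Callable) -> Tuple[float, float, float]:
    """Classify each test example with a model trained on all remaining examples."""
    predictions = []
    for i, x in enumerate(test_X):
        rest_X = train_X + test_X[:i] + test_X[i + 1:]
        rest_y = train_y + test_y[:i] + test_y[i + 1:]
        model = train_fn(rest_X, rest_y)          # e.g. a boosting or SVM trainer
        predictions.append(model.predict_one(x))  # hypothetical single-example predict
    return precision_recall_f1(test_y, predictions)
```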

TABLE 3. Distribution of emotional emails in the training and test sets (number of examples and percentage labeled as emotional in each set).

Throughout all of our experiments, we computed the classification performance of detecting emotional emails using precision, recall, and F-measure. Specifically, we used the balanced F1 score, giving equal weight to recall and precision. One could also use other evaluation criteria, depending on the desired recall/precision trade-off. Notice that for our test data a majority-vote classifier has a classification accuracy of 87%, but since none of the emotional emails are identified, recall and F-measure are both zero. On the other hand, a classifier that generates many more false positives for each true positive will have a lower classification accuracy but a higher (non-zero) F-measure than the majority-vote classifier.

Fig. 4 shows precision/recall curves for the different experiments using boosting. The black circles represent the operating point corresponding to the best F-measure for each curve. The actual values of these points are provided in Table 4. As a baseline experiment, we used word unigram features to train a classifier model. Word unigram features were extracted by selecting only those words in the emails that appeared at least 3 times in the training data; word matching was done in a case-insensitive manner. The curve labeled Unigram Features in Fig. 4 shows the performance of this classifier. The best F-measure in this case is low; this performance can be attributed to the small training set and the large feature space formed by the word unigrams.

TABLE 4. Recall, precision, and F-measure at the best F-measure operating point using boosting, for the following feature sets: Word Unigram Features; rule-based thresholding on salient feature counts (thresholds 2, 3, and 4); Salient Features; Word Unigram & Salient Features; Word Unigram, Salient & Orthographic Features; Word Unigram & Random Features.

6.1. Salient Features

The baseline system was compared with a similar system using salient features. First, we used a simple classification rule that we formulated by looking at the training data: if an email message contains 3 or more salient features, it is classified as an emotional email. We classified the test data using this rule and obtained a substantially higher F-measure (see the row labeled 3 in Table 4). Since no confidence thresholding can be applied to a deterministic rule, its performance is indicated by the single point marked with a gray circle in Fig. 4. This result clearly demonstrates the high utility of our salient features. To verify that the salient feature threshold count of 3 used in our simple classification rule is the best, we also evaluated the performance of the rule for salient feature threshold counts of 2 and 4 (rows labeled 2 and 4 in Table 4).

In our next set of experiments, we trained classifier models using the salient features with and without word unigrams. The corresponding cross-validation results on the test data are annotated in Table 4 and in Fig. 4 as Salient Features and Word Unigrams & Salient Features, respectively.
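For illustration, the unigram vocabulary selection (words occurring at least 3 times in the training data, matched case-insensitively) and the simple count-based rule can be sketched as follows, building on the salient_features function from the earlier sketch; the threshold of 3 is the one used in the rule above.

```python
from collections import Counter
from typing import Dict, List

def build_unigram_vocabulary(training_emails: List[str], min_count: int = 3) -> List[str]:
    """Keep only words that appear at least min_count times in the training data (case-insensitive)."""
    counts = Counter(word for email in training_emails for word in email.lower().split())
    return sorted(word for word, c in counts.items() if c >= min_count)

def unigram_features(email_text: str, vocabulary: List[str]) -> Dict[str, int]:
    """Binary presence feature for each vocabulary word."""
    words = set(email_text.lower().split())
    return {w: int(w in words) for w in vocabulary}

def rule_based_emotional(email_text: str, threshold: int = 3) -> bool:
    """Classify as emotional if at least `threshold` of the eight salient features fire.

    Relies on salient_features() defined in the earlier phrase-lookup sketch.
    """
    return sum(salient_features(email_text).values()) >= threshold
```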

FIGURE 4. Precision/recall curves for the different experiments using boosting. Large black circles indicate the operating point with the best F-measure.

The incremental improvement in the best F-measure clearly shows that: a) BoosTexter is able to learn better rules than the simple rule of identifying three or more salient features; and b) even though salient features provide a significant improvement in performance, there is still discriminative information in the unigram features. A direct consequence of the second observation is that detection performance can be further improved by extending or refining the phrase lists and/or by using more labeled data to exploit the discriminative information in the word unigram features.

To test the contribution of the orthographic features of emails, we also added them to the feature set. However, as shown in the row annotated Word Unigram, Salient and Orthographic Features in Table 4, detection performance deteriorated. With boosting this was a counterintuitive result; we discuss possible reasons for it at the end of this section.

The salient features of emotional emails are a consequence of our knowledge of how customers react to an excessive perceived loss. To empirically demonstrate that the eight different salient features used in the identification of emotional emails do provide complementary evidence, we randomly redistributed the phrases over eight lists. We then used these lists to extract eight binary features in the same manner as before. The best F-measure for this experiment is shown in the last row of Table 4, labeled Word Unigram & Random Features. The degradation in performance (more than 10% absolute in F-measure) clearly demonstrates that the salient features we use provide complementary, not redundant, information.

6.2. Results using SVMs

To compare the results obtained from a feature-based learning algorithm⁸ with those from an example-based learning algorithm, we repeated our experiments using SVMs. Table 5 shows the results obtained with linear SVMs, and Table 6 shows the results obtained with second order polynomial SVMs.

⁸ We also experimented with a MaxEnt classifier, which, like boosting, is a feature-based learning algorithm; it provided results similar to boosting.

Results from the second order polynomials show a similar trend, but are not as good as those of the linear SVMs. We believe this may be due to the data sparseness problem.

TABLE 5. Recall, precision, and F-measure at the best F-measure operating point using linear SVMs, for the feature sets: Word Unigram Features; Salient Features; Word Unigram & Salient Features; Word Unigram, Salient & Orthographic Features.

TABLE 6. Recall, precision, and F-measure at the best F-measure operating point using second order polynomial SVMs, for the same feature sets.

From these tables one can observe that the performance of both the linear and the second order polynomial SVMs is slightly inferior to that of boosting. However, adding the orthographic features improves performance for both the linear and the second order polynomial SVMs. Our results show that salient features are more discriminative of emotional email messages than the unigram words in the text. This implies that, when modeling, more weight must be given to the salient features than to the word unigram features. Recall that boosting is feature-based and automatically gives more weight to more discriminating features. Furthermore, since boosting uses the l1-norm for regularization, the number of features considered by BoosTexter equals the number of boosting iterations; other features are completely ignored. To avoid overfitting, the results of the boosting experiments reported in Table 4 were obtained by tuning the number of iterations on development data. SVMs, on the other hand, are example-based and use the l2-norm for regularization: all features of the support vector examples are used (see equation (2)). This could be one reason why the detection performance of SVMs in our experiments is inferior to boosting and why the addition of orthographic features is helpful; in the former case many noisy features are considered by the SVMs, while in the latter case many discriminative features are ignored by boosting.

Similar to boosting, where the number of iterations needs to be optimized, the weight assignments to the different feature sets must also be tuned in SVMs (see equation (3)) to obtain the best performance. We carried out experiments with different weight settings (parameters a_1 and a_2 in equation (3)) for the salient features and the unigram word features using linear SVMs. Fig. 5 shows the best F-measure obtained using linear SVMs for different ratios of the weights on salient and unigram word features. The best performance is obtained when the weight on the salient features is more than twice that on the unigram word features. The best detection performance (F-measure of 0.746) is obtained by using both salient and orthographic features along with the unigram word features. This amounts to a 42% improvement over the baseline F-measure obtained by using only the unigram word features.

7. CONCLUSIONS

Customer email messages complaining about unfair treatment are often emotional and are critical for businesses. They provide valuable feedback for improving business processes and coaching agents.

FIGURE 5. Best F-measure obtained using linear SVMs for different weight settings for the salient features and the unigram word features.

Furthermore, careful handling of such emails helps to improve customer retention. In this paper, we presented a method for emotional email identification. We analyzed email messages in the context of customer service and identified eight salient features. We demonstrated higher agreement between two labelers labeling emotional emails using the presence/absence of salient features in the messages. We also demonstrated that extracting salient features and certain orthographic features from email messages and using them to train classifier models results in significantly better models, even when very little training data is used. Compared to a baseline classifier that uses only the word unigram features, the addition of the salient and orthographic features substantially improved the F-measure (a 42% relative improvement). We compared and analyzed classifier performance using two different categories of large margin classifiers, i.e., feature-based (boosting) and example-based (SVMs), and demonstrated that SVMs perform better than boosting when one set of features (the salient and orthographic features) is more discriminating than another (the unigram word features).

Labeling the large amounts of training and test data necessary to build accurate models for identifying emotional email messages is very expensive. This work demonstrates that the model building process can be bootstrapped using domain knowledge (the lists of phrases that make up the salient features) obtained by analyzing a small amount of data. This seed knowledge can then be used in an unsupervised or semi-supervised manner to automatically extract more such knowledge from a large corpus of customer emails. In our current work we are focusing on this problem by leveraging publicly available semantic orientation dictionaries and by syntactic pattern matching.

8. ACKNOWLEDGMENT

The authors thank Patrick Haffner for his help with text classification, particularly his guidance on how to leverage features in LLAMA to obtain the best classification models. The authors also thank Amanda Stent for reviewing this paper and providing valuable suggestions and corrections.

REFERENCES

Alm, C. and Sproat, R. (2005). Emotional sequencing and development in fairy tales. In Proceedings of the First International Conference on Affective Computing and Intelligent Interaction.
Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Morristown, NJ, USA. Association for Computational Linguistics.
Busemann, S., Schmeier, S., and Arens, R. G. (2000). Message classification in the call center. In Proceedings of the Sixth Conference on Applied Natural Language Processing. Morgan Kaufmann.
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at cjlin/libsvm.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1).
Dave, K., Lawrence, S., and Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW.
Esuli, A. and Sebastiani, F. (2005). Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM-05), Bremen, DE.
Gamon, M. (2004). Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of COLING 2004, Geneva, Switzerland.
Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore, S., Riccardi, G., and Rahim, M. (2005). The AT&T Spoken Language Understanding System. IEEE Transactions on Speech and Audio Processing, 14(1).
Gupta, N., Di Fabbrizio, G., and Haffner, P. (2010a). Capturing the stars: Predicting ratings for service and product reviews. In Proceedings of the Workshop on Semantic Search at the North American Chapter of the Association for Computational Linguistics (NAACL-2010), Los Angeles, CA.
Gupta, N., Gilbert, M., and Di Fabbrizio, G. (2010b). Emotion detection in email customer care. In Proceedings of the HLT-NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angeles, CA, USA.
Haffner, P. (2006a). Fast transpose methods for kernel learning on sparse data. In Proceedings of the 23rd International Conference on Machine Learning. ACM, New York, NY, USA.
Haffner, P. (2006b). Scaling large margin classifiers for spoken language understanding. Speech Communication, 48(3-4).
Hatzivassiloglou, V. and McKeown, K. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the Joint Association for Computational Linguistics and European Chapter of the Association for Computational Linguistics Conference (ACL/EACL).
Herbrich, R. and Graepel, T. (2002). A PAC-Bayesian margin bound for linear classifiers. IEEE Transactions on Information Theory, 48(12).
Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the International Conference on Computational Linguistics (COLING).
Lampert, A., Dale, R., and Paris, C. (2010). Detecting emails containing requests for action. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, Morristown, NJ, USA. Association for Computational Linguistics.
Lee, C. M. and Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2).
Litman, D. and Forbes-Riley, K. (2004a). Annotating student emotional states in spoken tutoring dialogues. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue (SIGdial), Boston, MA.
Litman, D. and Forbes-Riley, K. (2004b). Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain.
Liu, H., Lieberman, H., and Selker, T. (2003). A model of textual affect sensing using real-world knowledge. In IUI '03: Proceedings of the 8th International Conference on Intelligent User Interfaces, Miami, Florida, USA. ACM Press.
Pang, B. and Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the Association for Computational Linguistics (ACL).
Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86, Philadelphia, Pennsylvania.
Platt, J. C. (1999). Using analytic QP and sparseness to speed training of support vector machines. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, Cambridge, MA. MIT Press.
Schapire, R. (1999). A brief introduction to boosting. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
Schapire, R. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3).
Schapire, R. and Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2).
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.
Spertus, E. (1997). Smokey: Automatic recognition of hostile messages. In Proceedings of Innovative Applications of Artificial Intelligence.
Turney, P. and Littman, M. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4).
Vapnik, V. N. (1998). Statistical Learning Theory. John Wiley & Sons.
Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39.
Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Morristown, NJ, USA. Association for Computational Linguistics.
Yu, H. and Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).


More information

Sentiment-Oriented Contextual Advertising

Sentiment-Oriented Contextual Advertising Sentiment-Oriented Contextual Advertising Teng-Kai Fan, Chia-Hui Chang Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320, ROC tengkaifan@gmail.com,

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics

More information

Sentiment Classification. in a Nutshell. Cem Akkaya, Xiaonan Zhang

Sentiment Classification. in a Nutshell. Cem Akkaya, Xiaonan Zhang Sentiment Classification in a Nutshell Cem Akkaya, Xiaonan Zhang Outline Problem Definition Level of Classification Evaluation Mainstream Method Conclusion Problem Definition Sentiment is the overall emotion,

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

RRSS - Rating Reviews Support System purpose built for movies recommendation

RRSS - Rating Reviews Support System purpose built for movies recommendation RRSS - Rating Reviews Support System purpose built for movies recommendation Grzegorz Dziczkowski 1,2 and Katarzyna Wegrzyn-Wolska 1 1 Ecole Superieur d Ingenieurs en Informatique et Genie des Telecommunicatiom

More information

Representation of Electronic Mail Filtering Profiles: A User Study

Representation of Electronic Mail Filtering Profiles: A User Study Representation of Electronic Mail Filtering Profiles: A User Study Michael J. Pazzani Department of Information and Computer Science University of California, Irvine Irvine, CA 92697 +1 949 824 5888 pazzani@ics.uci.edu

More information

Capturing the stars: predicting ratings for service and product reviews

Capturing the stars: predicting ratings for service and product reviews Capturing the stars: predicting ratings for service and product reviews Narendra Gupta, Giuseppe Di Fabbrizio and Patrick Haffner AT&T Labs - Research, Inc. Florham Park, NJ 07932 - USA {ngupta,pino,haffner}@research.att.com

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information

Fine-grained German Sentiment Analysis on Social Media

Fine-grained German Sentiment Analysis on Social Media Fine-grained German Sentiment Analysis on Social Media Saeedeh Momtazi Information Systems Hasso-Plattner-Institut Potsdam University, Germany Saeedeh.momtazi@hpi.uni-potsdam.de Abstract Expressing opinions

More information

Sentiment Analysis for Movie Reviews

Sentiment Analysis for Movie Reviews Sentiment Analysis for Movie Reviews Ankit Goyal, a3goyal@ucsd.edu Amey Parulekar, aparulek@ucsd.edu Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

More information

Spam detection with data mining method:

Spam detection with data mining method: Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

More information

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH

SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH 1 SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH Y, HONG, N. GAUTAM, S. R. T. KUMARA, A. SURANA, H. GUPTA, S. LEE, V. NARAYANAN, H. THADAKAMALLA The Dept. of Industrial Engineering,

More information

Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED

Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko

More information

Classification of Virtual Investing-Related Community Postings

Classification of Virtual Investing-Related Community Postings Classification of Virtual Investing-Related Community Postings Balaji Rajagopalan * Oakland University rajagopa@oakland.edu Matthew Wimble Oakland University mwwimble@oakland.edu Prabhudev Konana * University

More information

SVM Ensemble Model for Investment Prediction

SVM Ensemble Model for Investment Prediction 19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

WHITE PAPER Speech Analytics for Identifying Agent Skill Gaps and Trainings

WHITE PAPER Speech Analytics for Identifying Agent Skill Gaps and Trainings WHITE PAPER Speech Analytics for Identifying Agent Skill Gaps and Trainings Uniphore Software Systems Contact: info@uniphore.com Website: www.uniphore.com 1 Table of Contents Introduction... 3 Problems...

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

More information

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

More information

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments Grzegorz Dziczkowski, Katarzyna Wegrzyn-Wolska Ecole Superieur d Ingenieurs

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

WE DEFINE spam as an e-mail message that is unwanted basically

WE DEFINE spam as an e-mail message that is unwanted basically 1048 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999 Support Vector Machines for Spam Categorization Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir

More information

Microblog Sentiment Analysis with Emoticon Space Model

Microblog Sentiment Analysis with Emoticon Space Model Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

More information

Semantic Data Mining of Short Utterances

Semantic Data Mining of Short Utterances IEEE Trans. On Speech & Audio Proc.: Special Issue on Data Mining of Speech, Audio and Dialog July 1, 2005 1 Semantic Data Mining of Short Utterances Lee Begeja 1, Harris Drucker 2, David Gibbon 1, Patrick

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

More information

A Survey on Product Aspect Ranking Techniques

A Survey on Product Aspect Ranking Techniques A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Mining Opinion Features in Customer Reviews

Mining Opinion Features in Customer Reviews Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

More information

Designing Novel Review Ranking Systems: Predicting Usefulness and Impact of Reviews

Designing Novel Review Ranking Systems: Predicting Usefulness and Impact of Reviews Designing Novel Review Ranking Systems: Predicting Usefulness and Impact of Reviews Anindya Ghose aghose@stern.nyu.edu Panagiotis G. Ipeirotis panos@nyu.edu Department of Information, Operations, and Management

More information

On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Analyzing survey text: a brief overview

Analyzing survey text: a brief overview IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining

More information

Trading Strategies and the Cat Tournament Protocol

Trading Strategies and the Cat Tournament Protocol M A C H I N E L E A R N I N G P R O J E C T F I N A L R E P O R T F A L L 2 7 C S 6 8 9 CLASSIFICATION OF TRADING STRATEGIES IN ADAPTIVE MARKETS MARK GRUMAN MANJUNATH NARAYANA Abstract In the CAT Tournament,

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

CSE 473: Artificial Intelligence Autumn 2010

CSE 473: Artificial Intelligence Autumn 2010 CSE 473: Artificial Intelligence Autumn 2010 Machine Learning: Naive Bayes and Perceptron Luke Zettlemoyer Many slides over the course adapted from Dan Klein. 1 Outline Learning: Naive Bayes and Perceptron

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

How To Learn From The Revolution

How To Learn From The Revolution The Revolution Learning from : Text, Feelings and Machine Learning IT Management, CBS Supply Chain Leaders Forum 3 September 2015 The Revolution Learning from : Text, Feelings and Machine Learning Outline

More information

Mimicking human fake review detection on Trustpilot

Mimicking human fake review detection on Trustpilot Mimicking human fake review detection on Trustpilot [DTU Compute, special course, 2015] Ulf Aslak Jensen Master student, DTU Copenhagen, Denmark Ole Winther Associate professor, DTU Copenhagen, Denmark

More information

A Method for Automatic De-identification of Medical Records

A Method for Automatic De-identification of Medical Records A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract

More information

The Optimality of Naive Bayes

The Optimality of Naive Bayes The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

More information

Semantic Sentiment Analysis of Twitter

Semantic Sentiment Analysis of Twitter Semantic Sentiment Analysis of Twitter Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom The 11 th International Semantic Web Conference

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

Semantic parsing with Structured SVM Ensemble Classification Models

Semantic parsing with Structured SVM Ensemble Classification Models Semantic parsing with Structured SVM Ensemble Classification Models Le-Minh Nguyen, Akira Shimazu, and Xuan-Hieu Phan Japan Advanced Institute of Science and Technology (JAIST) Asahidai 1-1, Nomi, Ishikawa,

More information

A fast multi-class SVM learning method for huge databases

A fast multi-class SVM learning method for huge databases www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Comparative Experiments on Sentiment Classification for Online Product Reviews

Comparative Experiments on Sentiment Classification for Online Product Reviews Comparative Experiments on Sentiment Classification for Online Product Reviews Hang Cui Department of Computer Science School of Computing National University of Singapore cuihang@comp.nus.edu.sg Vibhu

More information

A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

More information