Sentiment Analysis and Opinion Mining in Collections of Qualitative Data

Sergej Zerr, Nam Khanh Tran, Kerstin Bischoff, and Claudia Niederée

Leibniz Universität Hannover / Forschungszentrum L3S, Hannover, Germany
zerr@l3s.de, NTran@L3S.de, bischoff@l3s.de, niederee@l3s.de

Abstract. In the social sciences, a tremendous body of data is collected by observing or interviewing people. Such qualitative data forms a valuable source for later secondary research. One major challenge, though, is preserving the privacy of the interviewees, even after long periods of archival storage. Modern sentiment analysis techniques could help to judge the sensitivity of particular textual content and help the data provider to keep sensitive data away from unauthorized eyes, thus reducing manual processing of large collections of primary material. In addition, mining opinions enables enhanced data access, e.g., by finding negative attitudes about a topic. In this paper we describe properties of qualitative social science data with respect to sentiment analysis. We compare it to datasets used in the literature, identify the main challenges, and provide directions for solving them. By discussing how to exploit state-of-the-art techniques to support the (secondary) exploration of archived qualitative data, we hope to foster interdisciplinary dialogue.

Keywords: digital humanities, qualitative data, sentiment analysis

1 Introduction

The Sociological Research Institute (SOFI) in Göttingen (Germany) carried out a number of studies observing the working situation in the German automobile and shipyard industries after the rapid economic growth in post-World War II Germany, the so-called German economic miracle. The findings of these studies had a significant impact on the working situation in German industry. Intelligent access to this data would turn such a data collection into a valuable source for secondary research, e.g., for longitudinal (meta-)analysis or historical investigations.
Within the scope of the project Gute Arbeit ("Good Work") we are developing tools that enable rich exploratory access to this data for secondary research. Reusing such sources is a challenging and time-consuming task, e.g., regarding the selection of an appropriate subset or the capture of context. Moreover, behind each document there is a particular person whose privacy needs to be respected by both the data provider and the secondary analyst, and technically preserved by the data provider. Modern sentiment analysis (or opinion mining) techniques could help to judge the sensitivity of a particular document, paragraph, or even sentence and help the data provider to remove extremely sensitive

data from unauthorized eyes. For example, a highly negative statement about one's own employer may be problematic if it can somehow be traced back, in particular once the interviewee has climbed up the hierarchy in the very same enterprise. Moreover, since the company is usually also assured non-disclosure, overly critical statements may be especially harmful (besides, of course, confidential information). Second, for the secondary researcher those techniques could help to automatically find passages with interesting points of view on a particular subject and reduce manual processing of large collections of primary material. For example, our project is interested in how people's concepts of "good work" evolved over the last decades. However, our literature analysis revealed that, due to the specificity of qualitative data, a straightforward application of state-of-the-art sentiment analysis tools is not always feasible, even after modification.

2 Data

Our corpus consists of qualitative data, in German, from studies of the Sociological Research Institute (SOFI) in Göttingen. The data comprises a variety of (case) studies, typically including worker interviews and observation at the workplace. It was collected within about 50 projects over a period of more than 40 years, starting in the 1960s (i.e., the Volkswagen and German dockyard studies). One of the latest studies contains 41 interviews with individuals and groups at the vehicle manufacturing company Auto 5000, which was set up inside the Volkswagen complex in Wolfsburg, Germany, in 2001. This lower-cost model company was set up with the aim of keeping manufacturing jobs in Germany instead of moving production to other areas of Europe. Interviews cover, for example, the employment history of the formerly unemployed workers and engineers as well as topics like shift work, team work, or relations between regular Volkswagen and Auto 5000 employees.
For comparison and illustration, we also use an English dataset, namely the case study on "Changing Organizational Forms and the Re-shaping of Work" [1]. Each case (examples include airlines, a ceramics manufacturer, and hotel services) has transcriptions or summaries of in-depth face-to-face interviews conducted in England and Scotland between 1999 and 2002. Participants were managers and employees at all levels, sometimes also union representatives. The examples below are taken from these interviews.

3 Related Work and Challenges

Dealing with qualitative interview data, we face the general challenges of sentiment analysis (see, e.g., [2]) but also find some peculiarities. For example, it is typically assumed that the subject (e.g., a YouTube video or market item) is known and that the sentiment can already be estimated quite well using simple vocabulary-based techniques. In our dataset, however, indirect sentiment expressions dominate, and the vocabulary is less explicit and considerably less aggressive than in the Web materials widely used in the literature. Instead,

employees are dependent on their company and thus tend to express criticism rather subtly, to deliberately avoid mentioning certain problematic topics, or to use reported speech. Often the sentiment can only be estimated after careful analysis of which aspects of the subject are highlighted, rather than from the adjectives used to describe them. Fig. 1 summarizes the pattern structures we plan to detect in our dataset. The Object is in our case an interviewee who expresses specific opinions about a number of Subjects (also called opinion targets). A subject could be a person, a specific item such as a particular instrument, an abstract concept, or an event. Each subject receives an opinion expression which can be either positive or negative (presented as +/-). In this section we identify and discuss some of the challenges to be faced when extracting the patterns described above.

Fig. 1: The structure of opinions

Detection of Subjective Expressions: User-generated content is the major data source in the literature on opinion mining. One property of such data is that a particular Web user is often hidden behind a virtual identity and behaves more freely than she would in real life. Generally, Web users are rarely concerned about the careful selection of words and expressions. High precision in positive/negative sentiment analysis on such datasets is achieved not least due to explicit emotional adjectives (for example "ugly", "idiot" vs. "perfect", "favorite" [3]). In our studies the interviews were recorded face-to-face, and the sentiment is often obscured. In the following, we use example sentences extracted from our English dataset described in Section 2. There are a number of seemingly neutral expressions that actually carry a hidden positive or negative sentiment: "The text (company rules) says it should be achievable but again the reality, the experience from some people has been otherwise." Sometimes an expression only appears subjective with respect to its vocabulary without actually being so

(here the term "good" does not carry sentiment value): "We are here to give them a service, clean their aircraft. It's got to have a good standard and quality of clean."

Subject Identification: Typically, state-of-the-art approaches assume that the document contains opinions on one main subject expressed by the author of the document (e.g., a product review or a YouTube video). In our case the subject(s) first have to be detected. For example, the interviewee in one document can express opinions about multiple subjects such as colleagues, boss, company, family, government, etc. Moreover, a subject may be complex, having different aspects. The author of [4] addressed the problem of target detection for French telephone surveys and forum entries by developing a grammar using linguistic patterns like "Target, state verb, adjective" (e.g., "My boss is great"). User opinions on events and the impact of opinions in the social Web over time were considered in [5]; similarly, in our project we are interested in event descriptions and the analysis of temporal opinion development.

Context Dependency: The expression "It was cold" would have completely different polarity in the contexts of skiing weather and restaurant food [6]. Similarly, in different cases the same terms may also differ with respect to their degree of sentiment. The latter was considered in [7].

Indirect Sentiment: A vocabulary of positive/negative examples alone would not be sufficient for judging opinions. Sometimes the opinion depends less on the terms expressed and more on the subject attributes being highlighted. Although direct and indirect attributes are distinguished in the literature [8], the impact of highlighting and omitting particular attributes has not yet been considered. In order to stay polite, people often speak mainly about positive aspects (e.g., of the work) even if they are less important.
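The linguistic pattern mentioned under Subject Identification ("Target, state verb, adjective") and the opinion structure of Fig. 1 can be illustrated with a minimal sketch. This is not the grammar of [4] and not the tooling of this project; the tiny lexicon, the list of state verbs, the regular expression, and all names (`Opinion`, `extract_opinions`) are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Tiny illustrative polarity lexicon (hypothetical, not taken from the paper).
POLARITY = {"great": "+", "helpful": "+", "terrible": "-", "unfair": "-"}

# Hypothetical list of state verbs for the "Target, state verb, adjective" pattern.
STATE_VERBS = r"(?:is|are|was|were|seems|seem)"

# Target = determiner plus one word; an optional intensifier may precede the adjective.
PATTERN = re.compile(
    rf"\b(?P<target>(?:my|our|the)\s+\w+)\s+{STATE_VERBS}\s+"
    rf"(?:(?:very|really|quite)\s+)?(?P<adj>\w+)",
    re.IGNORECASE,
)


@dataclass
class Opinion:
    interviewee: str  # the person expressing the opinion
    subject: str      # the opinion target
    polarity: str     # "+" or "-", as in Fig. 1


def extract_opinions(interviewee: str, text: str) -> List[Opinion]:
    """Return one Opinion per pattern match whose adjective is in the lexicon."""
    opinions = []
    for match in PATTERN.finditer(text):
        polarity = POLARITY.get(match.group("adj").lower())
        if polarity is not None:  # adjectives outside the lexicon are skipped
            subject = match.group("target").lower()
            opinions.append(Opinion(interviewee, subject, polarity))
    return opinions


if __name__ == "__main__":
    sample = "My boss is great. The work is hard, but our shifts are unfair."
    for opinion in extract_opinions("interviewee-1", sample):
        print(opinion)
```

Such a surface pattern only covers direct, explicit statements. The indirect and context-dependent expressions discussed in this section would pass through it undetected, which is exactly the limitation the challenges above describe.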
Opinion Order: Although the expressions "The work is hard, but the salary is high" and "The salary is high, but the work is hard" share the same terms as well as the topic, they are quite different in terms of sentiment.

4 Approach

Step 1 - Rich Annotation Editor: In contrast to most datasets used in the literature, our dataset lacks any definite features, such as favorite assignments or (dis-)likes in Web 2.0, that could directly be used for estimating the sensitivity degree of a document. This makes manual annotation of the dataset a necessity. To capture as many important properties as possible, we are developing an annotation editor for gathering high-quality gold standard data. The annotator can read the source text in the left panel of the editor (Fig. 2 (1)). Selecting a piece of text and pressing "new topic/concept" (2) creates a new selection section. Here four buttons are present and become active as soon as the annotator selects

some text in the left panel. Clicking on "instance" (3) adds the current text selection as a new instance of the active subject (e.g., "My Chef" and "My Boss") and its particular aspect (4). Clicking on the positive or negative button (5) adds the selection as support for the corresponding sentiment. The corpora will be annotated by the social scientists who collected and worked with the data. The manual assessment of sentiment will serve as a gold standard/ground truth, which will be used as a training corpus for deriving models for automatic identification.

Fig. 2: Annotation Editor

Step 2 - NLP Analysis: First, the annotated set will be manually analyzed by the social scientists, and a set of formal rules describing sentiment-bearing/neutral expressions will be defined. We will continue the analysis using NLP tools and extract a set of further feature candidates such as part of speech, parse tree structure, typical idioms, etc. Finally, we will conduct classification experiments and plot precision-recall curves to evaluate the feature selection. In particular, we are interested in finding out to what degree we can automatically answer questions like "What is the sentiment value of a particular document?" and "Are there sensitive documents in the given set?". Aggregation of polarity over different aspects and subject-level granularity are particularly interesting issues.

Step 3 - Tools Development: Finally, our goal is to implement a toolbox for estimating sentiment in qualitative data, apply it to our dataset, and open parts of the archive to secondary research. Further analysis could provide insights into the average situation at the workplace with respect to sentiment expressions at given points in time and make it comparable to other times and workplaces.
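The precision-recall evaluation planned in Step 2 can be sketched as follows. The sketch assumes that a classifier assigns each document a sensitivity score and that the annotators' gold standard marks each document as sensitive (1) or not (0); the function name and the example values are hypothetical, and ties between scores are not treated specially.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) pairs, one per cut-off position in the ranking.

    Documents are ranked by descending score; at cut-off k the top k documents
    are treated as predicted sensitive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive gold label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        recall = true_positives / total_positive
        precision = true_positives / k
        points.append((recall, precision))
    return points


if __name__ == "__main__":
    # Hypothetical classifier scores and gold labels for four documents.
    example_scores = [0.9, 0.8, 0.4, 0.2]
    example_labels = [1, 0, 1, 0]
    for recall, precision in precision_recall_points(example_scores, example_labels):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Plotting the resulting pairs for different feature sets would allow the feature selection to be compared, as described in Step 2.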

5 Conclusion

In this paper we describe directions for tackling the problem of sentiment analysis within corpora of qualitative research data. The first challenge is to detect subjective expressions given the absence of explicit, clearly sentimental vocabulary. In the next step the corresponding subjects need to be identified. Finally, the relations between the sentiment degree of expressed opinions and the sensitivity of the documents need to be analyzed. We plan to develop and evaluate corresponding tools as well as to apply them to an existing set of qualitative interviews within the German project Gute Arbeit.

Acknowledgments

The work was supported by the project Gute Arbeit nach dem Boom (Re-SozIT), funded by the German Federal Ministry of Education and Research (BMBF) under grant 01UG1249C within the eHumanities line of funding, as well as by the European project ARCOMEM (GA270239).

References

1. Marchington, M., Rubery, J., Willmott, H.: Changing organizational forms and the re-shaping of work: Case study interviews, 1999-2002 [computer file] (2004)
2. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1-2) (January 2008) 1-135
3. Siersdorfer, S., Chelaru, S., Nejdl, W., San Pedro, J.: How useful are your comments?: analyzing and predicting youtube comments and comment ratings. In: Proceedings of the 19th international conference on World wide web. WWW '10, New York, NY, USA, ACM (2010) 891-900
4. Goujon, B.: Text mining for opinion target detection. In: Intelligence and Security Informatics Conference (EISIC), 2011 European. (2011) 322-326
5. Maynard, D., Bontcheva, K., Rout, D.: Challenges in developing opinion mining tools for social media. In: @NLP can u tag usergeneratedcontent?! Workshop at LREC 2012, Istanbul, Turkey
6. Krestel, R., Siersdorfer, S.: Generating contextualized sentiment lexica based on latent topics and user ratings. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media. HT '13, New York, NY, USA, ACM (2013) 129-138
7. Stylios, G., Tsolis, D., Christodoulakis, D.: Mining and estimating users' opinion strength in forum texts regarding governmental decisions. In Iliadis, L., Maglogiannis, I., Papadopoulos, H., Karatzas, K., Sioutas, S., eds.: Artificial Intelligence Applications and Innovations. Volume 382 of IFIP Advances in Information and Communication Technology. Springer Berlin Heidelberg (2012) 451-459
8. Xiao, R.: Corpus creation. In: Handbook of Natural Language Processing. (2010) 385-403