Thesis Proposal Using Argument Analysis to Define a Structure for User Generated Content

What is one step in the analysis of discussions?
What type of text is identified as being represented by a span of text?
Where is the research done on the subject of argument structure?

Transcription

1 Thesis Proposal Using Argument Analysis to Define a Structure for User Generated Content Clare Llewellyn E H U N I V E R S I T Y T O H F G R E D I N B U Doctor of Philosophy Institute for Language, Cognition and Computation School of Informatics University of Edinburgh 2012

2

3 Abstract Conversation is how people compare ideas, solve problems and identify disagreements as part of daily life. The internet contains several good information sources which can provide facts and general information, but it is less obvious where to find meaningful discussions and arguments that are grounded with evidence. User generated content can be found in social networking sites, add comments sections of news articles, blogs or micro blogs and information collation sites such as Wikipedia. Any evaluations that occur in these discussions must be conducted by humans because any argument structure is not presented explicitly and therefore cannot be automatically analysed. If a user wishes to hear all sides of an argument it can be difficult to find all the right information. This work aims to automatically identify and extract the structure of arguments from user generated content. Thereby, allowing the users of this content to see the structured arguments contained within, with the claims, counter claims, and grounds. Presenting this structure can allow users to more easily navigate the content thereby enabling the user to become part of the conversation in a way that is arguably more useful to them. The steps in the process, and therefore the problems that this work aims to tackle, are the identification of discussions on different topics, the identification of spans of text that illustrate the discussion, classification of text into an argument structure so as to define the relationships between spans of text and to present this information to users. iii

4 Declaration I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. (Clare Llewellyn) iv

5 Table of Contents 1 Introduction The Problem Motivation Objective Background Argument Theory Argument Data Structures Applications Techniques for Identifying and Classifying Argumentation in Text Classifying Arguments Identifying Arguments Approach Work In Progress Data Sets Topic Detection Argument Analysis and Classification Conclusions from work so far Future Work Approach Plan Bibliography 51 v

6

7 Chapter 1 Introduction 1.1 The Problem The scientific method requires that empirical and measurable evidence can be found to support a hypothetical explanation of a phenomenon. The hypothesis can be used to predict the outcomes in a certain scenario, which can then be evaluated through evidence. Academics can test their theories through discussion, allowing the evaluation of a hypothesis with regard to evidence provided by others. Conversation, or discourse, is how people compare ideas, solve problems and identify disagreements as part of daily life. In traditional print media journalists, academics and other writers perform the tasks of collating and curating information to present in a structured way to consumers. These professionals summarise the debates and add in their own claims and evidence. The consumer of this content is then free to consult this evidence and determine their own beliefs. In the Internet domain there is an overwhelming amount of information available, some is collated and curated in this manner but much is not. There are several good information sources which can provide facts and general information but it is less obvious where to find meaningful discussions and arguments that are grounded with evidence. Tools are freely available that allow users to be both producers and consumers of content and to be part of the wider internet conversation. Users add data to social networking sites, they use the add comments sections of news articles, they blog or micro blog and they contribute to articles for information collation sites such as Wikipedia. Any evaluations that occur in these discussions must be conducted by humans because any argument structure is not presented explicitly and therefore cannot be automati- 1

8 2 Chapter 1. Introduction cally analysed. If a user wishes to hear all sides of an argument (the full argument space) it can be difficult to find all the right information. 1.2 Motivation The models of presentation in current user generated content systems often mean the context of a post is lost. Looking at information without this context can give a skewed version of the argument. Most user generated content systems allow users to comment in a linear way, presenting comments as a list with each addition being added to the top or the bottom of the list. Additionally many sites allow users to comment upon the comments of other users, thereby creating a thread like structure. Often this data is presented by time of addition, so the user sees the most recently added data first, or by a user rating system, often represented using votes for or against particular posts (comment systems) or by the reposting of information (blogs, Twitter). Unfortunately the speed with which comments are added is extremely variable and this can make it difficult to follow the conversations and for users to comment at the most appropriate locations. If a rating system is used, a post that references or quotes another may appear out of context and the point of the post may be obscured. Users are generating a large volume of data. The volume of data can be overwhelming therefore users may adopt the approach of only reading what they are interested in. This can lead to cyberpolarization where users with opposing opinions are not aware of the evidence from the opposing side (Faridani et al., 2010). Often the users only look at the most popular or the most recent content, depending on the display system used, and therefore can form a misleading impression of the overall discussion. The current thread style is no longer a good interface for interacting with this user generated data as it can lose the context and variety of the discussion. It is a generally held belief that users who post moderate, well-thought-out opinions can often be shouted down by extremists and discussions can thus end with neither side appreciating the other s point of view and even becoming insulting. This can lead to those with interesting opinions not posting as they fear they will either not be read or they will be shouted down and insulted (Faridani et al., 2010). It can be difficult for a user to determine the most useful location to add their own information so that it can be found by others. Allowing the users of content to see the structure of arguments, with the claims, counter claims, replies and grounds, would allow the users to navigate a large volume of content whilst retaining the context of

9 1.2. Motivation 3 the original post. It would enable users to see the full discussion space whilst allowing them to easily disregard comments that are not well thought out or are insulting. This type of structure could facilitate knowledge acquisition and enable the user to become part of the conversation in a way that is arguably more useful to them. There is no easy way to gain an overview of the opposing sides of a discussion. For example within academia a user must read many journal papers or blogs and have conversations with specialists in the areas they wish to investigate. Conversations can be conducted online through commenting on specific papers or blogs or identifying the experts through social networking tools and contacting them directly. A tool that analyses these conversations to determine the structure of the discussion would make the content easier to navigate and make these conversations more accessible, increasing the usefulness of this content and aiding scholarship. In essence, argument is a conversation which involves different points of view. The initial point of view is a claim and this claim can be countered with another claim. The claims are backed up with evidence which can also be referred to as grounds. An individual constructs an argument claim that represents their position, and in this they explain to others the evidence that has led to their position (the claim and the grounds that they use to support this). A counter claim challenges the original position, providing further evidence or information. This is intended to cause the composer of the original claim to re-evaluate their position and accept the counter claim. Those who have been involved in the argument have been made aware of other perspectives on the original problem thus allowing the participants to gain new knowledge that may change or confirm their perspective, encourage them to conduct further investigation, or they may use this evidence to solve other problems at a later date. It is proposed here that user generated discussions can be expressed as arguments with different points defined as claims, counter claims and evidence and that this structure can be automatically extracted from the content and presented to the user for further interaction. Presently, as discussed further in chapter 2, there are techniques that can be used to identify possible argument components in text, and there are tools that can be used to classify these components and standardised structures for storing and displaying argumentation. However these processes have not been integrated and used to solve the problems defined above that relate specifically to user generated content.

10 4 Chapter 1. Introduction 1.3 Objective The objective of this work is to automatically identify and extract the structure of arguments from user generated content, to display this structure and to help a user identify where they should add their own content. In Chapter 2 there is a discussion on current approaches for extracting arguments from user generated content. This work differs from other areas of argument analysis as it aims to extract arguments from within a stream of conversations from user generated content sources such as Twitter, comments on news articles and other media as yet undefined; therefore specific conversations must be identified and appropriate sections of text extracted. In addition, there is an intention to combine the extraction of the argument structure with tools available for visualisation of these arguments. These tools will be extended to enable the users to add their own content at appropriate locations. The proposed steps in this process are inspired by fields such as opinion mining. Bal and Saint-Dizier (2009) analyse product reviews to determine current public opinion on commercial products; this work is described in more detail in Chapter 2. The plan for the extraction of the data is adapted from the steps they define. The steps in the process, and therefore the problems that this work aims to tackle, are: 1. Identify discussions on different topics 2. Identify spans of text that illustrate the discussion 3. Classify into a structure so as to define the relationships between spans of text 4. Present this information to users The objective of this document is to briefly cover the current research that is relevant, to describe the experimentation that has taken place up to this point and to outline and provide a plan for the future direction of this work. Chapter 2 presents a brief overview of the field of argument theory. The current standards for argument structure and ontologies are surveyed and the applications which are based upon those structures are described. A review of the current literature that describes processing methods for extracting arguments from text has been conducted and is presented in section. This section describes both current machine learning techniques for classifying text into an argument structure and other approaches such as rhetorical structure theory and the theory of argument discourse markers that may provide techniques that will be applicable to this process. Chapter 3 describes the approach that is being taken to this work. The data set that is currently being used and the work that has been completed is described.

11 1.3. Objective 5 Chapter 4 describes the work that will be conducted in the next two years and a plan is presented for the successful completion of this project.

12

13 Chapter 2 Background The study of argumentation builds upon ideas from many disciplines including logic, dialogue theory, artificial intelligence, agent systems, linguistics, and argument theory. It has application in computer supported collaborative learning, the semantic web and artificial intelligence. In investigating argument analysis, ideas from argument theory, rhetorical structure theory and computer supported collaborative work are considered. 2.1 Argument Theory Argument theory is the evaluation of conclusions by logically reasoning over claims that are based on premises. In 2008 Douglas Walton defined three major types of argument (Walton et al., 2008), (Rahwan, 2008b): Deductive, if the premise is true the conclusion is true Inductive, argument supported by generalisations from empirical evidence Defeasible (or presumptive), the conclusions are plausible given the premise, but can be proved false by new evidence. Historically the deductive model was used to evaluate an argument; if the premise can be evaluated as true then the conclusion is therefore true. More recently there has been a shift to a more human type of evaluation of arguments where a degree of uncertainty in the conclusion is possible. This is because using deductive logic to evaluate arguments does not work well in certain situations. For example considering the premise that books are normally longer than 1 page, deductive reasoning could not evaluate this premise as true until all the books that will ever exist have been evaluated in order to determine if they are longer than a page. At this current time 7

14 8 Chapter 2. Background defeasible argumention methods are preferred when evaluating arguments. Defeasible argumentation occurs when the conclusion is accepted in the light of present evidence but can be changed if new evidence is provided, so that books are normally longer than one page can be evaluated as true until many books with a single page are found. A premise can be assumed to be true until there is something which disproves it. This allows reasoning in the absence of complete knowledge. In 1958 Stephen Toulmin noted that natural human argumentation differs from the logics in deductive arguments but shares the same basis of claims and evidence, and from this he concluded that formal logic was different from that used in academic discussion (Toulmin, 2003). He produced a model of this academic style of argumentation and founded the modern study of argumentation to reflect this. The Toulmin model of argumentation is applied to practical problems where reasoning is a process of testing ideas. A claim is provided then justified, for an argument to succeed a good justification of this claim must be made. This argumentation model is not as rigorous as those that use logic but provides enough evidence to allow rational acceptance (Walton et al., 2008). It is proposed here that this more natural description of argumentation can be identified in text and used to determine a structure for that content to make it easy to navigate the conversations in user generated content. The Toulmin model (Toulmin, 2003) of argumentation can be used to provide a structure for argument. This flexible approach is based upon human argumentation which reflects the nature of discussions found in user generated text. The scheme is made up from 6 components: 1. Claim - A conclusion to be evaluated 2. Ground - Evidence and data, used to support the claim 3. Warrant - A statement authorising the ground from the claim (to bridge the gap between claim and ground) 4. Backing - Credentials that certify the statement expressed in the warrant (when the warrant itself is not enough / credible) 5. Rebuttal - Recognising the restrictions which may legitimately be applied to the claim 6. Qualifier - Expressing the degree of force or certainty of the claim The aim of analysing text would be to identify the claims provided and to identify where those claims are justified. For an argument to succeed it should provide good justification of a claim. For this a claim, ground and warrant are always needed but a

15 2.1. Argument Theory 9 qualifier, backing and rebuttal are not essential. The process would analyse the text to initially identify the claim and ground and when these are found, attempt to find the other components listed above. Argument structure takes many forms from single premise arguments to linked, convergent, serial and divergent arguments. In a linked argument there are two or more premises that are dependent on each other and when combined they support the argument conclusion. In convergent arguments each premise individually supports the argument. With serial arguments, the conclusion of an argument acts as the premise for another. A divergent argument means that a premise can support multiple argument conclusions. These structures can also be combined to model complex argumentation (Rahwan, 2008b). Argument theory can be used to automate reasoning, the evaluation of arguments to determine what is true. Schemes are used to present the argumentation structure which can then be automatically evaluated. Walton has spent many years investigating and defining argumentation through a number of these schemes. He provides extensive descriptions of 96 argument schemes that are designed to represent patterns of human argument and reasoning (Walton et al., 2008). Each scheme has a set of critical questions that can be used to reason over the arguments to evaluate it and establish its strength. Walton also presents dialogue types to provide context for argument schemes. Each type has specific speech events. There are 6 types (Walton, 2000): 1. Persuasion - resolution of a conflict of opinion to resolve or clarify an issue 2. Negotiation - to make a deal that is satisfactory to all participants 3. Inquiry - to find or verify evidence in order to evaluate a hypothesis 4. Deliberation - to decide the best course of action in a practical situation / choice 5. Information seeking - to acquire or give information 6. Eristic - to fight and quarrel without any reasonable goal Obviously there are criticisms of these approaches. Many real arguments are composites of 2 or more of the dialogue types and of the various structures defined above (Rahwan, 2008b). A way to evaluate an argument, as proposed by Walton et al. (2008), is through the answering of critical questions. As these questions are dependent on each of the argument schemes, there are therefore many. This makes it hard to evaluate using this scheme (van Eemeren et al., 2008, 2012). Critics of Toulmins model highlight that it does not allow methods for reasoning over the argument in order to

16 10 Chapter 2. Background evaluate it through the proposing of critical questions, as in Waltons schemes for argumentation (Cartwright and Atkinson, 2009). In summary it is very difficult to reason and form conclusions automatically. Humans are much better at this type of task than machines. The most efficient approach to this problem may be a system that utilises machines for tasks such as the extraction and summary of the argumentation points and allow humans to evaluate these arguments. 2.2 Argument Data Structures The aim of structuring data is to provide a basic overview that allows users to more easily understand it. When this content contains a discussion or argument the best way to structure the data is to map it to an argument structure. This allows the user to see the main points of the discussion. The claims and the evidence associated with those claims give the user an indication of where the argument is strong and where it is weak. The points where the claims have weak evidence suggest the most likely places for the evolution of the argument. Much of the work in argumentation structure comes from the education domain where maps are made by students to define arguments on certain topics. These students discuss a given question and provide claims and counter claims and shape this information into a giant map which allows others to visualise the discussion. When this is done manually it can take a very long time to cover all views on a topic. For example, it is estimated that the production of an argument map describing the debate on whether computers think took 7,000 hours to complete (Metzinger, 1999). Semantic web technologies such as RDF and ontologies are currently used to define the basis for many existing argument systems. The currently available ontologies allow a feature rich representation. The expression of an argument in an ontological structure allows discourse structure and component relationships to be accessible to computation so that the data can be navigated and compared automatically. The data can be represented in XML, for example, the Argument Markup Language (AML), which is an XML based language for annotating claims and premises (Reed and Rowe, 2004; Rahwan, 2008a). Various semantic web models for argumentation are also available including: a hypertext based approach, the Issue-Based Information Systems (IBIS) ( 2012), a biological approach the SWAN/SIOC, which is an alignment of two ontologies the Semantic Web Applications in Neuromedicine (SWAN) and Semantically-Interlinked Online Communities

17 2.3. Applications 11 (SIOC) (Schneider et al., 2010; Schneider and Passant, 2012) and the ScholOnto ontology (Buckingham Shum et al., 2000) which is used to support debate in digital libraries. The Argument Interchange Format (AIF) is an interchange format has been proposed for these different ontologies (Rahwan, 2008a). AIF is an ontology of argument related concepts from several different schemes. It is expressed in the Web Ontology Language (OWL) which offers a rich feature set and allows inference using descriptive logics (Mcguinness and van Harmelen, 2004). Currently it only represents a limited set of argument concepts and must be extended to capture other argument formalisations and schemes. In this format an argument is made up of argument entities represented as nodes. Each node has attributes such as title, creator, creation date, certainty, acceptability. It is made up of I-nodes (information) and S-nodes (schemes).the I-nodes represent passive information such as claim, premise, grounds and the S-nodes are the patterns of reasoning. The schemes belong to a class and are classified into types (such as rule of inference scheme, conflict scheme and preference scheme). AIF can also be expressed using the Resource Description Framework (RDF) (Lassila et al., 1998) as AIF-RDF which although less expressive allows interaction with other RDF based technologies (Rahwan, 2008a). Although many of these formats share a similar underlying RDF structure they are not compatible with each other. There are some indications that AIF-RDF standard may emerge as the standard that is adopted in this domain, since it allows the representation of the largest number of argument schemes. But, the extensive functionality of AIF-RDF also means that users must identify the logical aspects of argumentation, and as this can be very complex, this may limit the uptake of the system. As there is no single unified ontology as yet it is difficult to commit to using a single ontological structure. The adoption of a standard would promote interoperability, but at this point there is a reluctance to commit to one for this work, but rather to remain flexible and adapt to the ontologies as needed for display purposes. 2.3 Applications Argument structure within content can be defined either manually or automatically. Internet tools have been developed that encourage users to take part in debates (Cartwright and Atkinson, 2009). Generally these tools required the user to add the structure as they add the content. Once added the content is held in a repository and visualisation

18 12 Chapter 2. Background techniques are used to express the structure. There are several examples of argument repositories; in general they do not interoperate with each other as they do not use the same underlying format, and the argument structures provided are either very simple or very complex depending upon the purpose they are designed for (Rahwan, 2008a). A shared ontology or format would be required to support integration of arguments from different tools, or an ontology mapping tool would be necessary to align these different formats. A brief review of some of the tools available are presented below. 1. Debateabase is from the International Debate Education Association, a not-forprofit organisation which promotes, amongst other things, critical thinking. It is a database of argument questions where the pros and cons of the arguments are listed. These pros and cons for each question are written by experts for those new to the debate to learn the arguments. There are for and against points for each debate and counter points to those points. The site is well presented, has a nice look and feel and good functionality, however the arguments themselves are confusing and the logic of the debate is not very clear ( 2. Truthmapping is a volunteer site funded by donations. Users create arguments by stating premises and claims, critiques and rebuttals can be added, the arguments can be chained together and linked to evidence added. The site is used to learn and practice the logic of debates. Therefore in the logic of the site, what is a premise, a claim and conclusion is very clear. The maps of the argument are good. ( 2012). 3. Araucaria is from the University of Dundee. It is a tool for analysing, reconstructing and diagramming arguments. Users load the text of an argument into the tool and then highlight pieces of text to use as the basis of their argument: the premises and conclusions. These are added to the argument structure in diagrammatical form. This tool can also be used to manually evaluate the argument and each claim can be assigned a level of good / bad or strong / weak. Some of the argument schemes from Walton can also be used. Arguments can be used to search the Araucaria database which looks for text, structure and scheme matching. AraucariaDB is a repository of arguments that can be searched using Xpath ( 2012; Reed and Rowe, 2004).

19 2.4. Techniques for Identifying and Classifying Argumentation in Text Paramenides from the University of Liverpool uses Waltons sufficient conditioning scheme for practical reasoning. It allows the public to post responses to political policy proposals; the proposals are structured using an argument scheme and the users can add in their views and evaluate the arguments ( parmenides/, 2012; Cartwright and Atkinson, 2009). 5. ClaiMaker from the Open University uses the ScholOnto ontology, which supports basic reasoning, to model users views of academic research papers. They have a database of claims made in papers which can be queried. The approach is based upon that taken in Compendium a tool also from the Open University, that enables concept mapping using the IBIS ontology (Uren et al., 2003). Each specific language is designed for a specific tool and therefore is closely tied to the tool. A standardised format for content or a common ontology or a way of mapping between the current structures would enable data to be shared amongst sites. 2.4 Techniques for Identifying and Classifying Argumentation in Text Classifying Arguments Machine learning tools have been used to automatically classify arguments (Witten and Frank, 2005; Rose et al., 2008). Machine learning algorithms require a good set of features that strongly predict the classes and a feature space that is as small as possible to reduce processing time. The following section looks at the different approaches, algorithms and features that have been used to identify argumentation in text. An example of an area where argument theory has been implemented is in the field of Computer Supported Collaborative Work (CSCW). It is used to monitor and assist in collaborative work and learning. Systems have been developed that are used by students in online debates; these use analysis of the argument structure to make sure students stick to the proposed topic areas and to identify interactions where prompts can be used to assist students. These prompts encourage the students to ground and warrant their claims and help them to come to productive conclusions. Rose et al. (2008) use argument theory to classify text in the CSCW domain. In their 2008 paper they describe how they identify features that can be extracted from the text that are useful in predicting types of discourse interactions. A framework is

20 14 Chapter 2. Background used to analyse the text based upon a classification scheme from Weinberger and Fischer (2006). This scheme contains two initial levels of argumentation: the micro-level of argumentation, an individual argument consisting of a claim supported by a ground with warrant possibly specified by a qualifier; and the macro-level of argumentation, argumentation sequences where single arguments are connected to create an argumentation pattern such as argument and counter-argument. The specific items of the Weinberger and Fischer (2006) classification system used: 1. Epistemic activity (formal argument quality) 2. Micro level of argumentation 3. Macro level of argumentation 4. Social modes of co-construction 5. Reaction to previous contribution 6. Reaction to script (prompt) 7. Quoted text The underlying aim of the work by Rose et al. (2008). was to automate the classification of textual data from computer supported collaborative learning tools into a coding scheme. This coding scheme was used to investigate how argumentative knowledge construction occurred in online discussions. The work used the TagHelper Tool; this is a corpus analysis environment built on top of the Weka machine learning toolkit. The text analysed comprised 250 discussions from online discussion boards. Human annotators split each message from the discussion board in to several spans of text which represent the different communication acts. The thread structure of the messages and the relationships between the text spans within each message were recorded for use by the machine learning algorithm. In total 1,250 coded text segments were extracted and classified by humans using the multidimensional coding scheme from Weinberger and Fischer (2006). Rose et al. (2008) considered two tasks: which text features worked well to distinguish classes and an exploration of the suitability of different supervised machine leaning algorithms for this task. The algorithms that were tested were Naive Bayes, Support Vector Machines and Decision Trees. As a baseline they classified using SVM, Naive Bayes and Decision Trees algorithms and the top 100 unigram features. In addition to this they tested 7 different sets additional of features: Unigrams

21 2.4. Techniques for Identifying and Classifying Argumentation in Text 15 Unigrams and line length Unigrams and part of speech (POS) bigrams Unigrams and bigrams Unigrams and punctuation Unigrams and stemming Unigrams and rare words removed Unigrams, line length, punctuation, and rare words removed The classification was evaluated by determining the level of agreement between the two classifications of the data, automatic and manual. They found that in general using the SVM algorithm gave the most agreement. There is an emphasis in the Rose et al. (2008) work on the selection of features that assist in classification. The features which are identified as useful are: punctuation, unigrams, POS bigrams, line length, containing a non-stop word, stemmed words, and rare words. They found that bigrams substantially reduced performance as each one was relatively rare. They found that with this data set the most predictive features are unigrams and punctuation. This work also investigated if the context of the text would prove useful in classification, this was investigated through the analysis of the relationship of the text spans to each other and their depth in the thread. Sequential learning techniques were used to establish the context of the span of text. It was thought that this may particularly help with certain types of spans of text, for example a reaction to previous contribution cannot occur in an initial position, a specific discourse act is set up by those preceding it. The depth of the thread that the text span appears in is included as a feature by coding it using a Finite State machine feature which has two states, Q0 as initial state at the top of the message and Q1 when a quoted span of text is encountered. It remains in this state until a prompt is reached indicating the span of text that related to something someone else has said. The results showed that context and depth features assisted in identifying Microlevel argumentation and depth features helped in identifying Macro-level argumentation and social modes of co-construction. It is suggested that this is because complex levels of argumentation like counter-examples are more likely to refer to already mentioned information whereas claims are not. The set of features as described by Rose et al. (2008) was extended by Abbott et al. (2011) to assist in recognising disagreement in online discussion forums. The focus of this work was on the recognition of argument structure by identifying agreement

22 16 Chapter 2. Background and disagreement. The machine learning toolkit Weka was used with Naive-Bayes and JRip algorithms. In addition to a base set of n-gram features, several meta-post features were used such as, poster id and time between posts. The experimentation with these meta post features gave a small increase in agreement between automatic classification and human annotation from 63% to 68%. This approach may be useful when investigating classifying text into argument structure: users tend quote each other in their comments to allow them to break down posts into individual points of argument which they can then respond to in isolation, and this meta-post information could be used to determine which is the quoted text and which is the new text and thereby differentiate between claims and counter claims. Mishne and Glance (2006) discuss disputative comments, where users post comments that disagree with the previous comments with the intention of provoking debates and arguments. The disputative comments were identified using a decision tree built from manually classified comments. The most predictive features were found to be: the use of question marks early in the text, the number of comments in the thread, and the polarity of the first sentence of the comment. They also found that phrases that occur often in debates such as I don t think that or you are wrong and the words not and but are strong features. Somasundaran et al. (2008); Somasundaran and Wiebe (2010) found other specific cue phrases that are strong features for use in this type of classification. Discourse markers that are strongly associated with pragmatic functions can be used to predict the class of content, therefore useful features include the presence of a known marker such as actually, because, but, i believe, i know. In general, it has been shown that it is difficult to vastly improve a baseline that uses only unigrams as features, although simple rules defined over the content can improve the features and therefore the classification. However, there is some indication that specific features, such as cue words or POS features, can be used to identify specific aspects of argumentation and in classifying the sentences into claims, grounds and qualifiers. These approaches will be explored to determine whether the classes taken from the Toumlin approach can be identified and extended. Teufel and Moens (1997, 2002); Teufel et al. (1999) discuss how argumentative classification of extracted sentences can be used to summarize research articles and provide a degree of context for the extracted sentences that is missing in other summarization approaches. They extract sentences from the articles to form a summary but keep the overall rhetorical structure, rather than picking the highest rated sentences.

23 2.4. Techniques for Identifying and Classifying Argumentation in Text 17 The rhetorical structure is used to aid extraction of text that describes specific aspects of a paper, whether the sentence represents amongst other things, a main goal of the article, a problem or a contrast to someone else s work. This research uses the CARS (Create a Research Space) model described by Swales (1990) for expressing how academics write papers. This model describes a framework of how, in general, authors of academic papers structure their writing by making specific moves such as pointing out weakness in others research. These moves can be identified using rhetorical markers but these markers mean different things depending on where they are in the text, the same marker might mean a different thing in a related work section as opposed to a conclusion. They identify what they call meta comments in the text, these are phrases like we have presented a method for to fill certain slots in the abstract (such as method). They use a Naive Bayes classifier to identify the sentences that fill a specific role in the summary to be created, such as a method role. This is done by extracting abstractworthy sentences and the classification of the rhetorical role for that sentence. They classify the text into basic and non-basic rhetorical categories which are taken from Swales CARS model (Swales, 1990). The basic categories are: Background, generally accepted statements Others, statements made by others Own, statements made by the author In addition to this there is: Aim, which can be used to categorise the entire text Textual, providing information on the structure of the text Contrast, comparing the authors work with the work of others Basis, the work that the author builds upon. The features that are used in the machine learning are quite complex and include features based on the structure, citation, syntactic, semantic and content aspects of the text. The features based on structure are created by determining the explicit structure of the text, such as position of a text span within the full text, the section, and the paragraph. Whether the text contains a citation (self or to others) is included as a feature. The syntactic features are based on tense and whether it contains modals, voice and negation. An example of the semantic features include the action type of the first sentence. Examples of content features are if the sentence contain key words as

24 18 Chapter 2. Background defined by tf/idf or words from the title. The results from this work were variable but there is a strong indication that the approach of considering the context of the rhetorical marker aided in the classification of the text spans Identifying Arguments Contrasting Idea Identification Argumentation in text can be discovered through identifying sentences that contain contrasting ideas. A contrasting idea is identified as a sentence that summarises a claim and counter claim. This can be used to identify a contrast between two previous points. Contrasting idea identification is used by Sndor and Vorndran (2009) to identify key sentences for use in highlighting key threads for reviews when pre-reviewing papers. They use natural language processing (NLP) through the rule-based dependency parser Xerox Incremental Parser (XIP) (Kaplan, 2006). XIP looks for sentences using gramma rules, these rules contain particular discourse functions (De Liddo and Buckingham Shum, 2010; De Liddo et al., 2012). The features of the key sentences are assigned by applying the concept-matching framework. It appears that this concept matching framework is made up of a database of sets of words which is manually curated to indicate specific aspects of the text, such as a summary. The words from this database are used as seeds and added to incrementally as data from a specific domain is processed, thereby reflecting the way points are discussed in a domain specific way. Once parsed the key sentences are defined as those that describe the main points of a paper, thus allowing a reviewer to evaluate the flow of the ideas within the article. The sentences identified do not describe concepts but instead argue about concepts. They have found that a sentence may not present enough context to the user and suggest that a larger text span may be useful. They also find that they get a very large number of false positives which they suggest may be overcome by expanding the text span beyond the sentence level. De Liddo et al. (2012) also use the XIP parser to aid in Contested Collective Intelligence, the theory of the processes involved in working with ideas that are not proven and can be contested. The contrasting ideas are extracted using XIP and used to analyse problems and aid critical thinking by exploring opposing or diverging opinions. Sentences in the text are labelled as contrasting ideas. Opposing ideas are matched and used for summarisation and contrast. An annotation and knowledge mapping sys-

25 2.4. Techniques for Identifying and Classifying Argumentation in Text 19 tem, Cohere, is used to model the contested ideas and present them to users. The two approaches presented here suggest that using a rule based parser based on discourse functions would be useful for identifying sentences that contain argumentation. Unfortunately the rules used by XIP are not in the public domain, therefore current approaches to discourse analysis are investigated in the next section Rhetorical Structure Theory Rhetorical structure theory involves the identification of the relations between spans of text and the clauses involved in the relationship. These relations are organised into a hierarchical structure (Mann and Thompson, 1987, 1988). Rhetorical semantic relations indicate how the spans of text are joined - the relationship between these spans. Schemes devised using this theory describes how spans of text can co-occur. There are differences in the way different groups precisely define these schemes and the terms used to define them, but the different schema can be aligned (Blair-Goldensohn et al., 2007). Rhetorical semantic relations could be used to identify related spans of text that could be classed as using argumentation structure such as claims, counter claims, premises and conclusions. The relations are described in different ways dependent on the scheme followed, but generally the type of relations that would be useful in argument analysis include those described as CONTRAST, CAUSE-EXPLANATION- EVIDENCE, CONDITION and ELABORATION. Marcu and Echihabi (2002) describe an approach to automatically define and locate these types of relation. CONTRAST and CAUSE-EXPLANATION-EVIDENCE are often signalled by cue phrases (Example 1) such as because or however or they can be expressed implicitly (Example 2). Detecting implicit expressions is more difficult and much of the work in this field is devoted to this. 1. Example 1: because of the expenses scandal there have been a spate of resignations in the government 2. Example 2: the government was once again beset by scandal. After several key resignations Marcu and Echihabi (2002) look for the relations using defined patterns or rules. Models of the rhetorical semantic relations are defined and identified in the text using a Naive Bayes classifier. The models were learned automatically from a mas-

26 20 Chapter 2. Background sive dataset. A large number of examples are identified from a corpus: for identifying CONTRAST sentences were extracted that were linked using the phrase but, CAUSE-EXPLANATION-EVIDENCE sentences were identified as those linked by because or thus. They are able to improve the quality of training examples using topic segmentation and syntactic heuristics to filter out training examples which are invalid. Sporleder and Lascarides (2008) also extend the work by identifying features based on POS and argument structure. They looked for linguistic cue phrases to identify, extract and label training data which is used in classification. This is thought to be suitable for dealing with relations that are difficult to determine using just cue phrases. Linguistic cue phrases are made up of textual features such as word stems, POS and tense. These are then used to expand upon the type of training data that can be used to classify the rhetorical semantic relations. They use Decision Trees, rather than Naive Bayes as the algorithm in the machine learning process. The features that are used in the classification are: Positional features: if the text is at the beginning or end of a paragraph Length of spans Lexical features: words which are not cue words but might indicate certain relations POS: particularly verbs, nouns and adjectives in spans Temporal features: tense and aspect measured by modality, aspect, voice and negation. Syntactic features: complexity measured by number of NPs, VPs, PPs, ADJPs and ADVPs Cohesion Features: measured by the number of pronouns, ellipses and if a text span ends in a VP ellipse. Blair-Goldensohn et al. (2007) have refined the approach using parameter optimisation, topic segmentation and syntactic parsing. The parameter optimisation included (amongst others): Laplace smoothing Using a vocabulary of 6400 stems: replacing all others with a pseudo-token. Removing the stoplist: as the words frequently in a stop list are useful in the model

27 2.4. Techniques for Identifying and Classifying Argumentation in Text 21 Using a minimum frequency cut off: any pair with a frequency of less that 4 is removed Blair-Goldensohn et al. (2007) also reiterate that topic segmentation is important as related instances are unlikely to cross topic boundaries. They defined a topic boundary as at least three intervening sentences or a paragraph break. Their system does not seem to work significantly better but there is some evidence that it may perform better on a larger corpus Global Rhetorical Moves As described in the classification section, Teufel and Moens (1997, 2002); Teufel et al. (1999) assigned sentences to argumentative roles to create and evaluate summaries of academic papers. The work is described as finding global rhetorical moves as opposed to rhetorical structure theory moves (Marcu, 1997) which are generally focused on local relationships. Text spans are identified which characterise the articles they come from, such as: unfortunately this work does not solve problem X. The emphasis of the Teufel and Moens (1997, 2002); Teufel et al. (1999) work is that a text span has different meaning depending where it is in the text, for example in a future work section the example sentence would express some reservations about the work presented in the article and in an early section of the text it would indicate that this work is trying to solve problem X. This approach could be applicable to argumentation. If in user generated content a text span such as the example above is presented towards the end of a user generated comment this type of phrase may indicate the user has expresses a reservation about the claim that they are making, what Toulmin would describe as a Qualifier, if it was towards the beginning of a comment it would indicate the following text is a counter argument to a previous claim. Teufel and Moens (1997, 2002); Teufel et al. (1999) restrict their work to academic writing as they think there are communication goals that are specific to this domain, but adapting this approach to various types of user generated content may provide interesting results Argument Discourse Markers Discourse markers or cue phrases link the section they are introducing and the section that precedes. They are generally made up of conjunctions, adverbs and prepositional

28 22 Chapter 2. Background phrases (Fraser, 1999). Argument discourse markers are words or phrases that link argumentative points. Tseronis (2011) describes three main approaches to describing argument markers: Geneva School, Argument within Language Theory and the Pragma-dialectical Approach. The Geneva School approach describes the various relations in language, in a similar fashion to rhetorical structure theory. There are 3 main types of markers/connective, organisation markers, illocutionary function markers (the relations between acts) and interactive function markers. Among the interactive markers there are 3 sub groups: a relationship between the argument and the master act, a relationship between the counter-arguments and the master act and a reformulation of the acts that proceed. Argument within Language Theory is a study of individual words and phrases. The words identified are argument connectors: these describe an argumentative function of a text span and change the potential of it either realising or de-realising the span. The adverbs and adjectives that accompany verbs and nouns change the argumentative force. These can therefore be used as cue words to identify argument connections and force. The Pragma-dialectical Approach looks at the context beyond words and expressions that directly refer to the argument. It attempts to identify words and expressions that refer to any moves in the argument process (Tseronis, 2011). In a similar fashion to the work of Marcu and Echihabi (2002) the approach is to create a model of an ideal argument and annotate relevant units. Tseronis (2011) analysed text to identify markers of argumentative moves. The 4 units are indications of: 1. stand points (confrontation stage) 2. challenge to defend a stand point (opening stage) 3. argument schemes and related discussions (argumentation stage) 4. maintaining or withdrawing a standpoint of doubt (concluding stage) Aspects of all of these approaches could be used to define rules that identify text spans which contain argumentative text and also do indicate the argumentative role that these text spans take such as claim or counter claim Additional Techniques In addition to identifying and classifying specific text spans that contain argumentation points, other aspects of the user generated content could provide cues as to how to identify and structure the appropriate information. These techniques may include