A Preliminary Study of Comparative and Evaluative Questions for Business Intelligence


2009 Eighth International Symposium on Natural Language Processing

Nathalie Rose T. Lim, Patrick Saint-Dizier, Brigitte Gay, and Rachel Edita Roxas

Manuscript received August 12. N. Rose T. Lim is affiliated with both De La Salle University in Manila, Philippines and Universite Paul Sabatier in Toulouse, France. She can be reached through phone: (632) ; fax: (632) ; e-mail: nats.lim@delasalle.ph. P. Saint-Dizier is with IRIT, France (e-mail: stdizier@irit.fr). B. Gay is with Groupe ESC Toulouse in France (e-mail: b.gay@esctoulouse.fr). R. Edita Roxas is with De La Salle University Manila in the Philippines (e-mail: rachel.roxas@delasalle.ph).

Abstract

Comparative and evaluative question answering (QA) systems provide objective answers to questions that involve comparisons and evaluations based on a quantifiable set of criteria. As evaluations involve inferences and computations, answers are not lifted from source text. This requires correctly interpreting the semantics of comparative expressions, converting them to quantifiable criteria before data can be obtained from source text, processing this information, and formulating natural language answers from the result of the processing. As business intelligence (BI) requires comparisons and interpretations of seemingly unrelated facts, a QA system for this domain would be beneficial. This paper presents a study of some comparative and evaluative questions that are raised in the domain of business intelligence. How these questions are processed is also discussed.

I. INTRODUCTION

Consider the following questions: "Which European companies had the most alliances in year 2008?" and "Did Company X take more risk than Company Y in the past year?" The first question is an evaluative question. An evaluative question involves the computation or evaluation of at least one property or criterion. In this case, the criterion is explicitly stated in the question (i.e., most alliances, which is equivalent to the number of alliances). However, in many cases, the properties involved are not explicitly stated, as in the case of the predicate take-risk in the second question. The predicate take-risk would have to be broken down into properties such as the number of transactions and the types of partners. The basis and constraints are defined by an expert in business intelligence. Evaluations and computations can be done for different objects for comparison purposes. Such a comparative question is illustrated by the second example question above.

Thus, comparative and evaluative question answering (QA) involves inferences in terminology, determining the properties involved for evaluation, and computation and comparison before an answer can be given. As such, the answer is not directly lifted from the source text. Instead, natural language text is constructed from the results of the processing. New types of questions like comparative and evaluative questions are targeted for research as indicated in [1].

It is of interest to study comparative and evaluative expressions (in questions) because of the challenges and issues associated with processing them. These include the following aspects:

1) Multiple styles: Comparative expressions may be expressed in different ways. They can be bipredicational expressions, cross-class comparisons, or degree comparisons, and these can appear in nouns, verbs, adjectives, and adverbs, or even be implicitly denoted. For this research, the focus will be on degree comparisons and explicit and implicit denotations from nouns (like maturity from reached maturity), verbs (like win), adjectives (like good strategy), and adverbs (like fast in evolve fast), since these types appear in actual questions raised in the domain being considered. Degree comparisons refer to the extent of applicability of a certain comparative expression or predicate. Examples are the predicates better and active.

2) Inferencing synonymous terms: Determining the synonyms and what these entail is an issue not specific to comparative and evaluative QA, but common to QA in general. However, in certain domains, terms have different or more specific meanings due to the context. For example, a hub entails different things depending on the context (e.g., transportation hub, where hub is a location, versus transaction hub, where hub is a company).

3) Accessing semantic dimension: The semantic dimension being referred to here is the list of quantifiable measures, properties, and criteria that are associated with the comparative expression or predicate. For example, expensive is associated with the property of cost.

4) Determining ranges and limits for comparison: Values of identified properties or criteria to be used for comparison between objects can be taken from various source texts. However, evaluation (as opposed to comparison) of certain criteria is more complex if there is no set standard of measurement and the standard depends on the object being evaluated. Using expensive as the example, the range of values for determining whether a book is expensive differs from that for

determining if a house is expensive.

The next section discusses the domain of business intelligence (BI) and the types of questions that can be raised.

II. BUSINESS INTELLIGENCE

Business intelligence (BI) is an area in business and economy which aims at identifying trends in business and in any kind of strategic development (e.g., research themes, political orientations) from thousands of seemingly isolated facts. The globalization of markets for technology, as well as fast innovation diffusion through complex networks of business relationships, has created a major competitive challenge to corporate leaders. Companies and governments are experimenting with new approaches to the management of business relationships. Corporate strategies involving mergers, acquisitions, spin-offs, and a plethora of alliances are creating smaller, decentralized operational units within and across the boundaries of companies or countries. One role of BI is to help companies and governments understand and master their position in global industries [2].

Software tools can be used to facilitate the analysis that they need to make. This requires that information be processed and structured to display extracted entities and the semantic relationships among them. Therefore, at least two types of software tools are necessary. One is a graphical tool that displays coarse-grained information, e.g., all commercial links between companies or countries. This is a kind of radiography of a situation over a certain period (generally, a year) with thousands of links between nodes representing companies. The graphs allow spatial analysis to identify business units or alliances which are larger than just companies. Evolution over a few years is often of much interest. Several software tools are now able to handle this, among which is Tetralogie [3]. The other type of tool involves a more fine-grained analysis based on a knowledge base constructed from news and other data.
It includes determining requirements from the question, extracting the information from the knowledge base as per requirement, and processing these to derive the answer. These may be implemented through database queries (e.g., via SQL statements). However, this implies that the set of questions that can be raised is predetermined. Also, queries are far less natural and user-friendly than human language, and they do not allow the generation of cooperative responses. Thus, a QA system is better suited to the needs of the users in this domain. Although several QA systems exist, they mainly focus on factoid, definition, or list questions. Comparative and evaluative questions are seldom tackled [4], [5].

For this study, the corpus is the set of news relevant to biotechnologies from years 2004 to [...]. The source information revolves around transactions between companies. Thus, answers to questions are mainly based on information extracted and processed from these news articles. Other needed information not present in the news is extracted or derived from other web sources.

A basic question that compares company transactions, like "Which companies have the most number of transactions?", can have variations and additional constraints added to it. The following subsections list the different classifications of comparisons that may be combined to form a single comparative or evaluative question.

A. Spatial Scope

Questions may include spatial qualification of the company or the transaction. Sample stem questions could be in (but are not limited to) the form of:
1) Which companies in Asia
2) Which transactions in Europe
3) Which cities in
4) Which country
5) Which continent

B. Categorial Scope

Companies are categorized into public or private, and products involved in the transactions fall under certain sectors. Sample stem questions could be in the form of:
1) Which <category> companies
2) Which in the <sector name> sector

C. Temporal Scope

In BI, the temporal aspect is crucial. It specifies the scope of the analysis to be done. Sample stem questions could be in the form of:
1) in <year>?
2) from <start> to <end>?

<start> and <end> may be exact dates, but are usually indicated as the inclusive years. The <end> may also be specified as present. Finally, it is also possible that the temporal scope is implicit, depending on the criteria.

D. Directly Quantifiable Criteria

Questions that involve computations before the answer can be discerned may involve a combination of different directly quantifiable criteria, like:
1) Number of transactions (may be all transactions in general or a specific transaction)
2) Number of partners
3) Amount involved in the transaction
4) Number of products

E. Non-Directly Translatable Criteria

Some adjectives may be used to encompass a series of criteria. These have different meanings and interpretations depending on the domain (or even the expert). Some terms include:
1) active (as in active companies)
2) stable (as in stable partners)
3) risky (as in risky transactions)
4) innovative (as in innovative products)
5) fast (as in fast evolution)

In the next section, we present related work on the semantics of comparatives, applications involving comparative expressions, and QA related to BI. In Section IV, a discussion on how questions are processed is presented. The discussion includes details of how comparative and evaluative expressions are categorized and interpreted. Lastly, we conclude with issues that we have considered and the research directions we plan to take.

III. RELATED WORKS

Comparisons may be in relation to properties within the same object, degrees of the same property between different objects, or different properties of different objects [6]. The properties at stake in the comparison are embedded in the semantics of the words in the question, and possibly in the context that comes with the question. To date, there is no widely available lexical resource containing an exhaustive list of comparative predicates, applied to precise terms, together with the properties involved. These can possibly be derived, to a limited extent, from existing resources like FrameNet [7] or from an ontology where relationships between concepts and terms can be mapped. However, this is tractable only for very simple situations, and in most cases, identifying those properties is a major challenge.

Friedman [8] presents a general approach to processing comparative expressions by syntactically treating them to conform to a standard form containing the comparative operator and the clauses that are involved in the comparison. Another approach is to automatically extract comparative relations in sentences via machine learning. In [9], the approach used is to determine whether the expression is non-equal gradable, equative, or superlative. By identifying the type of expression, the type of comparison may be determined from the semantics of the predicate and the properties of the objects through the pairability constraints.
What is missing is the exploration of semantic and conceptual issues and their dependence on context, users, and domains. Olawsky [10] attempts to study the semantic context by generating a set of candidate interpretations of comparative expressions. Then, the user is prompted to choose among these to specify his intent.

Some QA systems, like [11], can handle comparative expressions, including cross-class comparisons, on a range of different domains. However, these involve a different backend knowledge representation system for each domain, and the frontend QA system has to be customized before it can answer queries in the new domain. In addition, both of these systems only consider comparisons based on quantifiable predicates (i.e., those measurable by count, mass, or value). Also, predicates with non-directly translatable properties that are dependent on domain or context have, to our knowledge, not been explored.

In the domain of BI, MUSING [12] aims to combine the semantic web with rule-based and statistical methods for knowledge acquisition and reasoning, in order to provide financial analysis complying with Basel II requirements. The expected input (whether these are natural language questions and whether these involve comparative expressions) and the output to be generated are not disclosed in the available documentation.

IV. PROCESSING COMPARATIVE AND EVALUATIVE QUESTIONS

General QA systems involve the processes of question analysis, information retrieval, answer determination, and response generation. For comparative and evaluative QA systems, the processes are redefined. The question analyzer must identify the comparative expressions in the question and decompose them into meaningful constituents, among which are those properties that will be evaluated.
When predicates are decomposed into properties, pertinent information can be extracted from sources (either already stored in a database or mined from the web) and evaluation can be done in the answer determination phase. The properties and the evaluation criteria or rules are specified based on definitions given by an expert. Since the answer is not lifted from the source text, the response generator is in charge of producing natural language text from the results of the computation and evaluation. The succeeding subsections outline the processing of the source texts, the question, and the interpretation of a selected set of comparative and evaluative expressions.

A. Processing Source Text

We are considering the set of economic news in biotechnologies as our main source of information. Each news article is between 80 and 200 words long and is written in English. An excerpt of a news article (from [...]) is as follows:

IDDI (INTERNATIONAL DRUG DEVELOPMENT INSTITUTE) AND CYTEL INC. TODAY REPORTED ENTERING INTO A STRATEGIC TECHNOLOGY COLLABORATION. THE COMPANIES ARE COOPERATING TO DEVELOP INTEGRATED SYSTEMS AND SERVICES FOR THE RANDOMIZATION OF TREATMENT ASSIGNMENTS FOR PATIENTS PARTICIPATING IN CLINICAL TRIALS

As can be seen, sentences are long and verb forms may be quite complex and indirect. Each sentence is composed of a main predicate pred, which serves as a head, and arguments arg to the predicate, which are defined by their thematic roles t. An argument may be a string of words representing a noun phrase, a prepositional phrase, or a clause.

$$\mathrm{ROL}(s) = \{\, t(arg, pred) \mid t \in \{\text{agent}, \text{theme}, \text{patient}, \text{goal}, \text{temporal}, \text{location}, \text{abstract-pos}, \text{amount}\} \,\}$$

Moreover, there exist rhetorical relations rel between the sentences $s_i$ and $s_j$ that comprise the news article. These are also used to identify which among the sentences contain relevant information.

$$\mathrm{REL} = \{\, rel(s_i, s_j) \mid rel \in \{\text{nucleus}, \text{elaboration(focus)}, \text{justification}, \text{underspecified}\} \,\}$$

Thus, for the given sample news article, the text is split into sentences. Let us call the first sentence S1 and the second sentence S2. Then these sentences are represented as:

REL = {nucleus(S1, S2), elaboration[companies](S2, S1)}
ROL(S1) = {agent([IDDI (International Drug Development Institute) and Cytel Inc.], collaborate), temporal(today, collaborate), theme([strategic technology collaboration], collaborate)}
ROL(S2) = {agent([IDDI (International Drug Development Institute) and Cytel Inc.], develop), goal([integrated systems and services for patients], develop), abstract-pos([clinical trials], develop)}

Notice that in S1 and S2, instead of the predicates reported (or entering) and cooperating, the main predicates used are collaborate and develop, respectively. This is because we are only concerned with predicates that are relevant to the transactions being reported in the news. Thus, the semantic dependency is simplified to model only those needed for the conceptual representation of the news.

From the semantic representation of each sentence in the news, information is extracted to fill in the typed-feature structure (which is the conceptual representation of the news). It contains the following information:

News
  Source
  Date
  Link
  Transaction
    TransCategory
    TransType
    Date
    Company (1..10)
    ContractedItem

Here, Location is a complex type containing the LocString, City, State, Country, and Continent. The date consists of the month, day, and year. TransCategory and TransType are the transaction category and its subtype. There can be at most ten companies. The company information and the contracted item are complex types defined as follows:

Company
  Name
  Role
  NewEntity
  SubsidiaryOf
  Category

ContractedItem
  Item
  Sector
  Indication
  Stage
  Worth

Not all the information that is stored in the typed-feature structure is available from one news article.
Some processing has to be done. A set of inferencing rules is developed to retrieve and store the required information. For example, the news date may be indicated, but not the transaction date. In this case, the date of the news is inherited as the transaction date. In other cases, information from other web sources is used. An example is the case of location. The unprocessed location string (LocString) actually refers to the locations of the companies involved. Which of the companies is located in the first location and which is located in the second can be determined from other news sources or from other sources like a company profile (possibly from [...]).

B. Processing Questions

Information from the question should be extracted for proper processing. We need to identify the type of question (question type), what we are looking for (question focus), and what the conditions are in our search (constraints). In our approach, we represent these in the following semantic representation:

Q(<QUESTION TYPE>, <QUESTION FOCUS>, <BODY>)

<Question Type> indicates the type of question (whether it is superlative or comparative) and its arguments. An example set of arguments for the superlative type of question would be the number of results (many or single) and the search criteria. The <Question Focus> refers to what is expected as a result. The <Body> is the semantic dependency of the question defined by the main predicate and the thematic roles of its arguments. For the sample question "Which companies take the most risks?", the semantic representation of the question will be the following:

Q(SUPERLATIVE(MANY, HIGHEST), COMPANY, TAKE-RISK(AGENT: COMPANY))

This semantic representation is not enough to come up with the appropriate answer. We need other information to represent the basis for the evaluation. Thus, an operational representation of the question is constructed.
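As an illustration only, the semantic representation Q(<QUESTION TYPE>, <QUESTION FOCUS>, <BODY>) described above could be held in a small data structure such as the following Python sketch. The class and field names (QuestionType, Body, Question, render) are hypothetical and are not taken from the system described in this paper; the sketch only shows how the three components fit together for the sample question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionType:
    """Hypothetical container for the <Question Type> and its arguments."""
    kind: str          # "SUPERLATIVE" or "COMPARATIVE"
    num_results: str   # "MANY" or "SINGLE"
    criteria: str      # search criteria, e.g. "HIGHEST" or "LOWEST"


@dataclass(frozen=True)
class Body:
    """Main predicate plus the thematic roles of its arguments."""
    predicate: str
    roles: tuple       # tuple of (role, filler) pairs


@dataclass(frozen=True)
class Question:
    qtype: QuestionType
    focus: str
    body: Body

    def render(self) -> str:
        """Print the representation in the Q(...) notation used in the text."""
        roles = ", ".join(f"{role}: {filler}" for role, filler in self.body.roles)
        qt = self.qtype
        return (f"Q({qt.kind}({qt.num_results}, {qt.criteria}), "
                f"{self.focus}, {self.body.predicate}({roles}))")


# "Which companies take the most risks?"
q = Question(
    qtype=QuestionType("SUPERLATIVE", "MANY", "HIGHEST"),
    focus="COMPANY",
    body=Body("TAKE-RISK", (("AGENT", "COMPANY"),)),
)
print(q.render())
```

Keeping the question type, focus, and body as separate fields is one possible design; it lets later stages read the search criteria without re-parsing the question.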
An example format (in this case, for the superlative type of question) is:

<SUPERLATIVE>(<VARIABLE>, <EVENT>, <RESPONSE>)

where <Superlative> could be highest or lowest depending on the search criteria in the semantic representation, the <Variable> is the basis of the search criteria, <Event> is the key concept determined from the semantic dependency in the question, and <Response> is the expected answer. For the above sample question, the operational representation will be:

HIGHEST(RISK, TAKE-RISK(AGENT: COMPANY), COMPANY)

Here, the <Event> is similar to the <Body> because take-risk is included in the identified key concepts that we can interpret. Other phrasings, like "Which companies like to make risky investments?", are also mapped to the take-risk concept.

To facilitate mapping of questions to the answers, we have a typed-feature representation for the question containing the following features:

Question
  Question Type
  Number of Results
  Search Criteria
  Question Focus
  Search Constraints

The <Question Type>, <Number of Results>, <Search Criteria>, and <Question Focus> are taken from the semantic representation, while the <Search Constraints> is a complex type defined below. The <Duration> is the temporal scope of the search, while the <Location> and the <Transaction> are complex types, defined similarly to those of the news.

Search Constraints
  Duration
    DateStart
    DateEnd
  Location
  Transaction

Criteria or properties that are already in the conceptual representation are used directly in the evaluation and/or comparison. For the sample question "Which Asian companies have the most number of transactions in year 2008?", the company involved in the transaction should be located in Asia and the date (or year) should be 2008. Since these are search constraints indicated in the question, mapping the representation to the entries in the typed-feature representation of the news provides a short-list of matching entries. Then, for the most number of transactions, it is a matter of counting the occurrences of each company and comparing the values to determine the top companies.

C. Complex Terms

Other criteria that are non-directly quantifiable are referred to as complex terms. For these, the lexical knowledge is consulted to identify the term's interpretation in terms of quantifiable properties. For the take-risk example, the lexical knowledge represents a company that takes risks as one which is active, has transactions every year, and has alliances every year with new and unstable partners.

$$\text{Take-risk}(c) := \text{Active}(c) \wedge \text{TransEveryYear}(c) \wedge \text{CompAllyEveryYearAndAlwaysNewPart}(c) \wedge \neg\,\text{HaveStablePartners}(c)$$

Again, the definition could consist of more key concepts or terms, which have to be evaluated first. Eventually, the key concept is broken down into values or quantifiable measures that can be extracted from the typed-feature structure of the news article.
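The counting procedure described above for the question "Which Asian companies have the most number of transactions in year 2008?" can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this study: the record types (Company, NewsEntry), the function top_companies, and the sample data are all hypothetical, and each news entry is simplified to a year plus the companies involved. Returning every company tied for the highest count is likewise an assumption of the sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str
    continent: str


@dataclass(frozen=True)
class NewsEntry:
    """Simplified stand-in for the typed-feature structure of one news item."""
    year: int
    companies: tuple   # tuple of Company


def top_companies(entries, continent, year):
    """Short-list entries by the search constraints, then count per company."""
    counts = Counter(
        c.name
        for e in entries if e.year == year               # temporal constraint
        for c in e.companies if c.continent == continent  # spatial constraint
    )
    if not counts:
        return []
    best = max(counts.values())
    # Assumption: all companies tied for the highest count are returned.
    return sorted(name for name, n in counts.items() if n == best)


# Hypothetical sample data, for illustration only.
ASIA_CO = Company("AsiaCo", "Asia")
PACIFIC_CO = Company("PacificCo", "Asia")
EURO_CO = Company("EuroCo", "Europe")

ENTRIES = [
    NewsEntry(2008, (ASIA_CO, EURO_CO)),
    NewsEntry(2008, (ASIA_CO, PACIFIC_CO)),
    NewsEntry(2007, (PACIFIC_CO, EURO_CO)),
]

print(top_companies(ENTRIES, "Asia", 2008))
```

In this sketch the search constraints act as filters before counting, which mirrors the short-listing step described in the text.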
For example, the condition Active is also a key concept, defined to be a company that has an above-mean number of transactions in the duration of the search constraint. It is formally defined as:

$$\text{Active}(c) := (c, n) \in \text{CompanyTrans} \;\wedge\; n > \frac{\sum \text{NumTrans}}{|\text{NumTrans}|}$$

$$\text{CompanyTrans} = \{\, (\text{Company1.Name}: c,\; n) \mid n = |\text{Transact}(c)| \,\}$$

$$\text{Transact}(c) = \{\, (\text{Transaction.Company1.Name}: c,\; \text{Transaction.Company2.Name}: c_2,\; \text{Transaction.TransCategory}: t,\; \text{Transaction.Date.Year}: y,\; \text{Transaction.ContractedItem.Item}: p) \,\}$$

$$\text{NumTrans} = \{\, n \mid (c, n) \in \text{CompanyTrans} \,\}$$

To process a comparative question like "Does Company X take more risk than Company Y?", each of these entities will be tested based on the constraints. For a superlative question like "Which companies take the most risk?", all companies will be tested and computations will be done to generate the top entities.

D. Interpretation of Comparative and Evaluative Expressions

Aside from active and take-risk, other comparative and evaluative expressions have been studied from questions that can be raised in BI. Each expression is studied starting from the predicate: its basic properties are identified, then the nouns that it can modify are examined, and the properties are re-evaluated if there are additional or different constraints. From the study, the predicates are categorized into uni-dimensional, multi-dimensional, polysemous, and underspecified.

1) Uni-dimensional predicates: Some predicates have only one sense or definition. An example is expensive, which essentially means involving a high cost. In this case, the cost is the quantifiable property that we can use to evaluate or compare entities. However, some uni-dimensional predicates, like innovative, are difficult to quantify. Innovative is defined as characterized by or introducing something new [13]. In this case, we can look at the effect instead, i.e., something innovative is in demand. Thus, determining if a product is innovative would depend on the number of entities having an interest in it.
And an innovative company is one with an innovative product. In this particular domain, the expert formally defines this as:

Innovative(c) := i, i [1, m-1] y:year (c1, c, t, x, y, p_i) SellTransact(c, y) p_i = p_i+1 SellTransact(c, y) 0.7 x n CompanyTrans-Per-Year

SellTransact(c, y) = { (Transaction.Company1.Name: c1, Transaction.Company2.Name: c, Transaction.TransCategory: t, Transaction.TransType: x, Transaction.Date.Year: y, Transaction.ContractedItem.Item: p) t = buy t = alliance (x = [exclusive licensing] x = [nonexclusive licensing]) }

CompanyTrans-Per-Year = { (Company1.Name: c, Transaction.Date.Year: y, n) n = Transact-Per-Year(c, y) }

2) Multi-dimensional predicate: Taking the example of

take-risk, it entails different dimensions from being conservative. Different aspects would then have to be considered. In the case of BI, this could be in terms of the amount of investments, the types of products invested in, the partners being taken, or the overall strategy that is being employed.

3) Polysemous predicate: Many predicates have different senses and meanings. Taking the example stable, it is defined in [13] to have three meanings: the first is firmly established, the second is steady in purpose, and the third is capability to resist motion. Identifying which among these senses applies may depend on the noun that the predicate is associated with or on the domain in question.

4) Underspecified predicate and metonymy: Underspecification refers to general criteria associated with the predicate that gain (more) context only when associated with the noun the predicate modifies. Assuming that we consider only the sense of stable as being steady in purpose, it is still underspecified because the properties associated with this meaning still depend on the context. Even within the domain of BI, the criteria for evaluating a stable company are different from those for a stable partner, even if the partner is also a company.

This also leads to the issue of metonymy. The nouns associated with the predicate represent a class of objects that hold various properties. For example, a company can be quantified by the number of employees, the number of transactions, the types of transactions, the investments that it makes, and so on. When the predicate stable is associated with company, determining which of these properties is to be used in the evaluation of steady in purpose is a challenge. In this case, the constraints are provided by an expert. A stable company is defined as one which is active, and which either does not have alliances every year or has alliances every year but always with old partners.
$$\text{Stable}(c) := \text{Active}(c) \wedge \big(\neg\,\text{AllianceEveryYear}(c) \vee (\text{AllianceEveryYear}(c) \wedge \neg\,\text{OnlyOncePartners}(c))\big)$$

$$\text{AllianceEveryYear}(c) := \forall y{:}\,\text{YEAR}\;\; (c, y, n) \in \text{CompanyAlliance}$$

$$\text{CompanyAlliance} = \{\, (\text{Company1.Name}: c,\; \text{Transaction.Date.Year}: y,\; n) \mid n = |\text{Alliance-Per-Year}(c, y)| \,\}$$

$$\text{Alliance-Per-Year}(c, y) = \{\, (\text{Transaction.Company1.Name}: c,\; \text{Transaction.Company2.Name}: c_2,\; \text{Transaction.TransCategory}: t,\; \text{Transaction.Date.Year}: y,\; \text{Transaction.ContractedItem.Item}: p) \mid t = \text{alliance} \,\}$$

$$\text{OnlyOncePartners}(c) := \forall y_1, y_2\;\; \text{CompanyAlliance}(c, c_2, y_1) \wedge \text{CompanyAlliance}(c, c_2, y_2) \rightarrow y_1 = y_2$$

On the other hand, a stable partner is one which has alliances every year. A company has stable partners when it has alliances every year, always with new partners, and the partners themselves have alliances every year.

$$\text{HaveStablePartners}(c) := \text{CompAllyEveryYearAndAlwaysNewPart}(c) \wedge \text{AllianceEveryYear}(c_2)$$

$$\text{CompAllyEveryYearAndAlwaysNewPart}(c) := \text{OnlyOncePartners}(c) \wedge \text{AllianceEveryYear}(c)$$

V. CONCLUSION AND DIRECTIONS

Comparative and evaluative expressions in the domain of BI are complex because there are intricacies to the terminology used in BI, where the criteria are predefined. As can be seen in the example predicates, expressions can be based on one criterion, can be based on multiple criteria, and/or can be underspecified.

Currently, at least ten basic comparative and evaluative expressions in questions have been studied. Each has several variations considering aspects of polysemy, metonymy, underspecification, and multiple criteria. More comparative and evaluative expressions are yet to be explored in the context of BI. The expected form and style of answers to these questions will be taken into consideration. The research will also explore techniques to automatically determine the properties which are at stake in the evaluation and to automatically determine limits, ranges, and relative values of these properties from on-line sources, so that the technique can be portable to other domains. Evaluation will be carried out eventually.
However, it is crucial first to identify the evaluation metrics and processes, and the components to which they will be applied, as the evaluation is not straightforward.

REFERENCES

[1] J. Burger, et al., Issues, Tasks and Program Structures to Roadmap Research in Question & Answering, [Online]. Available: www-nlpir.nist.gov/projects/duc/papers/qa.roadmap-paper_v2.doc
[2] B. Gay and B. Dousset, Innovation and Network Structural Dynamics: Study of the Alliance Network of a Major Sector of the Biotechnology Industry, Research Policy, 34(10), , Management Journal, Special Issue, 213, ,
[3] Tetralogie. [Online]. Available:
[4] M. Maybury, New Directions in Question Answering, The MIT Press, Menlo Park
[5] D. Moldovan, et al., The Structure and Performance of an Open Domain Question Answering System, in Proceedings of the 38th Meeting of the Association for Computational Linguistics (ACL), Hong Kong,
[6] C. Kennedy, Comparatives, Semantics Of, in K. Allen (section editor) Lexical and Logical Semantics; Encyclopedia of Language and Linguistics, 2nd Edition, Elsevier, Oxford,
[7] J. Ruppenhofer, et al. (2006). FrameNet II: Extended Theory and Practice. Available:
[8] C. Friedman, A General Computational Treatment of the Comparative, in Proceedings of the 27th Annual Meeting of the ACL, 1989. [Online]. Available:
[9] N. Jindal and B. Liu, Mining Comparative Sentences and Relations, in Proceedings of the 21st AAAI Conference on Artificial Intelligence, AAAI Press, USA,
[10] D. Olawsky, The Lexical Semantics of Comparative Expressions in a Multi-level Semantic Processor, in Proceedings of the 27th Annual Meeting on ACL, USA,
[11] B. Ballard, A General Computational Treatment of Comparatives for Natural Language Question Answering, in Proceedings of the 26th Annual Meeting of the ACL, [Online]. Available:
[12] MUSING Newsletter nr. 2 Spring [Online]. Available: spring-2009
[13] Merriam Webster Dictionary. [Online]. Available: