Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance


David Bixler, Dan Moldovan and Abraham Fowler
Language Computer Corporation
1701 N. Collins Blvd #2000
Richardson, TX, 75080, USA
{bixler, moldovan,

Keywords: Information Sharing and Collaboration, Search and Retrieval, Novel Intelligence from Massive Data, Knowledge Discovery and Dissemination

Abstract

Analysts are constantly overwhelmed by large amounts of data which lack meaningful or useful structure. LCC is working on two tools which help to alleviate this problem, Jaguar and Polaris. The technical contributions of these tools, namely automatic extraction of semantic relations, automatic ontology construction, and metrics to evaluate ontology quality, are discussed along with experimental results.

1. Introduction

Intelligence analysts are constantly plagued with an overabundance of information. Individual analysts approach this problem in a variety of ways, using organizational methods which work on a small scale but do not lend themselves to interoperability with methods used by other analysts. Even these methods do not solve the problem, as analysts can only handle a tiny fraction of the information available to them. Unfortunately, many of the clues and answers they are looking for reside in the vast amounts of information left untouched, and even the information they do have at their disposal lacks many of the data bridges which could help drive inferences and hypotheses.

LCC has been developing two tools that help address these problems by enabling technologies which leverage prior and tacit knowledge, such as Question Answering (QA), Information Extraction (IE), and Summarization. These two tools are Polaris, a semantic parser, and Jaguar, an automatic ontology builder. Both Polaris and Jaguar operate automatically on text, allowing an analyst to perform other tasks while these tools run in the background.
The end result of Jaguar (which uses Polaris in its processing) is a set of automatically generated, semantically rich, domain-specific ontologies which analysts can use while working on a task related to a domain or set of domains. These ontologies can capture data specific to a given analyst as well as data for broader use, allowing analysts to keep their own specific knowledge while being able to share and exchange information with other analysts in an efficient, streamlined fashion. The ontologies and semantic clusters can also be integrated with other tools to boost their accuracy and performance.

2. Motivation

Analysts lack tools which can assist them in higher modes of critical thinking, yet these are exactly the tools they need to improve analysis of complex issues [Heuer]. One method is to structure the information in a way which is easy to understand and allows the analyst to be more efficient. More information, however, is not necessarily better. Many psychological studies have demonstrated that accuracy generally increases very little, if at all, as more information is given to an expert; what is needed is "more truly useful information" [Heuer]. Since analysis tends not to improve with more information, it is important that the information used is the most important available and is structured in a useful fashion.

It is also well known that the capacity of short term memory (STM) is very limited, and that long term memory (LTM) retrieval is difficult for tasks not performed recently. Humans are also not good at identifying patterns between chunks of data, structuring data in useful ways, or analogizing. External memory aids help with these issues, and semantically enriched ontologies can serve as external memory aids both by identifying patterns between concepts and groups of concepts and by simulating a highly structured LTM from which information is simple to retrieve.
Heuer also notes that human memory rarely changes retroactively, and well-maintained knowledge bases can accommodate this shortcoming.

3. Approach

3.1 Polaris

Polaris is based on a set of 40 semantic relations which LCC has defined. Semantic relations are abstractions of underlying relations between concepts, and can occur within a word, between words, between phrases, and between sentences. Semantic relations are useful because they provide denser connectivity between concepts and contexts. Also, detecting semantic relations is one essential step toward the ultimate goal of machine text understanding. Semantic relations allow for richer ontologies and knowledge bases which can capture contextual knowledge, events, and firmer assertions.

LCC's set of 40 relations is summarized in Table 1. These 40 relations have been carefully selected for their usefulness in natural language processing, for the feasibility of their automatic extraction from text, and for the broadest semantic coverage with the least amount of overlap. While no list will ever be perfect, LCC feels this list strikes a good balance between being too specific (too many relations making reasoning difficult) and too general (not enough information to be useful).

 #  Semantic Relation            #  Semantic Relation           #  Semantic Relation
 1  Possession                  15  Source-From                29  Possibility
 2  Kinship                     16  Topic                      30  Certainty
 3  Property-Attribute Holder   17  Manner                     31  Theme-Patient
 4  Agent                       18  Means                      32  Result
 5  Temporal                    19  Accompaniment-Companion    33  Stimulus
 6  Depiction                   20  Experiencer                34  Extent
 7  Part-Whole                  21  Recipient                  35  Predicate
 8  Hyponymy                    22  Frequency                  36  Belief
 9  Entail                      23  Influence                  37  Goal
10  Cause                       24  Associated-with/Other      38  Meaning
11  Make-Produce                25  Measure                    39  Justification
12  Instrument                  26  Synonymy-Name              40  Explanation
13  Location-Space              27  Antonymy
14  Purpose                     28  Probability-of/Existence

Table 1: LCC's 40 Semantic Relations

An example of semantic relations is the sentence "He carefully disarmed the letter bomb." The compound nominal "letter bomb" alone contains at least 5 semantic relations: letter bomb IS-A bomb, letter bomb IS-A letter, letter is the LOCATION of the bomb, bombing is the PURPOSE of letter bomb, and letter is the MEANS of bombing. The sentence also includes several other relations: He is the AGENT of disarm; carefully is the MANNER of disarmed; and the letter bomb is the THEME (or object) of disarmed.
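The relations in the letter bomb example can be held in a very small data structure. The sketch below is illustrative only: the `Relation` class and `index_by_head` helper are hypothetical names, not part of Polaris, and the arguments are written in the RELATION(argument, head) order used in the paper's relation listings. The verb is normalized to "disarmed" for all three sentence-level relations, which is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One semantic relation instance: NAME(arg, head). Hypothetical type."""
    name: str
    arg: str
    head: str


# Relations listed in the text for "He carefully disarmed the letter bomb."
RELATIONS = [
    Relation("IS-A", "letter bomb", "bomb"),
    Relation("IS-A", "letter bomb", "letter"),
    Relation("LOCATION", "letter", "bomb"),
    Relation("PURPOSE", "bombing", "letter bomb"),
    Relation("MEANS", "letter", "bombing"),
    Relation("AGENT", "He", "disarmed"),
    Relation("MANNER", "carefully", "disarmed"),
    Relation("THEME", "letter bomb", "disarmed"),
]


def index_by_head(relations):
    """Group relations by head so an event can be read off in one lookup."""
    index = defaultdict(dict)
    for rel in relations:
        index[rel.head].setdefault(rel.name, []).append(rel.arg)
    return index


# The structured picture of the event: who acted, how, and on what.
event = index_by_head(RELATIONS)["disarmed"]
print(event)
```

Indexing by head is only one possible design choice; it makes the "who did what to what" reading of an event a single dictionary lookup.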
Together, these semantic relations give a structured picture of the event: who was involved, what was done, and to what, as well as the purpose and other properties of the object involved.

To find semantic relations in text, Polaris uses a combination of state-of-the-art text processing and machine learning techniques. In the first step, low-level NLP processing, such as named entity recognition, part-of-speech tagging, syntactic parsing and word sense disambiguation, is used to structure the text. The parse tree is then broken down into a number of syntactic patterns that Polaris can analyze. These syntactic patterns include verbs and their arguments, complex nominals, adjective phrases, adjective clauses, and others.

Polaris next runs classifiers on each section of text that matched a syntactic pattern. The classifiers examine features of the text and attempt to determine whether any of the 40 relations apply between the elements of the pattern. Most of the classifiers are based on one of four different machine learning algorithms: Decision Trees, Naïve Bayes, Support Vector Machine (SVM), and Semantic Scattering (a new learning algorithm that uses WordNet classes to find the most probable relation that holds between two nouns [Badulescu]). Some of these machine-learning classifiers use a per-relation approach and output only the one specific relation they were trained to recognize, while others use a per-pattern approach which could potentially output any of the 40 semantic relations. Additionally, some classifiers containing human-coded rules are used for the most explicit and unambiguous cases. These three methods form a hybrid approach which produces better results than any one approach on its own.

As an example of actual system performance, Table 2 shows the output discovered by Polaris from the sentence "Bin Laden reportedly purchased anthrax a half decade ago from a supplier in North Korea."
Human-generated relations                         System output
AGENT(Bin Laden, purchased)                       AGENT(Bin Laden, purchased)
TOPIC(purchased, reportedly)
THEME(anthrax, purchased)                         THEME(anthrax, purchased)
RECIPIENT(a supplier in North Korea, purchased)   LOCATION(from a supplier in North Korea, purchased)
TEMPORAL(a half decade ago, purchased)            TEMPORAL(a half decade ago, purchased)
MEASURE(a half, decade)                           PROPERTY(half, decade)
LOCATION(in North Korea, a supplier)              LOCATION(in North Korea, a supplier)

Table 2: List of relations discovered from example sentence

3.2 Jaguar

Jaguar automatically builds domain-specific ontologies by processing plain text from a variety of sources. These ontologies can be fine-tuned to contain the level of detail desired by an analyst. Ontologies built by Jaguar contain (i) ontological concepts, which are the basic building blocks of an ontology, (ii) a hierarchy, consisting of a structure imposed on certain ontological concepts via transitive relations that generally hold to be universally true (e.g. IS-A, part-whole, locative, etc.), and (iii) the contextual knowledge base, consisting of semantic contexts that encapsulate knowledge of events via semantic relations. Current work also includes a fourth component called Axioms on Demand, which capture assertions about knowledge and are useful for reasoning.

Jaguar is a complex text processing project, using both basic and advanced NLP tools to accomplish its task. The first step in the process is to filter and clean up the input text. Raw input to Jaguar can come from many types of sources, including Word documents, PDF files and web pages in HTML format, and is therefore prone to irregularities such as incomplete or strangely formatted sentences, headings, and tabular information. The filtering mechanism of Jaguar is a crucial step that makes the input acceptable for subsequent NLP tools to process.

A single run of Jaguar can be divided into two major processes: (i) text processing, and (ii) classification/hierarchy formation. In text processing, Jaguar is provided with a set of seeds which are used to determine the set of sentences of interest. Until recently, these were always selected manually; now, seeds can be automatically generated if desired and used in place of, or to augment, the manually selected seed set. The set of sentences selected based on the seeds goes through a set of NLP processing tools: named-entity recognition, part-of-speech tagging, parsing, word-sense disambiguation, coreference resolution, and semantic relation discovery (Polaris).
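The seed-driven selection step can be pictured with a few lines of code. This is a deliberately naive sketch, not Jaguar's implementation: the paper does not say how sentences are matched against seeds, so the case-insensitive substring test, the regular-expression sentence splitter, and the function name `select_sentences` are all assumptions made for illustration.

```python
import re


def select_sentences(text, seeds):
    """Return the sentences of `text` that mention at least one seed concept.

    Assumption: a sentence is 'of interest' if it contains a seed as a
    case-insensitive substring. A real system would match on parsed noun
    phrases rather than raw strings.
    """
    # Naive split on whitespace that follows sentence-final punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered_seeds = [seed.lower() for seed in seeds]
    return [
        sentence
        for sentence in sentences
        if any(seed in sentence.lower() for seed in lowered_seeds)
    ]


sample = (
    "Anthrax is a biological agent. The weather was mild. "
    "Botulinum toxin can be weaponized."
)
selected = select_sentences(sample, ["anthrax", "botulinum toxin"])
print(selected)
```

Only the selected sentences would then be passed on to the heavier NLP tools, which is what keeps the downstream processing focused on the domain of interest.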
The resulting data structure is processed and used to populate one or many semantic contexts: groups of relations or nested contexts which hold true around a common central concept. Another aspect of text processing is concept discovery, which entails the discovery of noun concepts in sentences which are related to the target words or seeds. Each processed sentence is scanned for noun phrases, and targeted noun concepts are added to a local data structure for subsequent processing into the ontology's hierarchy. Figure 1 shows an example hierarchy and semantic context.

Figure 1: Example Hierarchy and Semantic Context within a Knowledge Base

Classification is the determination of a hierarchical structure within a group of concepts. Isolated IS-A (hypernymy) relations are discovered in the text processing stage. Classification uses a set of well-formed and tested procedures to impose a hierarchical structure on the set of discovered concepts, and it uses WordNet [Miller] as its upper ontology. Details of these procedures are presented in [Moldovan and Girju]. Hypernymy relations discovered via classification may contain anomalies or redundancies. Jaguar contains a conflict resolution engine which detects and corrects possible inconsistencies. The hierarchies in Jaguar are created link by link (or relation by relation) and follow a conflict avoidance technique, wherein each new relation is tested for anomalies and redundancies before being added to the hierarchy.

Although single runs of Jaguar yield rich ontologies, its real power lies in providing an option to layer ontologies from many different runs. Jaguar can currently merge disparate ontologies into one by using the aforementioned conflict resolution technique. The merge tool merges the two ontologies' concept sets, their hierarchies (using conflict resolution), and their knowledge bases (sets of semantic contexts).
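The link-by-link conflict avoidance and the merge step can be sketched as follows. The paper does not specify which tests Jaguar applies, so this example assumes two simple ones: a new IS-A link is an anomaly if it would create a cycle, and redundant if the parent is already reachable from the child. The `Hierarchy` class and its method names are hypothetical.

```python
class Hierarchy:
    """Toy IS-A hierarchy that checks every new link before adding it."""

    def __init__(self):
        self.parents = {}  # concept -> set of direct hypernyms

    def ancestors(self, concept):
        """All hypernyms reachable from `concept` through IS-A links."""
        seen, stack = set(), [concept]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_is_a(self, child, parent):
        """Try to add `child IS-A parent`; return what happened."""
        if child == parent or child in self.ancestors(parent):
            return "anomaly"  # the link would create a cycle
        if parent in self.ancestors(child):
            return "redundant"  # already implied by existing links
        self.parents.setdefault(child, set()).add(parent)
        self.parents.setdefault(parent, set())
        return "added"

    def merge(self, other):
        """Replay the other hierarchy's links through the same checks."""
        rejected = []
        for child in sorted(other.parents):
            for parent in sorted(other.parents[child]):
                status = self.add_is_a(child, parent)
                if status != "added":
                    rejected.append((child, parent, status))
        return rejected


h1 = Hierarchy()
results = [
    h1.add_is_a("anthrax", "biological agent"),
    h1.add_is_a("biological agent", "weapon"),
    h1.add_is_a("anthrax", "weapon"),   # implied by the two links above
    h1.add_is_a("weapon", "anthrax"),   # would close a cycle
]

h2 = Hierarchy()
h2.add_is_a("sarin", "chemical agent")
h2.add_is_a("anthrax", "biological agent")
rejected = h1.merge(h2)
print(results, rejected)
```

Because merging reuses `add_is_a`, a merged hierarchy satisfies the same invariants as one built in a single run. Note that in this sketch the outcome depends on the order in which links arrive; a fuller conflict resolution engine would also need to revisit links that later become redundant.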
Merging is useful for distributed or parallel systems where small chunks of the input text may be processed on some portions of the system and then subsequently merged. It also provides a foundation for future work in contextual reasoning and epistemic logic. The result is a rich knowledge base which can be viewed at many different levels of granularity, providing an analyst with the level of detail desired.

4. Results

4.1 Polaris

As mentioned earlier, Polaris uses four machine learning algorithms to discover semantic relations in syntactic patterns: Semantic Scattering, Decision Trees, Naïve Bayes and Support Vector Machine. There are six primary pattern types discovered within noun phrases: N-N and Adj-N (which comprise compound nominals), 's and of (Genitive patterns), Adjective Phrases, and Adjective Clauses. The first five are further subdivided into nominalized and non-nominalized occurrences, giving a total of 11 patterns discovered within compound nominals. Table 3 summarizes the accuracy over the training data of each machine learning algorithm for each noun phrase pattern. In this table, non-al refers to nominalized forms and al refers to non-nominalized. The training corpus source for the noun phrase patterns is Wall Street Journal (TreeBank 2), L.A. Times (TREC 9), and XWN 2.0 [Harabagiu and Moldovan].

Table 3: Machine Learning Accuracy for Noun Phrase Level
[Rows: Semantic Scattering, Decision Tree, Naïve Bayes, SVM. Columns: complex nominals (NN, AdjN), genitives (Of, 's) and adjective phrases (NP prep NP), each split into al and nonal, plus adjective clauses (NP Wh-Pron). Apart from n/a entries for Semantic Scattering, Decision Tree and Naïve Bayes, the accuracy values are missing from this copy.]

There are also five verb argument level patterns being discovered: NP, NP, PP, ADVP, and S. Table 4 summarizes the accuracy over the training data for two machine learning algorithms. The training corpus source for the argument patterns is FrameNet [Baker]. Neither table is an indication of overall system score; however, if all inputs were perfect, each would indicate the expected best performance for the current system.

Table 4: Machine Learning for Verb Argument Level
[Rows: Decision Tree, SVM. Columns: NP, NP, PP, ADVP, Verb S. The accuracy values are missing from this copy.]

LCC has created a benchmark corpus to evaluate the Polaris system. The corpus contains 300 sentences, but currently only 51 have been fully annotated due to the large manual effort required. Within these 51 sentences, human annotators discovered 683 total relations; 290 of these match the syntactic patterns that Polaris currently recognizes. A scorer program runs Polaris over these same 51 sentences and compares the generated relations to the human annotations.

As of March 29, 2005, Polaris discovered 265 relations within the syntactic patterns that it uses. Of these, 94 were exact matches to the human annotations. An additional 38.2 were partial matches, meaning that while the relation type was correct and the argument bracketing at least overlapped, there were some extra or missing tokens in the generated arguments. The partial matches are scored using precision, recall, and F-measure on the overlapping tokens. The total score for all matches, including discounting for partial matches, is shown in Table 5. The first column indicates performance on all human annotations, including those on syntactic patterns Polaris currently cannot see.
The second column shows the performance within the syntactic patterns Polaris currently recognizes. The second column is a better indication of the overall potential of Polaris' approach if it were extended to include more syntactic patterns.

Measured over:   All relations   Only relations covered by syntactic patterns
Precision        49.89%          49.89%
Recall           19.63%          50.04%
F-Measure        28.18%          49.96%

Table 5: Polaris System Score

The numbers continue to improve but are obviously not perfect. There are many reasons for this, resulting from both external and internal factors. The external NLP techniques which Polaris depends on offer varying degrees of precision. Automatic word sense disambiguation is [...] percent accurate for nouns, and lower than that for verbs. Syntactic parsing is close to 90 percent accurate for subtrees, but this precision degenerates to somewhere between 50 and 70 percent for an entire, complex sentence. The part of speech tagger is around 95 percent accurate, and the named entity tagger ranges from [...] percent accuracy. Additionally, there is currently no true coreference resolution library. Multiplying the accuracies of each tool which Polaris depends upon suggests that the likelihood of a fully accurate analysis of a real-world, complex sentence is probably below 50 percent.

Internally, there are also many issues which affect precision and recall. The training data has a fair number of issues: insufficient examples for some syntactic patterns or semantic relations; narrow domain of the training corpora; inconsistency in the order of relation arguments; noisy data; and lack of a one-to-one mapping to the source. Additionally, there are currently not enough features for each of the semantic relations. Relation arguments are often ambiguous within a parse tree structure, and syntactic patterns do not always capture all relations. The machine learning classifiers tend to return only one relation per syntactic pattern even if there are multiple possibilities.
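The Table 5 figures can be partly re-derived from the counts reported for the benchmark. The sketch below rests on one assumption that the paper does not state outright: that the 38.2 partial matches are the discounted credit earned by partially matching relations, so that precision is (94 + 38.2) / 265. The recall values are taken directly from Table 5, because the discounted recall credit is not reported. Variable names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


EXACT_MATCHES = 94        # exact matches to the human annotations
PARTIAL_CREDIT = 38.2     # assumed: discounted credit from partial matches
SYSTEM_RELATIONS = 265    # relations Polaris produced on the 51 sentences

precision = (EXACT_MATCHES + PARTIAL_CREDIT) / SYSTEM_RELATIONS

# Recall as reported in Table 5 (not re-derived here).
RECALL_ALL = 0.1963       # against all 683 human-annotated relations
RECALL_COVERED = 0.5004   # against the 290 relations in covered patterns

f_all = f_measure(precision, RECALL_ALL)
f_covered = f_measure(precision, RECALL_COVERED)

print(f"precision             = {precision:.2%}")
print(f"F-measure (all)       = {f_all:.2%}")
print(f"F-measure (covered)   = {f_covered:.2%}")
```

Under that assumption the computed precision agrees with the reported 49.89%, and the F-measures agree with Table 5 to within rounding of the inputs. Precision is the same in both columns because it is measured against the system's own output, which does not change with the choice of reference set.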
There are also issues caused by metonymy (figures of speech) and by multiple relations found within the same phrase. Work is being done on all of these areas to help improve precision and recall.

4.2 Jaguar

LCC has recently developed a battery of evaluation metrics to assess the quality of ontologies. They are summarized in Table 6.

Metric Name                  Metric Description
Conceptual Precision (CP)    number of well-formed and relevant concepts in the ontology divided by the total number of concepts in the ontology
Subsumption Precision (SP)   number of correct subsumption links in the ontology divided by the total number of subsumption links in the ontology
Conceptual Recall (CR)       number of well-formed and relevant concepts in the ontology divided by the union of this number and this number from a reference ontology
Subsumption Recall (SR)      number of correct subsumption links in the ontology divided by the union of this number and this number from a reference ontology
Unlinked Concepts (UC)       proportion of orphan concepts in the ontology
Conceptual Expansion (CE)    proportional difference between number of seed concepts and number of concepts in generated ontology

Table 6: Ontology Evaluation Metrics

These ontology evaluation metrics were used to evaluate two versions of Jaguar, one which uses a manually selected set of seed concepts and one which selects seeds automatically. The document collection used for this evaluation was 5.67 megabytes of text from a CNS (Center for Nonproliferation Studies) corpus focused on chemical and biological weapons. The manually selected seed set consisted of 158 concepts associated with biological agents and weapons, and the automatically selected seed set consisted of 100 concepts. Both sets were used as input to Jaguar to create two separate ontologies for the biological weapons and agents domain. Two manually built, hand-edited ontologies focusing on the biological weapons domain were used as reference ontologies.
These reference ontologies were pruned from the original ontologies to remove information about chemical and nuclear weapons, and one of them was additionally pruned to remove concepts not found in the document collection. The first reference ontology, which contains 151 concepts and 208 subsumption links, will be referred to as BW-manual, and the second one, which contains 68 concepts and 93 subsumption links, will be referred to as BW-manual-filtered.

Jaguar was run two times, first with the 158 manually selected seeds (labeled BW-KAT1), and second with the 100 automatically selected seeds (labeled BW-KAT2). BW-KAT1 contained 4,712 concepts, with 896 considered to be well-formed and relevant to the domain; 85 of these were unsubsumed, and 756 of the remaining 811 subsumed concepts were considered to be accurate when checked manually. BW-KAT2 contained 7,197 concepts, with 1,147 considered to be well-formed and relevant to the domain; 68 of these were unsubsumed, and 977 of the remaining 1,079 subsumed concepts were considered to be accurate.

The metrics described above are summarized for BW-KAT1 and BW-KAT2 in Table 7. With the exception of conceptual precision, the results are very good. The results are also closely comparable between the manual and automatic selection of seeds.

There are, however, still issues which need to be addressed to improve the results. Due to its dependency on Polaris, Jaguar also depends on a number of lower level NLP components. Their shortcomings and their effect on Polaris were discussed above, and they likewise affect the performance of Jaguar. Improvement in lower level components should increase the performance of Jaguar. There is still a good bit of noise in the input to Jaguar, and better filtering techniques will increase the overall quality of the resultant ontology. The classifier uses a variety of heuristics, many of which possess some degree of ambiguity.
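The BW-KAT2 figures can be re-derived from the counts given above. The sketch below makes two assumptions that go beyond the text: that the "union" in the recall metrics is the plain sum of the generated count and the reference count, and that conceptual expansion is measured on the well-formed and relevant concepts rather than on all generated concepts. Both readings reproduce the BW-KAT2 percentages reported in Table 7. Function and variable names are illustrative.

```python
def precision(correct, total):
    """CP or SP: correct items over all items of that kind."""
    return correct / total


def recall(correct, reference_size):
    """CR or SR, reading 'union' as correct + reference count (assumption)."""
    return correct / (correct + reference_size)


def conceptual_expansion(relevant_concepts, seeds):
    """CE: growth from the seed set, relative to the seed set (assumption)."""
    return (relevant_concepts - seeds) / seeds


# BW-KAT2 counts from the evaluation (100 automatically selected seeds).
TOTAL_CONCEPTS, RELEVANT, UNSUBSUMED = 7197, 1147, 68
SUBSUMED, CORRECT_LINKS, SEEDS = 1079, 977, 100

# Reference ontologies: (concepts, subsumption links).
BW_MANUAL = (151, 208)
BW_MANUAL_FILTERED = (68, 93)

metrics = {
    "CP": precision(RELEVANT, TOTAL_CONCEPTS),
    "SP": precision(CORRECT_LINKS, SUBSUMED),
    "CR (1)": recall(RELEVANT, BW_MANUAL[0]),
    "CR (2)": recall(RELEVANT, BW_MANUAL_FILTERED[0]),
    "SR (1)": recall(CORRECT_LINKS, BW_MANUAL[1]),
    "SR (2)": recall(CORRECT_LINKS, BW_MANUAL_FILTERED[1]),
    "CE": conceptual_expansion(RELEVANT, SEEDS),
    "UC": UNSUBSUMED / RELEVANT,
}

percent = {name: round(value * 100, 2) for name, value in metrics.items()}
for name, value in percent.items():
    print(f"{name:7s} {value:8.2f}%")
```

The low conceptual precision follows directly from the first ratio: only 1,147 of the 7,197 generated concepts were judged well-formed and relevant, while the subsumption links among those concepts are mostly correct.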
Additionally, anomalies in the hypernymy tree, such as two very different concepts sharing the same hypernym several levels removed, introduce more noise into the data. Conflict resolution is still being researched, and though an initial implementation is in place, further refinement should also improve the quality of the built ontologies.

Metric                       BW-KAT1                  BW-KAT2
Conceptual Precision (CP)    19.02% (896/4712)        15.94% (1147/7197)
Subsumption Precision (SP)   93.22% (756/811)         90.55% (977/1079)
Conceptual Recall (1) CR     [...]% (896/( ))         88.37% (1147/( ))
Conceptual Recall (2) CR     [...]% (896/( ))         94.40% (1147/( ))
Subsumption Recall (1) SR    [...]% (756/( ))         82.45% (977/( ))
Subsumption Recall (2) SR    [...]% (756/( ))         91.31% (977/( ))
Conceptual Expansion (CE)    [...]% (( )/158)         1047% (( )/100)
Unlinked Concepts (UC)       9.49% (85/896)           5.93% (68/1147)

Table 7: Results of Jaguar Evaluation

Much effort has been made to build a collection of domain-specific ontologies on a regular and automatic basis. Using web harvesting tools developed at LCC, Jaguar has been extended to build ontologies automatically from the web. Seed concepts are used as query keywords for a search engine like Google, and the documents found are ranked accordingly and then processed by Jaguar. Over 30 different ontologies have been built which include IS-A hierarchies; work is being done to augment them with other relation types, such as part-whole and locative. Example domains which have been built and made available via the web include HR, biological weapons, Al Qaeda, North Korean Nuclear Program, acid rain, and trains.

5. Conclusion

LCC has made great strides toward extracting, structuring, and maintaining knowledge which can assist an analyst in higher levels of critical thinking for better analysis, but there is still much work to be done. Continued improvement of the quality of the knowledge extracted and of the relationships between chunks of knowledge is needed to ensure that the most useful information is always available to the analyst. More detailed work on extracting and formulating Axioms on Demand will allow ontologies to become more useful knowledge bases. Work on reasoning will allow the system to perform preliminary analysis and present it to the analyst to aid the critical thinking process. Mechanisms for connecting with disparate knowledge bases and ontologies are also being explored to improve the utility and structure of knowledge available to the analyst. The impact on text processing has already been large: this work helps bridge the gap to machine text understanding, enabling powerful technologies like QA, reasoning and inference, IE, and summarization. Overall, the current system provides a very strong foundation for future endeavors and possesses a great deal of utility in its own right.

Acknowledgments

This material is based upon work funded in part by the U.S. Government and any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Government. Thanks to Altaf Mohammed, Lowell Boggs, Adriana Badulescu, and Ian Niles for their contributions.

References

Adriana Badulescu. Classification of Semantic Relations Between Nouns. Ph.D. Dissertation, University of Texas at Dallas.

Collins F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet Project. In Proceedings of COLING/ACL '98: Montreal, Canada.

Roxana Girju, et al. Support Vector Machines Applied to the Classification of Semantic Relations in Nominalized Noun Phrases. In Proc. of the Lexical Semantics Workshop, HLT 2004, Boston.

Sanda Harabagiu and Dan Moldovan. Knowledge Processing on an Extended WordNet. WordNet-An Electronic Lexical Database. MIT Press, C. Fellbaum editor, pp. [...].

Richards J. Heuer, Jr. Psychology of Intelligence Analysis. Center for the Study of Intelligence, Central Intelligence Agency.

George Miller. WordNet: a lexical database for English. Communications of the ACM, Vol. 38, No. 11:39-41.

Dan I. Moldovan and Roxana C. Girju. An Interactive Tool for the Rapid Development of Knowledge Bases. International Journal on Artificial Intelligence Tools, vol. 10, no. 1-2, March [...].


More information

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

The Prolog Interface to the Unstructured Information Management Architecture

The Prolog Interface to the Unstructured Information Management Architecture The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, pfodor@cs.sunysb.edu 2 IBM

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Why are Organizations Interested?

Why are Organizations Interested? SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty M-E.Eddlestone@sas.com +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions

More information
