# A Graph-Based Approach to Commonsense Concept Extraction and Semantic Similarity Detection

Save this PDF as:

Size: px
Start display at page:

Download "A Graph-Based Approach to Commonsense Concept Extraction and Semantic Similarity Detection"

## Transcription

4 Data: NounPhrase1, NounPhrase2 Result: True if the concepts are similar, else False if Both phrases have atleast one noun in common then Objects1 := All Valid Objects for NounPhrase1; Objects2 := All Valid Objects for NounPhrase2; M1 = matches from KB for M1 := ; M2 := ; for all concepts in NounPhrase1 do M1 := M1 all property matches for concept; for all concepts in NounPhrase2 do M2 := M2 all property matches for concept ; SetCommon = M1 M2 ; if length of SetCommon > 0 then The Noun Phrases are similar else They are not similar Algorithm 3: Finding similar concepts In particular, we use AffectNet 1 [3] (hereafter termed A), a 14, ,365 matrix of affective commonsense knowledge. In A, each concept is represented by a vector in the space of possible features whose values are positive for features that produce an assertion of positive valence (e.g., a penguin is a bird ), negative for features that produce an assertion of negative valence (e.g., a penguin cannot fly ), and zero when nothing is known about the assertion. The degree of similarity between two concepts, then, is the dot product between their rows in A. The value of such a dot product increases whenever two concepts are described by the same features and decreases when they are described by features that are negations of each other. In particular, we use truncated singular value decomposition (SVD) [27]: the resulting matrix 2 has the form Ã = U k Σ k Vk T and is a lowrank approximation of A, the original data. This approximation is based on minimizing the Frobenius norm of the difference between A and Ã under the constraint rank(ã) = k. For the Eckart-Young theorem [10], it represents the best approximation of A in the least-square sense, in fact: min A Ã = min Σ U ÃV Ã rank(ã)=k Ã rank(ã)=k i=1 = min Ã rank(ã)=k Σ S assuming that Ã has the form Ã = USV, where S is diagonal. From the rank constraint, i.e., S has k non-zero diagonal entries, the minimum of the above statement is obtained as follows: min Σ S = min n (σ i s Ã rank(ã)=k s i) 2 = i i=1 = min k n (σ i s s i) 2 + σi 2 = n σi 2 i i=k i=k+1 Figure 2: Vector space model for reasoning on the semantic relatedness of commonsense concepts Therefore, Ã of rank k is the best approximation of A in the Frobenius norm sense when σ i = s i (i = 1,..., k) and the corresponding singular vectors are the same as those of A. If we choose to discard all but the first k principal components, commonsense concepts are represented by vectors of k coordinates, which can be seen as describing multi-word expressions in terms of eigenconcepts that form the axes of the resulting vector space, i.e., its basis e 0,...,e k 1 (Fig. 2). Thus, by exploiting the information sharing property of truncated SVD, concepts with similar meaning are likely to have similar semantic features - that is, concepts conveying the same semantics t to fall near each other in the space. Concept similarity does not dep on their absolute positions in the vector space, but rather on the angle they make with the origin. In order to measure such semantic relatedness, then, Ã is clustered by using a k-medoid approach [21]. Differently from the k-means algorithm (which does not pose constraints on centroids), k-medoids do assume that centroids must coincide with k observed points. The k-medoids approach is similar to the partitioning around medoids (PAM) algorithm, which determines a medoid for each cluster selecting the most centrally located centroid within that cluster. Unlike other PAM techniques, however, the k-medoids algorithm runs similarly to k-means and, hence, requires a significantly reduced computational time. Given that the distance between two points in the space is defined as D(e i, e j) = d s=1 ( e (s) i e (s) j ) 2, the adopted algorithm can be summarised as follows: 1. Each centroid ē i R d (i = 1, 2,..., k) is set as one of the k most representative instances of general categories such as time, location, object, animal, and plant; 2. Assign each instance e j to a cluster ē i if D(e j, ē i) D(e j, ē i ) where i(i ) = 1, 2,..., k; 3. Find a new centroid ē i for each cluster c so that j Cluster c D(ej, ēi) j Cluster c D(ej, ē i ); 4. Repeat step 2 and 3 until no changes are observed. 568

5 5. EVALUATION A manually labeled dataset of 200 multi-word concept pairs were considered for evaluating the similarity detection capabilities of the developed parser 3. Such pairs were randomly selected from a collection of different knowledge bases, namely: DBPedia [1], Concept- Net [23], NELL [6], and tuples obtained from Google N- grams [24], containing about 9 million triples. As a baseline, we considered the syntactic similarity algorithm alone; then, we calculated semantic similarity by means of multidimensional scaling; finally, an ensemble of both approaches was employed. Results are shown in Table 1. A goldset of 50 natural language sentences, moreover, was selected to test how the ensemble application of concept extraction and semantic similarity detection can improve semantic parsing, with respect to the POS-based bigram algorithm alone and a naïve parser that only uses the Stanford Chunker and the Lancaster stemming algorithm. Results are listed in Table 2. Algorithm Precision Recall F-measure Syntactic similarity 65.6% 67.3% 66.4% Semantic similarity 77.2% 70.8% 73.9% Ensemble similarity 85.4% 74.0% 79.3% Table 1: Performance of different similarity detection algorithms over 200 concept pairs Algorithm Concept extraction accuracy Naïve parser 65.8% POS-based bigram 79.1% POS-based + similarity 87.6% Table 2: Performance of different parsing algorithms over 50 natural language sentences 6. CONCLUSION AND FUTURE WORK In this paper, we proposed a novel approach for effectively extracting event and object concepts from natural language text, aided by a semantic similarity detection technique capable of effectively finding syntactically and semantically related concepts. We also explored how knowledge may be used to expand the reach of matching algorithms and compensate for database sparsity. Future work will involve exploration of how commonsense knowledge may be repurposed to generate even more knowledge by using existing commonsense to detect natural language patterns and, hence, match such patterns on new texts in order to extract previously unknown pieces of knowledge. In addition, work will be undertaken exploring how to create ad hoc knowledge extraction algorithms that output data ideal for immediate entry into specific commonsense knowledge representation and reasoning systems. 7. REFERENCES [1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. The Semantic Web, pages , [2] E. Cambria, N. Howard, J. Hsu, and A. Hussain. Sentic bling: Scalable multimodal fusion for continuous interpretation of semantics and sentics. In IEEE SSCI, Singapore, [3] E. Cambria and A. Hussain. Sentic Computing: Techniques, Tools, and Applications. Springer, Dordrecht, Netherlands, [4] E. Cambria, D. Rajagopal, D. Olsher, and D. Das. Big social data analysis. In R. Akerkar, editor, Big Data Computing, chapter 13. Chapman and Hall/CRC, [5] E. Cambria, Y. Song, H. Wang, and N. Howard. Semantic multi-dimensional scaling for open-domain sentiment analysis. IEEE Intelligent Systems, doi: /MIS , [6] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. Hruschka, and T. Mitchell. Toward an architecture for never-ing language learning. In AAAI, pages , Atlanta, [7] G. Carroll and E. Charniak. Two experiments on learning probabilistic depency grammars from corpora. AAAI technical report WS-92-01, Department of Computer Science, Univ., [8] E. Charniak. Statistical parsing with a context-free grammar and word statistics. In AAAI, pages , Providence, [9] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American society for information science, 41(6): , [10] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3): , [11] C. Fellbaum. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press, [12] M. Grassi, E. Cambria, A. Hussain, and F. Piazza. Sentic web: A new paradigm for managing social media affective information. Cognitive Computation, 3(3): , [13] R. Hwa. Sample selection for statistical grammar induction. In EMNLP, pages 45 52, Hong Kong, [14] J. Kandola, J. Shawe-Taylor, and N. Cristianini. Learning semantic similarity. Advances in neural information processing systems, 15: , [15] D. Lenat and R. Guha. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley, Boston, [16] C. Manning. Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In A. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 6608 of Lecture Notes in Computer Science, pages Springer, [17] D. Olsher. COGPARSE: Brain-inspired knowledge-driven full semantics parsing. In Advances in Brain Inspired Cognitive Systems, pages 1 11, [18] D. Olsher. Full spectrum opinion mining: Integrating domain, syntactic and lexical knowledge. In ICDM SENTIRE, pages , Brussels,

6 [19] D. Olsher. COGVIEW & INTELNET: Nuanced energy-based knowledge representation and integrated cognitive-conceptual framework for realistic culture, values, and concept-affected systems simulation. In IEEE SSCI, Singapore, [20] C. Paice. Another stemmer. SIGIR Forum, 24(3):56 61, [21] H. Park and C. Jun. A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications, 36(2): , [22] M. Sahami and T. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In WWW, pages , Edinburgh, [23] R. Speer and C. Havasi. ConceptNet 5: A large semantic network for relational knowledge. In E. Hovy, M. Johnson, and G. Hirst, editors, Theory and Applications of Natural Language Processing, chapter 6. Springer, [24] N. Tandon, D. Rajagopal, and G. De Melo. Markov chains for robust graph-based commonsense information extraction. In COLING, pages , Mumbai, [25] M. Tang, X. Luo, S. Roukos, et al. Active learning for statistical natural language parsing. In ACL, pages , Philadelphia, [26] K. Toutanova, D. Klein, C. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic depency network. In NAACL, pages , Stroudsburg, [27] M. Wall, A. Rechtsteiner, and L. Rocha. Singular value decomposition and principal component analysis. In D. Berrar, W. Dubitzky, and M. Granzow, editors, A Practical Approach to Microarray Data Analysis, pages Springer,

### Sentiment analysis for news articles

Prashant Raina Sentiment analysis for news articles Wide range of applications in business and public policy Especially relevant given the popularity of online media Previous work Machine learning based

### Customer Intentions Analysis of Twitter Based on Semantic Patterns

Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun mohamed.hamrounn@gmail.com Mohamed Salah Gouider ms.gouider@yahoo.fr Lamjed Ben Said lamjed.bensaid@isg.rnu.tn ABSTRACT

### Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

### THE EVER-GROWING amount of available information

1 Semantic Multi-Dimensional Scaling for Open-Domain Sentiment Analysis Erik Cambria, Member, IEEE, Yangqiu Song, Member, IEEE, Haixun Wang, Member, IEEE, and Newton Howard, Member, IEEE Abstract The ability

### SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis

SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis Erik Cambria School of Computer Engineering Nanyang Technological University cambria@ntu.edu.sg Daniel Olsher

### Mining Opinion Features in Customer Reviews

Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu

### Effective Self-Training for Parsing

Effective Self-Training for Parsing David McClosky dmcc@cs.brown.edu Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - dmcc@cs.brown.edu

### Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

### Guest Editors Introduction: Machine Learning in Speech and Language Technologies

Guest Editors Introduction: Machine Learning in Speech and Language Technologies Pascale Fung (pascale@ee.ust.hk) Department of Electrical and Electronic Engineering Hong Kong University of Science and

### SenticNet: A Publicly Available Semantic Resource for Opinion Mining

Commonsense Knowledge: Papers from the AAAI Fall Symposium (FS-10-02) SenticNet: A Publicly Available Semantic Resource for Opinion Mining Erik Cambria University of Stirling Stirling, FK4 9LA UK eca@cs.stir.ac.uk

### English Grammar Checker

International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook

### Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

### The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

### Natural Language Database Interface for the Community Based Monitoring System *

Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University

### Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural

### Multi-source hybrid Question Answering system

Multi-source hybrid Question Answering system Seonyeong Park, Hyosup Shim, Sangdo Han, Byungsoo Kim, Gary Geunbae Lee Pohang University of Science and Technology, Pohang, Republic of Korea {sypark322,

### dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

### Shallow Parsing with Apache UIMA

Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic

### Curse or Boon? Presence of Subjunctive Mood in Opinionated Text

Curse or Boon? Presence of Subjunctive Mood in Opinionated Text Sapna Negi and Paul Buitelaar Insight Centre for Data Analytics National University of Ireland Galway firstname.lastname@insight-centre.org

### TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments Grzegorz Dziczkowski, Katarzyna Wegrzyn-Wolska Ecole Superieur d Ingenieurs

### Sentic Album: Content-, Concept-, and Context-Based Online Personal Photo Management System

Cogn Comput (2012) 4:477 496 DOI 10.1007/s12559-012-9145-4 Sentic Album: Content-, Concept-, and Context-Based Online Personal Photo Management System Erik Cambria Amir Hussain Received: 15 August 2011

### Assignment 3. Due date: 13:10, Wednesday 4 November 2015, on CDF. This assignment is worth 12% of your final grade.

University of Toronto, Department of Computer Science CSC 2501/485F Computational Linguistics, Fall 2015 Assignment 3 Due date: 13:10, Wednesday 4 ovember 2015, on CDF. This assignment is worth 12% of

### Semi-Supervised Learning for Blog Classification

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

### A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

### Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

### Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: lurdes@sip.ucm.es)

### Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

### Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

### Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

### SIMOnt: A Security Information Management Ontology Framework

SIMOnt: A Security Information Management Ontology Framework Muhammad Abulaish 1,#, Syed Irfan Nabi 1,3, Khaled Alghathbar 1 & Azeddine Chikh 2 1 Centre of Excellence in Information Assurance, King Saud

### Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive

### Natural Language to Relational Query by Using Parsing Compiler

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

### Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like

### CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### . Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

### Semantic analysis of text and speech

Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Generatin Coherent Event Schemas at Scale

Generatin Coherent Event Schemas at Scale Niranjan Balasubramanian, Stephen Soderland, Mausam, Oren Etzioni University of Washington Presented By: Jumana Table of Contents: 1 Introduction 2 System Overview

### The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle

SCIENTIFIC PAPERS, UNIVERSITY OF LATVIA, 2010. Vol. 757 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES 11 22 P. The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle Armands Šlihte Faculty

### Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

### Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

### DEPENDENCY PARSING JOAKIM NIVRE

DEPENDENCY PARSING JOAKIM NIVRE Contents 1. Dependency Trees 1 2. Arc-Factored Models 3 3. Online Learning 3 4. Eisner s Algorithm 4 5. Spanning Tree Parsing 6 References 7 A dependency parser analyzes

### Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews

www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie

### RRSS - Rating Reviews Support System purpose built for movies recommendation

RRSS - Rating Reviews Support System purpose built for movies recommendation Grzegorz Dziczkowski 1,2 and Katarzyna Wegrzyn-Wolska 1 1 Ecole Superieur d Ingenieurs en Informatique et Genie des Telecommunicatiom

### Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

### Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

### Semantic Mapping Between Natural Language Questions and SQL Queries via Syntactic Pairing

Semantic Mapping Between Natural Language Questions and SQL Queries via Syntactic Pairing Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento

### Machine Learning for natural language processing

Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to

### A Method for Automatic De-identification of Medical Records

A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract

### APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

### Judging grammaticality, detecting preposition errors and generating language test items using a mixed-bag of NLP techniques

Judging grammaticality, detecting preposition errors and generating language test items using a mixed-bag of NLP techniques Jennifer Foster Natural Language Processing and Language Learning Workshop Nancy

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Learning and Inference for Clause Identification

Learning and Inference for Clause Identification Xavier Carreras Lluís Màrquez Technical University of Catalonia (UPC) Vasin Punyakanok Dan Roth University of Illinois at Urbana-Champaign (UIUC) ECML 2002

### Brill s rule-based PoS tagger

Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based

### Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

### THE SEMANTIC WEB AND IT`S APPLICATIONS

15-16 September 2011, BULGARIA 1 Proceedings of the International Conference on Information Technologies (InfoTech-2011) 15-16 September 2011, Bulgaria THE SEMANTIC WEB AND IT`S APPLICATIONS Dimitar Vuldzhev

### Learning Translation Rules from Bilingual English Filipino Corpus

Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,

### Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect

### Special Topics in Computer Science

Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

### Semantic annotation of requirements for automatic UML class diagram generation

www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute

### Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

### Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA

Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### How to make Ontologies self-building from Wiki-Texts

How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.

### Big Social Data Analysis

13 Big Social Data Analysis Erik Cambria, Dheeraj Rajagopal, Daniel Olsher, and Dipankar Das CONTENTS From Small to Big Social Data Analysis... 401 Large-Scale Sentiment Analysis and Tracking...403 Toward

### Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

### Projektgruppe. Categorization of text documents via classification

Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

### 2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

### Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

### Boosting the Feature Space: Text Classification for Unstructured Data on the Web

Boosting the Feature Space: Text Classification for Unstructured Data on the Web Yang Song 1, Ding Zhou 1, Jian Huang 2, Isaac G. Councill 2, Hongyuan Zha 1,2, C. Lee Giles 1,2 1 Department of Computer

### SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

### Soft Clustering with Projections: PCA, ICA, and Laplacian

1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix

### POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

POS Def. Part of Speech POS POS L645 POS = Assigning word class information to words Dept. of Linguistics, Indiana University Fall 2009 ex: the man bought a book determiner noun verb determiner noun 1

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Information extraction from online XML-encoded documents

Information extraction from online XML-encoded documents From: AAAI Technical Report WS-98-14. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Patricia Lutsky ArborText, Inc. 1000

### Training Set Properties and Decision-Tree Taggers: A Closer Look

Training Set Properties and Decision-Tree Taggers: A Closer Look Brian R. Hirshman Department of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 hirshman@cs.cmu.edu Abstract This paper

### Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu

Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu 1 Statistical Parsing: the company s clinical trials of both its animal and human-based

### Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

Presented in the Twenty-third Annual Meeting of the Society for Text and Discourse, Valencia from 16 to 18, July 2013 Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Guillermo

### Search and Information Retrieval

Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

### A Knowledge-based System for Translating FOL Formulas into NL Sentences

A Knowledge-based System for Translating FOL Formulas into NL Sentences Aikaterini Mpagouli, Ioannis Hatzilygeroudis University of Patras, School of Engineering Department of Computer Engineering & Informatics,

### Towards robust symbolic reasoning about information from the web

Towards robust symbolic reasoning about information from the web Steven Schockaert School of Computer Science & Informatics, Cardiff University, 5 The Parade, CF24 3AA, Cardiff, United Kingdom s.schockaert@cs.cardiff.ac.uk

### Identifying SPAM with Predictive Models

Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to

### Some Research Challenges for Big Data Analytics of Intelligent Security

Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,

### Cross-lingual Synonymy Overlap

Cross-lingual Synonymy Overlap Anca Dinu 1, Liviu P. Dinu 2, Ana Sabina Uban 2 1 Faculty of Foreign Languages and Literatures, University of Bucharest 2 Faculty of Mathematics and Computer Science, University

### Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

### Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives

Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and

### Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

### Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

### Overview of the TACITUS Project

Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for

### Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web

### Transition-Based Dependency Parsing with Long Distance Collocations

Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...