Crossing Corpora. Modelling Semantic Similarity across Languages and Lects.
1 Crossing Corpora. Modelling Semantic Similarity across Languages and Lects.
Yves Peirsman
Supervisors: Dirk Geeraerts & Dirk Speelman
Quantitative Lexicology and Variational Linguistics
2 Background
Corpus-based investigation of lexical variation, e.g. the words for 'jeans':
      jeans   jeansbroek   spijkerbroek
BE    35%     45%          20%
NL    20%     15%          65%
Problem: semantic equivalence
Solution: distributional models of semantic similarity
Extension from one corpus to two corpora
Application to two corpora from the same language: variational linguistics
Application to two corpora from different languages: cross-language knowledge induction
3 Outline
Distributional Models
Semantic Similarity across two Lects
Semantic Similarity across two Languages
Conclusions and Outlook
5 Distributional Models
Distributional hypothesis: semantically similar words appear in similar contexts (Wittgenstein, Firth, Harris).
Recipe
Ingredients: 1 large corpus, 1 definition of context, target words.
1. Search the corpus for the contexts of your target words.
2. Calculate the contextual similarity between each pair of target words.
3. Find the most contextually similar target words.
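The recipe above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: context is taken to be a symmetric word window, similarity is the cosine of raw co-occurrence counts, and the function names and the toy corpus are invented for the example.

```python
from collections import Counter
from math import sqrt

def context_vectors(tokens, targets, window=2):
    """Step 1: count the words found within `window` positions of each target."""
    vectors = {t: Counter() for t in targets}
    for i, tok in enumerate(tokens):
        if tok in vectors:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Step 2: contextual similarity between two count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest_neighbours(target, vectors):
    """Step 3: the other target words, most similar first."""
    scored = [(cosine(vectors[target], vec), w) for w, vec in vectors.items() if w != target]
    return [w for _, w in sorted(scored, reverse=True)]

# Invented toy corpus, only to make the sketch runnable.
tokens = "rich lawyer wins . rich attorney wins . poor linguist reads . poor lawyer wins".split()
vectors = context_vectors(tokens, ["lawyer", "attorney", "linguist"], window=1)
print(nearest_neighbours("lawyer", vectors))  # ['attorney', 'linguist']
```

With realistic corpora the raw counts would normally be weighted before comparison; the sketch leaves that out to keep the three steps visible.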
6-10 Distributional Models
[Figure, built up over five slides: the target words lawyer, attorney and linguist placed in a space whose dimensions are the context words poor and rich; slide 7 labels lawyer with the values 2 and 5.]
11 Distributional Models
Definition of context
context words: rich, poor (several context sizes)
syntactic relations: subject of plead, object of eat
documents or paragraphs
The definition of context influences the type of semantic relation that is modelled.
3 evaluation exercises: human similarity judgements, EuroWordNet, free associations
12 Distributional Models
Human similarity judgements
English pair               score
autograph - shore          0.06
monk - slave               0.57
glass - jewel              1.78
bird - crane               2.63
cemetery - graveyard       3.88
gem - jewel                3.94
Dutch pair                 score
piloot - fornuis           1.48
sla - reiger               2.28
konijn - vlieg             8.79
grasmachine - stofzuiger
goudvis - haring
clementine - sinaasappel
Data
Dutch: 250M words of Belgian Dutch newspaper language
English: BNC
13 Distributional Models
Human similarity judgements
[Figure: results for English and Dutch, one value per model: S, C2, C4, C6, C8, C10.]
14 Distributional Models
EuroWordNet
[Figure: share of cohyponyms, hypernyms, hyponyms and synonyms against the number of nearest neighbours, for syntax-based, document-based and word-based word space models.]
15 Distributional Models
Free associations
Examples: strawberry - red, wave - sea, pepper - salt
[Figure: median rank of the most frequent association against context size, for word-based (PMI, log likelihood statistic), compound-based, syntax-based and document-based models.]
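The slide names PMI as one of the statistics for the word-based models but gives no formula. As a hedged illustration, the sketch below uses the standard definition of pointwise mutual information, PMI(t, c) = log2( P(t, c) / (P(t) P(c)) ), estimated from co-occurrence counts. The function name and the toy counts are assumptions made for the example.

```python
from math import log2

def pmi_weights(cooc):
    """Replace raw co-occurrence counts by pointwise mutual information.

    cooc maps each target word to a dict {context word: count}.
    """
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    target_total = {t: sum(ctx.values()) for t, ctx in cooc.items()}
    context_total = {}
    for ctx in cooc.values():
        for c, n in ctx.items():
            context_total[c] = context_total.get(c, 0) + n
    weighted = {}
    for t, ctx in cooc.items():
        # P(t, c) / (P(t) * P(c)) simplifies to n * total / (count(t) * count(c)).
        weighted[t] = {c: log2(n * total / (target_total[t] * context_total[c]))
                       for c, n in ctx.items()}
    return weighted

# Invented counts: "the" co-occurs with everything, "red" only with "strawberry".
counts = {"strawberry": {"red": 4, "the": 4}, "wave": {"sea": 4, "the": 4}}
weights = pmi_weights(counts)
print(weights["strawberry"])  # {'red': 1.0, 'the': 0.0}
```

The uninformative context word receives a weight of zero, while the context word specific to one target is weighted up, which is the usual reason for applying such a statistic before computing similarities.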
16 Distributional Models
Conclusions
Semantic similarity: syntax-based models and word-based models with small context sizes
Semantic relatedness: document-based models
Problem of data sparseness
Modelling semantic similarity across languages and lects: word-based models with small context sizes
18-19 Cross-lectal similarity
[Figure, built up over two slides: the words lorry, taxi, motorbike, car and motorcycle from one corpus, joined by truck and cab from the other.]
20 Cross-lectal similarity
Extension of semantic similarity to two corpora
Requirement: context features are identical between the corpora
Bilectal models can help us
recognize markers of a specific language variety
extract the synonym from another language variety
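A minimal sketch of the extension to two corpora, assuming that each corpus has already been turned into count vectors and that similarity is the cosine over the context features both corpora share. The function names and the counts are invented for illustration; only the example words come from the talk.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def restrict(vec, shared):
    """Keep only the context features that exist in both corpora."""
    return {k: n for k, n in vec.items() if k in shared}

def cross_lectal_neighbours(word, vecs_a, vecs_b, shared):
    """Rank the words of corpus B by similarity to `word` from corpus A."""
    source = restrict(vecs_a[word], shared)
    scored = [(cosine(source, restrict(vec, shared)), w) for w, vec in vecs_b.items()]
    return [w for _, w in sorted(scored, reverse=True)]

# Invented counts for a Belgian (be) and a Netherlandic (nl) corpus.
be = {"schepen": {"stad": 5, "burgemeester": 3, "beslissen": 2}}
nl = {"wethouder": {"stad": 4, "burgemeester": 3, "beslissen": 2},
      "voetballer": {"bal": 5, "beslissen": 1}}
shared = {"stad", "burgemeester", "beslissen"}
print(cross_lectal_neighbours("schepen", be, nl, shared))  # ['wethouder', 'voetballer']
```

Restricting both vectors to the shared features is what makes vectors from different corpora comparable; features that occur in only one corpus carry no information about the other.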
21 Cross-lectal similarity
[Figure: Belgian schepen and Netherlandic wethouder placed in a space of context features found in both corpora: stad, burgemeester, minister, beslissen.]
22 Cross-lectal similarity
Lectal markers
traditional frequency-based definition of keywords
do different frequencies imply lectal markers?
do lectal markers imply different frequencies?
new distributional definition
Data
250M words of newspaper language from BE and NL
Referentiebestand Belgisch Nederlands (RBBN): list of 4,000 markers of Belgian Dutch with their Netherlandic Dutch alternative
23 Cross-lectal similarity
alternation     BE            NL            meaning
free            ajuin         ui            onion
                aftrekker     flesopener    bottle opener
unique          doctorandus   promovendus   PhD student
                schepen       wethouder     alderman
unlexicalised   dodentol                    number of victims
                vieruurtje                  afternoon snack
informal        refter        eetzaal       canteen
                schoon        mooi          beautiful
restrictions    schacht                     first-year student
                kaka          poep          poop
cultural        CM                          Belgian health fund
                rijkswacht                  former national police
24 Cross-lectal similarity
Lectal markers
Look for words with an unexpectedly low cross-lectal distributional similarity.
The method finds more RBBN words than traditional keyword methods.
[Figure: precision of log likelihood, chi square, stable lexical markers and cosine.]
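The two kinds of marker detection compared here can be contrasted in a short sketch. It assumes the common two-corpus form of the log-likelihood keyword statistic for the frequency-based side, and a plain ranking by the cosine between a word's own vectors in the two corpora for the distributional side. The slides do not say how "unexpectedly low" is calibrated, so the ranking is a simplification, and all names and counts are invented.

```python
from math import log, sqrt

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Frequency-based keyness (G2) of one word in corpus A versus corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * log(observed / expected)
    return 2 * g2

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def distributional_markers(vecs_a, vecs_b):
    """Words present in both corpora, least similar to themselves first."""
    scored = sorted((cosine(vecs_a[w], vecs_b[w]), w) for w in vecs_a.keys() & vecs_b.keys())
    return [(w, score) for score, w in scored]

# Invented counts: "schoon" is used in different contexts in the two corpora.
be = {"schoon": {"vrouw": 4, "lied": 3}, "stad": {"groot": 5, "burgemeester": 2}}
nl = {"schoon": {"water": 4, "handen": 3}, "stad": {"groot": 4, "burgemeester": 3}}
print(distributional_markers(be, nl)[0][0])  # schoon
print(log_likelihood(30, 30, 1000, 1000))    # 0.0
```

The second call shows the case the frequency-based statistic misses: a word that is equally frequent in both corpora scores zero even if it is used in entirely different contexts.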
25 Cross-lectal similarity
Cross-lectal synonyms
Extract the nearest Netherlandic neighbours to RBBN words.
More than 20% of the 1st nearest neighbours are RBBN synonyms.
More than 50% of the RBBN synonyms are in the top 15 neighbours.
[Figure: accuracy of finding the Netherlandic alternatives of Belgian markers against the number of nearest neighbours, for context sizes 2 to 6 and a syntax-based model.]
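The two percentages on this slide are accuracies at a cut-off in the ranked neighbour list. A minimal sketch of that measure follows, assuming one gold alternative per marker; the function name and the neighbour lists are invented, and the numbers it prints describe the toy data only, not the results of the talk.

```python
def accuracy_at_k(neighbours, gold, k):
    """Share of markers whose gold alternative is among their top-k neighbours.

    neighbours maps a marker to its ranked list of neighbours;
    gold maps a marker to its reference alternative.
    """
    hits = sum(1 for marker, alt in gold.items() if alt in neighbours.get(marker, [])[:k])
    return hits / len(gold)

# Invented neighbour lists for two markers taken from the examples in the talk.
neighbours = {"confituur": ["jam", "marmelade"], "kot": ["huis", "studentenkamer"]}
gold = {"confituur": "jam", "kot": "studentenkamer"}
print(accuracy_at_k(neighbours, gold, 1))  # 0.5
print(accuracy_at_k(neighbours, gold, 2))  # 1.0
```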
26 Cross-lectal similarity
Examples of correct pairs
Belgian marker   Netherlandic alternative   meaning
bankbriefje      bankbiljet                 bank note
confituur        jam                        jam
fier             trots                      proud
gamma            assortiment                assortment
job              baan                       job
kot              studentenkamer             student room
living           woonkamer                  living room
microgolf        magnetron                  microwave oven
proper           schoon                     clean
uitbaten         exploiteren                exploit
27 Cross-lectal similarity
Problem group 1: polysemous words
Belgian marker: katholiek (catholic); alternative: goed (good); nearest neighbours: katholiek (catholic), rooms-katholiek (roman catholic), protestants (protestant)
Belgian marker: stelling (position); alternative: steiger (scaffold); nearest neighbours: stelling (position), standpunt (position), opvatting (opinion)
Belgian marker: tenor (tenor); alternative: topper (steersman); nearest neighbours: tenor (tenor), countertenor (counter tenor), bariton (baritone)
28 Cross-lectal similarity
Problem group 2: colloquial words
Belgian marker: bal (ball/franc); alternative: frank (franc); nearest neighbours: bal (ball), schot (shot), rust (rest)
Belgian marker: flessen (fail); alternative: zakken; nearest neighbours: aanlengen (dilute), hergebruiken (reuse), recycle (recycle)
Belgian marker: boerenbuiten; alternative: platteland (countryside); nearest neighbours: vrouwenhuis (women's house), buitenlieden (countrymen), huisschrijver (resident author)
29 Cross-lectal similarity
Discussion
Distributional approaches can model lexical variation
Tool for variational-linguistic research
Polysemy is a major problem
Future work
Application to other languages, other genres
Addressing polysemy
31 Cross-lingual similarity
Application of distributional models to two corpora can be extended to two languages
Condition of overlap in context features
Many languages share a number of words (cognates, loans).
These words generally have a similar meaning.
32 Cross-lingual similarity
Method
1. Identify the shared words between the two languages, e.g. manager, school, kind.
2. Use these words as corresponding context words to find a small number of reliable translations, e.g. auto - car, boek - book, kind - child.
3. Add these newly found translations to the set of corresponding context words.
4. Repeat.
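The four steps can be sketched as a small bootstrapping loop. This is an illustration under assumptions, not the procedure used in the talk: "reliable" is approximated by taking the highest-scoring candidate pairs in each round, similarity is the cosine over the context words already in the lexicon, and the names `project`, `bootstrap`, `per_round` and the toy vectors are invented.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def project(vec, lexicon):
    """Rewrite a source-language context vector in target-language features."""
    out = {}
    for ctx, n in vec.items():
        if ctx in lexicon:
            out[lexicon[ctx]] = out.get(lexicon[ctx], 0) + n
    return out

def bootstrap(vecs_src, vecs_tgt, seeds, rounds=5, per_round=1):
    lexicon = dict(seeds)                          # step 1: shared words
    for _ in range(rounds):                        # step 4: repeat
        known_targets = set(lexicon.values())
        candidates = []
        for s, s_vec in vecs_src.items():
            if s in lexicon:
                continue
            s_proj = project(s_vec, lexicon)
            for t, t_vec in vecs_tgt.items():
                if t in known_targets:
                    continue
                t_known = {c: n for c, n in t_vec.items() if c in known_targets}
                candidates.append((cosine(s_proj, t_known), s, t))
        added = False
        for score, s, t in sorted(candidates, reverse=True)[:per_round]:  # step 2
            if score > 0 and s not in lexicon and t not in lexicon.values():
                lexicon[s] = t                     # step 3: new context-word pair
                added = True
        if not added:
            break
    return lexicon

# Invented Dutch and English vectors; "manager" and "school" are the shared words.
nl = {"boek": {"school": 5, "manager": 1, "lezen": 3},
      "auto": {"manager": 4, "school": 1, "rijden": 3}}
en = {"book": {"school": 5, "manager": 1, "read": 3},
      "car": {"manager": 4, "school": 1, "drive": 3}}
lexicon = bootstrap(nl, en, {"manager": "manager", "school": "school"})
print(lexicon["boek"], lexicon["auto"])  # book car
```

Keeping `per_round` small mirrors the idea of accepting only a few translations per iteration, so that errors made early are less likely to spread into the context features used later.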
33 Cross-lingual similarity
[Figure: Dutch and English words in one space of corresponding context words: kiezer/voter, verkiezing, election, wet, law, minister/minister.]
34 Cross-lingual similarity
Experiments
Experiments for Dutch-English, Dutch-German, German-English, English-Spanish
Additional stemming for English-Spanish
[Table: final results with columns n, noun, adj, verb, total and rows Dutch-German, German-Dutch, Dutch-English, English-Dutch, English-German, German-English, English-Spanish, Spanish-English.]
35 Cross-lingual similarity
Accuracy improves through the process.
[Figure: first translation accuracy per iteration for German-Dutch, Dutch-German, English-German, German-English, English-Dutch, Dutch-English, Spanish-English and English-Spanish.]
36 Cross-lingual similarity
Examples
stad: =city, town, area
roman: =novel, story, poem
aap: animal, elephant, =monkey
vrouw: =woman, man, child
bedreigen: =threaten, protect, attack
leren: =learn, =teach, know
veronderstel: imply, regard, =assume
wemel: ?translate, abound, crowd
37 Cross-lingual similarity
Applications
Identification of false friends
Bilingual knowledge induction
[Diagram: a German triple (schießen,obj,hirsch) and an English triple (shoot,obj,deer), each tied to a monolingual selectional preference model and linked through the bilingual vector space.]
38 Cross-lingual similarity
[Figure: the words hunter, murderer, rabbit, peacock, deer and duck.]
39 Cross-lingual similarity
[Diagram repeated from slide 37: German triple (schießen,obj,hirsch), bilingual vector space, English triple (shoot,obj,deer).]
41 Conclusions and Outlook
Extension of semantic similarity to two corpora
Application to lexical variation
Identification of lectal markers
Identification of cross-lectal synonyms
Application to bilingual knowledge induction
Basic translation of words between languages
Generalizing knowledge from one language to the other
42-43 Conclusions and Outlook
Polysemy
Addressing polysemy involves a movement towards individual contexts.
Looking for a clustering of contexts that corresponds to the sense structure of a word.
[Figure, built up over two slides: stelling (BE) with two groups of contexts, "gebouw, renoveren" and "argument, verdedigen", shown alongside stelling (NL) and steiger (NL).]
44 For more information:
