Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Slide 1]
Content
1. Empirical linguistics
2. Text corpora and corpus linguistics
3. Concordances
4. Application I: The German progressive
5. Part-of-speech tagging
6. Frequency analysis
7. Application II: Compounds
8. Co-occurrence analysis
   8.1 Cluster analysis
   8.2 Co-occurrence
   8.3 CCDB & IDS co-occurrence analysis
   8.4 Searching for collocations
9. Application III: Word senses in lexicography
10. Keyword analysis

[Slide 2]
Word group analysis
8.1 Cluster analysis

Cluster
A cluster is a chain of linguistic entities. In er sprach vor einem großen Publikum ('he spoke in front of a large audience'), spr is a consonant cluster consisting of 3 consonants and sprach vor einem is a word cluster consisting of 3 words.

n-gram
An n-gram is a sequence of n linguistic elements of the same type (Kunze & Lemnitzer 2007: 190). A 4-gram of words is a sequence of 4 words. An n-gram is the same as an n-cluster. The term n-gram is used in particular if all n-clusters are extracted from a corpus.

Kunze, Claudia and Lothar Lemnitzer (2007). Computerlexikographie. Eine Einführung. Tübingen: Narr [e-book].
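The definition of an n-gram as a sequence of n elements of the same type can be illustrated with a minimal sketch. The helper name ngrams and the whitespace tokenisation are assumptions made for this illustration; they are not part of the slides or of any corpus tool mentioned there.

```python
from collections import Counter

def ngrams(items, n):
    """Return all contiguous n-grams (as tuples) of a sequence."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Example sentence from the slide; splitting on whitespace is a simplification.
tokens = "er sprach vor einem großen Publikum".split()

# Word clusters of size 3 (word 3-grams), e.g. ('sprach', 'vor', 'einem').
word_trigrams = ngrams(tokens, 3)

# Character clusters of size 3 within one word, e.g. ('s', 'p', 'r').
char_trigrams = ngrams("sprach", 3)

# Extracting all n-grams from a corpus amounts to counting them.
trigram_counts = Counter(word_trigrams)
```

The same function covers both examples from the slide because it only requires that the input is a sequence: a list of words yields word clusters, a string yields character clusters.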
[Slide 3]
Search: clusters out of 2 words ending in off in part of the English corpus of the LCC

Search term position (here: on right)
Search term (here: off)
List of bi-grams with rank and frequency
Sort (here: according to frequency of the cluster)
Size of cluster (here: clusters out of two words)
Frequency condition (here: at least three tokens)

[Slide 4]
8.2 Co-occurrence

Co-occurrence
In a general sense, the term co-occurrence refers to the occurrence of two expressions close to each other. In a more specific sense, the term co-occurrence is used when the two expressions occur together more often than can be expected if all words were distributed by chance.

Co-occurrence analysis: the basic idea
1) Assumption: In a certain corpus, word X occurs 1,000 times, word Y 100 times, word Z 10 times.
2) Probability: The combination XY is ten times as likely as the combination XZ. XY should occur ten times as often as XZ.
3) Observation: Actually, XZ occurs about as often as XY.
4) Conclusion: There is a close linguistic connection between X and Z (close beyond expectation).

Kunze, Claudia and Lothar Lemnitzer. Computerlexikographie. Eine Einführung. Tübingen: Narr [e-book], p. 391f.
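The four-step argument for X, Y and Z can be worked through numerically. The sketch below assumes that words are distributed independently, so that the expected number of adjacent pairs is the product of the two word frequencies divided by the corpus size. The corpus size and the observed pair counts are hypothetical values chosen only for illustration.

```python
def expected_cooccurrence(freq_a, freq_b, corpus_size):
    """Expected number of co-occurrences if words were distributed by chance."""
    return freq_a * freq_b / corpus_size

CORPUS_SIZE = 1_000_000                  # hypothetical corpus size
freq = {"X": 1000, "Y": 100, "Z": 10}    # frequencies from the slide

expected_xy = expected_cooccurrence(freq["X"], freq["Y"], CORPUS_SIZE)
expected_xz = expected_cooccurrence(freq["X"], freq["Z"], CORPUS_SIZE)

# Step 2: XY is expected ten times as often as XZ, whatever the corpus size.
expected_ratio = expected_xy / expected_xz

# Step 3: hypothetical observation that both pairs occur equally often.
observed_xy = 5
observed_xz = 5

# Step 4: XZ exceeds its expectation ten times more strongly than XY does.
surplus_xy = observed_xy / expected_xy
surplus_xz = observed_xz / expected_xz
```

Comparing surplus_xz with surplus_xy makes the conclusion concrete: with equal observed counts, the rarer partner Z is the one whose connection to X goes beyond expectation.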
[Slide 5]
Search: co-occurrences for just in part of the English corpus of the LCC

List of co-occurrence partner words with rank, frequency, and significance measure
Search term (here: just)
Definition of search context (here: up to 2 words after the search term)
Sort (here: according to significance of co-occurrence)
Frequency condition (here: at least 10 tokens)

[Slide 6]
8.3 CCDB & IDS co-occurrence analysis

Co-occurrence analysis at the IDS
Access:
via COSMAS II WWW interface
via COSMAS II client
via CCDB (co-occurrence database)

WWW interface and client: Co-occurrences are computed online (takes some time); several options for fine-tuning the analysis are available.
CCDB: Results of co-occurrence analyses are stored (fast access); no fine-tuning of the analysis; automatic comparison of collocation profiles is available.

Source: Belica, Cyril: Kookkurrenzdatenbank CCDB. Eine korpuslinguistische Denk- und Experimentierplattform für die Erforschung und theoretische Begründung von systemisch-strukturellen Eigenschaften von Kohäsionsrelationen zwischen den Konstituenten des Sprachgebrauchs. Institut für Deutsche Sprache, Mannheim.
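The search settings shown for just (a context of up to 2 words after the search term and a minimum frequency) can be sketched as a simple counting routine. This is only an illustration: the function name, the toy sentence and the sorting by raw frequency are assumptions, whereas the tools on the slides sort by a significance measure and work on full corpora.

```python
from collections import Counter

def right_context_partners(tokens, node, window=2, min_freq=1):
    """Count words occurring up to `window` positions after `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[i + 1:i + 1 + window])
    # Keep partners that satisfy the frequency condition, most frequent first.
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]

# Toy data, invented for this example.
tokens = "it was just a joke and just a thought but just in time".split()

partners = right_context_partners(tokens, "just", window=2, min_freq=2)
```

Raising min_freq plays the role of the frequency condition on the slide; replacing the frequency sort with a significance score would bring the sketch closer to a real co-occurrence analysis.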
[Slide 7]
Application example II: Co-occurrences of bestehen
Question: co-occurrences for bestehen (in particular governed prepositions).
Co-occurrence analysis for bestehen as part of the CCDB (setting: do not ignore function words)

[Slide 8]
Application example II: Co-occurrences of bestehen
Primary co-occurrence partner of bestehen (here: aus)
Strength of the connection (here: 40683)
Secondary co-occurrence partners of bestehen + aus, here: aus Mitgliedern / Teilen / Ortsteilen bestehen, 'consist of members / parts / suburbs'
Typical syntagmatic patterns in which the words co-occur, e.g. besteht aus [ ] [zwei drei] Teilen, 'consists of [ ] [two three] parts'
[Slide 9]
8.3 CCDB & IDS co-occurrence analysis

Results (among others)
aus:
  besteht [ ] aus ('consists of [ ]')
  besteht [ ] aus [ ] Mitgliedern ('consists [ ] of [ ] members')
darin:
  besteht [ ] darin, dass ('is [ ] that')
  die Schwierigkeit [ ] besteht [ ] darin, dass ('the difficulty [ ] is [ ] that')
darauf:
  besteht [ ] darauf, dass ('insists [ ] that')
  er bestand [ ] darauf, dass ('he insisted [ ] that')
worin:
  worin [ ] besteht
  worin [ ] besteht der Unterschied zwischen ('what [ ] is the difference between')

Governed prepositions: auf, aus, in
The prepositions auf and in occur in particular with prepositional complement clauses (darauf, dass; darin, dass).
The preposition in often occurs in interrogative sentences (worin).

[Slide 10]
8.4 Searching for collocations

Exploration of collocations and fixed expressions
Article from a German-Mongolian dictionary (preliminary version): 20 Flaschen à 8 Euro, '20 bottles at 8 euros each'
Task: Find relevant collocations and fixed expressions containing à.
Procedure:
1) Retrieve concordances from a smaller corpus (AntConc with part of the German corpus from the Leipzig Corpus Collection).
2) Carry out a co-occurrence analysis (CCDB, Deutsches Referenzkorpus).
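Step 1 of the procedure, retrieving concordances for à, can be approximated by a small keyword-in-context routine. This sketch is not how AntConc works internally; the function name, the context width and the example sentence are assumptions made for illustration.

```python
def kwic(tokens, node, width=2):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented example text; a real search would run over a corpus file.
tokens = "20 Flaschen à 8 Euro und ein Menü à la carte".split()

concordance = kwic(tokens, "à", width=2)

# Align the hits on the search term so recurring neighbours stand out.
for left, node, right in concordance:
    print(f"{left:>15} {node} {right}")
```

Reading the right-hand context column of such a listing is what reveals candidates like à la before any statistical analysis is run.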
[Slide 11]
8.4 Searching for collocations

Concordances for à in a 1-million-RW selection of the German corpus within the LCC
Fixed expression à la, 'after the fashion of' (5 out of 10 hits)
Fixed expression peu à peu, 'bit by bit' (1 out of 10 hits)

[Slide 12]
Co-occurrence analysis on the basis of the Deutsches Referenzkorpus (based on 2 bn. RW); COSMAS II WWW interface
la as the most significant co-occurrence partner of à (log likelihood ratio: )
peu as the second most significant co-occurrence partner of à (log likelihood ratio: 15974)
Both collocations, à la and peu à peu, are missing from the dictionary.
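The slides report significance as a log likelihood ratio without giving the formula. A minimal sketch follows, assuming the widely used G² statistic over a 2x2 contingency table; the exact computation in COSMAS II may differ, and all counts below are hypothetical.

```python
import math

def log_likelihood_ratio(o11, o12, o21, o22):
    """G2 statistic for a 2x2 table of observed frequencies.

    o11: node with partner       o12: node without partner
    o21: partner without node    o22: neither node nor partner
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total

# Hypothetical counts: the pair occurs exactly as often as chance predicts.
independent = log_likelihood_ratio(10, 90, 90, 810)

# Hypothetical counts: the pair occurs far more often than chance predicts.
associated = log_likelihood_ratio(50, 50, 50, 850)
```

A value near zero indicates that the two words co-occur about as often as expected by chance; larger values indicate a stronger deviation, which is how a ranking of partners such as la and peu can be produced.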
[Slide 13]
VICOMTE Kookkurrenzexplorer
i) Primary and secondary co-occurrence partners are shown in a diagram

[Slide 14]
ii) Co-occurrence partners can be annotated
[Slide 15]
iii) Co-occurrence partners can be grouped

[Slide 16]
iv) Refinement of description

Perkuhn, Rainer: Systematic Exploration of Collocation Profiles. In: Proceedings of 4th Corpus Linguistics 2007, Birmingham. aper/132_paper.pdf.
A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103
More informationVerteilte Systeme 3. Dienstevermittlung
VS32 Slide 1 Verteilte Systeme 3. Dienstevermittlung 3.2 Prinzipien einer serviceorientierten Architektur (SOA) Sebastian Iwanowski FH Wedel VS32 Slide 2 Prinzipien einer SOA 1. Definitionen und Merkmale
More informationMaster-Programm Deutsch als Fremdsprache (Master of Arts Program in German as a Foreign Language) an der Ramkhamhaeng Universität/Bangkok
Master-Programm Deutsch als Fremdsprache (Master of Arts Program in German as a Foreign Language) an der Ramkhamhaeng Universität/Bangkok Curriculum 2008 Man kann zwischen zwei Schwerpunkten wählen: Interkulturelle
More informationMultilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013
Multilingual Term Extraction as a Service from Acrolinx Ben Gottesman Michael Klemme Acrolinx CHAT2013 Definitions term extraction: automatically identifying potential terms in a document (corpus) multilingual
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationEnhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects
Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com
More informationElena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna)
Elena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna) VII Conference on Legal Translation, Court Interpreting and Comparative Legilinguistics Poznań, 28-30.06.2013
More information