Using German corpora for linguistic purposes. Dr. Kathrin Steyer Institut für Deutsche Sprache, Mannheim
2 Introduction This talk gives a first impression of the complex field of German corpora and methods of corpus analysis. Before starting your work with corpora, be aware of what a method can and cannot accomplish.
3 Introduction I often notice that overly complicated methods are used where simply collecting and counting instances would have been enough. Large collections of data and powerful automatic tools sometimes lead to an overvaluation of quantitative data.
4 Introduction Sometimes, the allure of numbers and frequencies leads to methodological laziness. Even today, the quality of linguistic interpretation is the most important factor regarding the informative value of the analysis. Corpus linguistics has not diminished the importance of the old cultural technique of reading and interpreting texts.
5 Introduction Today, I will highlight some ways in which corpora and tools can help us linguists to obtain a high-quality prestructuring of data. This is particularly useful for: examining high-frequency phenomena that are important for language use; identifying phenomena that are not obvious to us, e.g. hidden structures and patterns.
6 Introduction The focus is not on corpora or tools that need expert knowledge or have to be downloaded; those are primarily used for automatic natural language processing, e.g. Wortschatz Leipzig, the IMS Open Corpus Workbench (Stuttgart) or TIGER (Berlin). Instead: corpora that are available online and free of charge for the "common linguist".
7 German Introductions to Corpus Linguistics Lemnitzer, Lothar/Zinsmeister, Heike (2010): Korpuslinguistik. Eine Einführung. 2., durchgesehene und aktualisierte Aufl. (= Narr Studienbücher). Tübingen; Perkuhn, Rainer/Keibel, Holger/Kupietz, Marc (2012): Korpuslinguistik. (= UTB 3433). Paderborn.
8 German Corpus Linguistics Website Noah Bubenhofer ( ): Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge.
9 A Short History of German Corpora Institute for German Language (Institut für Deutsche Sprache, IDS): a pioneer in the German-speaking area since the mid-1960s (!). Compilation of electronic text databases (-> today: German reference corpus DeReKo). Development of COSMAS I, the first platform for corpus analysis in the German-speaking area (early 1990s to 2003).
10 A Short History of German Corpora Core corpus of the Digital Dictionary of the 20th century (Digitales Wörterbuch des 20. Jahrhunderts) at the Berlin-Brandenburgische Akademie der Wissenschaften; sponsored by the Deutsche Forschungsgemeinschaft (DFG). Since 2009 merged into the C4 corpus: DWDS; Schweizer Textkorpus (Switzerland); Austrian Academic Corpus; Korpus Südtirol (South Tyrol); 80 million word tokens.
11 Overview 1. German specialized corpora: examples; 2. German general reference corpora (2.1 DWDS, 2.2 DeReKo); 3. Methodological approaches (3.1 Consulting the corpus, 3.2 Analysing the corpus: statistical collocation analysis); 4. Corpora and lexical resources
12 German Specialized Corpora Spoken language: database (DGD2), Archive Gesprochenes Deutsch (Spoken German) (IDS); discourse analysis, dialectology; Dortmunder Chatkorpus
13 German Specialized Corpora Annotation: e.g. morpho-syntactically annotated corpora; example: TIGER-Korpus (IMS Stuttgart). Language learning: learner corpora, error-annotated corpora; example: FALKO (HU Berlin). Literature: Project Gutenberg; about free ebooks (online)
14 Specialized corpora at the IDS Author corpora: Goethe corpus. Dialects: Zwirner corpus, including a corpus of vernaculars of the former Eastern territories. Genre: parliamentary debates, biographical fiction. Historical period: Wendekorpus (1989/90), about 3.3 million word tokens (articles, leaflets, flyers, parliamentary proceedings, speeches, declarations etc.). Medium: Wikipedia corpus
15 German General Reference Corpora
16 German General Reference Corpora Not compiled for a specific use or for answering specific research questions; as general as possible in order to be useful for various language studies; DWDS and DeReKo
17 DWDS corpus: 2.5 billion word tokens in total; 1.8 billion word tokens publicly accessible (online and free) (several corpora). Core corpus: approx. 100 million word tokens; balanced with respect to time and genre (literature, journalistic prose, scientific texts, specialized texts (adverts, manuals etc.), spoken); spans the 20th century; integrated with the DWDS Portal (dictionaries etc.)
18 The German Reference Corpus (DeReKo) and COSMAS II Institut für Deutsche Sprache, Mannheim (IDS)
19 The German Reference Corpus DeReKo 6.1 billion word tokens (status as of ). Contains written German language texts of the present and recent past. The largest "primordial sample of contemporary German" worldwide. Online and free, registration required (copyright). List of corpora
20 The German Reference Corpus DeReKo Contains only copyrighted material. Dynamic corpus (continually updated). Option to create personal subcorpora with COSMAS II, which can be tailored towards specific research questions.
21 German Reference Corpus at the IDS With over 5.4 billion words (as of ), the world's largest linguistically motivated collection of electronic corpora of written German-language texts from the present and the recent past. Fiction, scientific and popular science texts, a large number of newspaper texts and a broad range of other text types. -> Analysis system COSMAS II
22 COSMAS II Corpus Search, Management and Analysis System. Not a web search engine. Language independent. Free online access since 1993. Ca registered users from over 100 countries
23 Search window KWIC Full text
24 Result presentation Sorting: by sources / corpora, chronological, alphabetical (successor/predecessor of the search object), randomized. Views: text genres, topics, collocations. Export of results
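The KWIC view and the alphabetical sorting by successor can be illustrated with a minimal sketch. It is not how COSMAS II works internally; the function names `kwic` and `sort_by_successor` and the sample sentences are assumptions made for illustration, and the naive tokenizer ignores sentence boundaries.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)  # naive tokenizer: runs of word characters
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

def sort_by_successor(rows):
    """Sort KWIC rows alphabetically by the context that follows the hit."""
    return sorted(rows, key=lambda row: row[2].lower())

# Hypothetical mini-corpus, for illustration only.
sample = ("Übung macht den Meister. Training macht den Meister. "
          "Die Praxis macht den Meister zu Schulbeginn.")

for left, hit, right in sort_by_successor(kwic(sample, "Meister")):
    print(f"{left:>25} [{hit}] {right}")
```

Sorting by predecessor would work the same way with the left context as the sort key.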
25 Analytical approaches
26 Paradigms of corpus analysis Looking for answers to my questions in the corpus -> validation of a priori knowledge ('consulting'). Finding new research questions in the corpus and interpreting those -> best case: generating new knowledge ('analysing').
27 Consulting the corpus Do specific language elements (e.g. morphemes, lexemes, multi-word units) occur at all and if they do, how often? Which usage based aspects of meaning can be identified? In which situations are they used? What is the typical base form in the corpus? Which variations can be found?
28 Consulting the corpus Discourse: Globalisierung bedeutet (Globalization means) (Teubert 2006). Text type (birthday texts; advertisements): geistige Frische. Regional differences: Samstag vs. Sonnabend in Germany, Austria, Switzerland
29 Consulting the corpus (corpus-based) Regional peculiarity? Grumbeere; auf freiem Fuß anzeigen. Spelling? Blind date or Blind Date or? Discourse? Besserwessi
30
31 Consulting the corpus Samstag is used in all German-speaking areas; Sonnabend is used almost exclusively in Germany. Chronological (e.g. new lexemes, multi-word units): voll krass
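A regional comparison such as Samstag vs. Sonnabend only makes sense if the raw hit counts are normalised by the size of each subcorpus. The sketch below shows the arithmetic; the figures in `toy_counts` are invented for illustration and are not taken from DeReKo or DWDS, and the function names are hypothetical.

```python
def per_million(hits, corpus_tokens):
    """Normalise a raw hit count to hits per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 1_000_000 / corpus_tokens

# Invented toy figures, NOT real corpus counts.
toy_counts = {
    "Germany":     {"tokens": 4_000_000, "Samstag": 400, "Sonnabend": 200},
    "Austria":     {"tokens": 1_000_000, "Samstag": 150, "Sonnabend": 1},
    "Switzerland": {"tokens":   500_000, "Samstag":  80, "Sonnabend": 0},
}

def regional_profile(counts, word):
    """Relative frequency (per million tokens) of word in each regional subcorpus."""
    return {region: per_million(c[word], c["tokens"]) for region, c in counts.items()}

for word in ("Samstag", "Sonnabend"):
    print(word, regional_profile(toy_counts, word))
```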
32 Search strategies - Example Exclusionary searches: excluding hits that are not relevant; verifying stability and variance. S: Übung macht ART WITHOUT Meister Query: ((&Übung /+w1 &machen) /+w1 (den ODER die ODER das)) %s0 &Meister S: macht den Meister WITHOUT Übung Query: (&machen /+w2 &Meister) %s0 &Übung
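The logic of such an exclusionary search (match a pattern, but drop every sentence that also contains the excluded word) can be approximated outside COSMAS II. The following is only a rough sketch under stated assumptions: it uses plain regular expressions on surface forms, has no lemmatisation (unlike the `&` operator in the queries above), splits sentences naively, and the names `sentences` and `search_without` are hypothetical.

```python
import re

def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def search_without(text, pattern, excluded):
    """Sentences that match pattern but do not contain the excluded word."""
    wanted = re.compile(pattern, re.IGNORECASE)
    unwanted = re.compile(rf"\b{re.escape(excluded)}\b", re.IGNORECASE)
    return [s for s in sentences(text) if wanted.search(s) and not unwanted.search(s)]

# Hypothetical mini-corpus, for illustration only.
sample = ("Übung macht den Meister. Technik macht den Meister. "
          "Übung macht den Gourmet.")

# "macht den Meister WITHOUT Übung"
print(search_without(sample, r"\bmacht den Meister\b", "Übung"))
# "Übung macht ART WITHOUT Meister"
print(search_without(sample, r"\bÜbung macht (den|die|das)\b", "Meister"))
```

The first search surfaces variation in the first slot, the second variation in the last slot, which is what the two pattern slides that follow show.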
33 Patterns: Übung macht den X M11: Übung macht den Kegelmeister; M99: Übung macht den Handball-Meister; M99: Übung macht auch hier den Zaubermeister; RHZ11: Übung macht die Meisterin; A97: Übung macht Radioprediger; A00: Übung macht den Schützen; A09: Übung macht den Feuerwehrmann; F99: Übung macht den Gourmet
34 Patterns: X macht den Meister B06: Technik macht den Meister Tipps für Anfänger; B07: Energie macht den Meister; B07: Vorsicht macht den Meister; BVZ07: Die Praxis macht den Meister zu Schulbeginn; E99: Doch erst Playoff macht den Meister; M00: Ob Profi oder Schnuppersportler - Training macht den Meister
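Once such hits have been exported, the fillers of the open slot X can be collected and counted automatically. This is a minimal sketch, assuming the hits are available as plain strings; the helper `slot_fillers` and the choice of regular expressions are illustrative assumptions, and the example lines are a subset of the hits quoted above.

```python
import re
from collections import Counter

def slot_fillers(lines, pattern):
    """Count the strings captured by the named group X in each matching line."""
    rx = re.compile(pattern)
    return Counter(m.group("X") for line in lines for m in rx.finditer(line))

# A few of the corpus hits quoted on the slides.
hits = [
    "Übung macht den Kegelmeister",
    "Übung macht den Handball-Meister",
    "Übung macht den Schützen",
    "Technik macht den Meister",
    "Training macht den Meister",
    "Doch erst Playoff macht den Meister.",
]

# Pattern "Übung macht den X": the slot follows the article.
last_slot = slot_fillers(hits, r"Übung macht (?:den|die|das) (?P<X>[\w-]+)")
# Pattern "X macht den Meister": the slot precedes the verb.
first_slot = slot_fillers(hits, r"(?P<X>[\w-]+) macht den Meister\b")

print(sorted(last_slot))
print(sorted(first_slot))
```

The counts only prestructure the data; deciding which fillers are playful variations and which indicate a productive pattern remains a matter of interpretation.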
35 Other phenomena: Word formation Productivity in word formation *mentalität
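A wildcard search such as *mentalität returns a list of word forms; counting tokens, types and one-off formations gives a first, rough indication of productivity. The sketch below assumes a plain token list; the function name `suffix_productivity` and the toy tokens are illustrative assumptions, not corpus results.

```python
from collections import Counter

def suffix_productivity(tokens, suffix):
    """Token count, type count and hapaxes for words ending in suffix.

    The bare suffix itself (e.g. 'Mentalität') is not counted as a formation.
    """
    suffix = suffix.lower()
    hits = Counter(
        t for t in tokens
        if t.lower().endswith(suffix) and len(t) > len(suffix)
    )
    hapaxes = sorted(word for word, n in hits.items() if n == 1)
    return {"tokens": sum(hits.values()), "types": len(hits), "hapaxes": hapaxes}

# Toy token list, for illustration only.
toy_tokens = [
    "Vollkaskomentalität", "Mentalität", "Selbstbedienungsmentalität",
    "Vollkaskomentalität", "Siegermentalität", "Kopf",
]

print(suffix_productivity(toy_tokens, "mentalität"))
```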
36 Other phenomena: Grammar Search in a morpho-syntactically annotated corpus (relatively small in comparison with the whole corpus archive). Adjektive - Kopf (in a subcorpus). All dative nouns followed by a dative relative pronoun within a span of three tokens maximum. Query: MORPH(NOU dat) /+w3 MORPH(PRN rel dat)
37 Grammar Phenomena Plea for search in non-annotated corpora, even for grammatical research questions. Completely abstract constructions are not searchable; a lexical anchor is necessary. BUT: larger corpus size can lead to surprising results. Example: all when without comma
38 Drowning in a flood of mass data? BUT: the bigger the data set, the more overwhelming it is for humans. Example: Kopf
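What the annotated query asks for can be spelled out on a small, hand-tagged token list. This is a sketch of the search logic only, not of COSMAS II: the `Token` structure, the tag labels `NOU` and `PRN_REL`, and the example clause are assumptions, and unlike a word-distance operator the sketch counts punctuation as a token.

```python
from typing import NamedTuple

class Token(NamedTuple):
    form: str
    pos: str    # hypothetical tag labels, e.g. "NOU", "PRN_REL"
    case: str   # "dat", "nom", or "" if not applicable

def dative_noun_relpron(tokens, max_dist=3):
    """Pairs (noun, relative pronoun): dative noun followed by a dative
    relative pronoun at most max_dist tokens later."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.pos == "NOU" and tok.case == "dat":
            for nxt in tokens[i + 1:i + 1 + max_dist]:
                if nxt.pos == "PRN_REL" and nxt.case == "dat":
                    pairs.append((tok.form, nxt.form))
    return pairs

# Hand-tagged example: "mit dem Mann, dem ich helfe"
clause = [
    Token("mit", "PRP", ""), Token("dem", "ART", "dat"),
    Token("Mann", "NOU", "dat"), Token(",", "PUNCT", ""),
    Token("dem", "PRN_REL", "dat"), Token("ich", "PRN", "nom"),
    Token("helfe", "VRB", ""),
]

print(dative_noun_relpron(clause))
```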
39 Collocation analysis at the IDS Cyril Belica: Statistische Kollokationsanalyse und Clustering. Korpuslinguistische Analysemethode. Institut für Deutsche Sprache, Mannheim. Tutorial 2004: Short introduction to collocation analysis. Cf. Perkuhn/Keibel/Kupietz (2012)
40 Part 2 Practical exercises
41 Collocation analysis at the IDS Focuses on lexical cooccurrences. Dynamically computed on the latest version of the corpus. Flexible adjustment of parameters (e.g. span and position, granularity, function words y/n). Computes not only word-collocate pairs, but also hierarchical clusters and common syntagmatic patterns.
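The basic idea behind any statistical collocation analysis is to compare how often a word occurs near the node word with how often this would be expected by chance. The sketch below is not the IDS method described in the tutorial cited above; it swaps in plain pointwise mutual information over a symmetric window, which is a deliberately simple stand-in, and it computes neither clusters nor syntagmatic patterns. The function name and the toy tokens are assumptions.

```python
import math
from collections import Counter

def collocate_scores(tokens, node, span=2):
    """Pointwise mutual information of each collocate within +/- span tokens of node.

    observed = co-occurrences in the window
    expected = f(node) * f(collocate) * window size / corpus size
    """
    freq = Counter(tokens)
    n = len(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - span), min(n, i + span + 1)):
                if j != i:
                    cooc[tokens[j]] += 1
    scores = {}
    for collocate, observed in cooc.items():
        expected = freq[node] * freq[collocate] * 2 * span / n
        scores[collocate] = math.log2(observed / expected)
    return scores

# Toy token sequence, for illustration only.
toy = "mit kühlem kopf und mit hängenden köpfen aber mit kühlem kopf".split()

for word, score in sorted(collocate_scores(toy, "kopf", span=1).items()):
    print(f"{word:10} {score:.2f}")
```

In this toy example the one-off neighbour und scores as high as the recurrent kühlem, which illustrates why such scores on small counts need careful interpretation and why real analyses use more robust measures and large corpora.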
42 Collocation cluster CA for Kopf
43 Interpreting Clusters Collocation clusters are only indicators for the contexts on which they are based. The syntagmatic perspective is most important: KWIC cluster, full text cluster.
44 Collocation analysis at the IDS Collocations; phrasemes; fixed syntagmatic structures; fixed context patterns (access to meaning and common usage)
45 You shall know a word by the company it keeps (Firth 1957)
46 Usage clusters: semantic 'injury by external force': Kugel / gegen die Wand stoßen/geschlagen / Platzwunde am Kopf / verletzt / geschossen / an die Bande prallen / abgeschlagenen / Brustverletzung / Beule; 'body part': Hals / Nacken / Bauch / Oberkörper / Arme; 'symptoms of illness': Gliederschmerzen, heiß
47 Usage clusters: phrasemes 'emotional state': mit hängenden Köpfen ('dejected') / mit kühlem Kopf ('level-headed') / mit hochrotem Kopf ('angry', 'embarrassed') / mit gesenktem Kopf ('abashed')
48 Collocations: collocation patterns Mutual lexical fixedness: Hals über Kopf ('rushed') (*X über Kopf; *Hals über X). Semantically restricted usage: mit hochrotem Kopf; CA hochrot -> hochrot only with body parts (prototypical: Kopf). Productive collocation patterns: strategischer Kopf / führende / kreative / beste Köpfe ('leader', 'mastermind')
49 Context patterns Pragmatic. Orality / colloquial speech in the corpus: voll krass. Usage of word classes, formulae, particles, sentence adverbs etc. Example: ernsthaft. Discourse: Globalisierung
50 German collocation resources Pro: fast access. Contra: no dynamic customization possible. DWDS word profiles; collocations in Wortschatz Leipzig; IDS Collocation Database CCDB (Kookkurrenzdatenbank): pre-analysed profiles of lemmas + KWIC; semantic proximity by comparing CA profiles (e.g. anscheinend vs. scheinbar)
51 Collocation analysis Clustering typical contexts of usage is an analytical approach that is central for all kinds of linguistic research questions if you are interested in "language in use" (this can also be "syntax in use").
52 IDS linguistic applications Corpus-based grammar (grammis). Lexicon-grammar interface: valency, argument structure and construction grammar (DeReKo, IMS Workbench, other). Spoken language: "Variation des gesprochenen Deutsch: Standardsprache Alltagssprache"
53 IDS linguistic applications of CA Corpus-based and corpus-driven lexicology and lexicography: OWID (e.g. elexiko; dictionary of modern German proverbs); multilingual Proverb-Online-Platform; fields of lexical patterns and phraseme constructions -> qualitative linguistic interpretation of collocation and syntagmatic profiles
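Comparing the collocation profiles of two words, as in anscheinend vs. scheinbar, can be sketched as a similarity between two weighted collocate lists. The source does not state which measure the CCDB uses; cosine similarity is an assumption chosen here for its simplicity, and the two profiles below are invented toy values, not CCDB data.

```python
import math

def cosine(profile_a, profile_b):
    """Cosine similarity of two collocation profiles (collocate -> weight)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented toy profiles, NOT real collocation data.
anscheinend = {"niemand": 3.0, "wissen": 2.0, "offenbar": 1.0}
scheinbar = {"niemand": 1.0, "mühelos": 4.0, "endlos": 2.0}

print(round(cosine(anscheinend, scheinbar), 3))
```

A value near 1 means the two words share most of their typical contexts; a value near 0 means their profiles barely overlap.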
54 Outlook
55 Integrative Platforms Authentic corpus data Qualitative Descriptions Lexical resources (e.g. collocation profiles and networks) Web (DWDS; OWID)
56 KorAP KorAP: the next-generation corpus analysis platform of the Institute for German Language. Replaces COSMAS II (but features will be reproduced). Extends the possibilities of individual corpus design (e.g. by topic, by text type). Several levels of linguistic annotation. Basic and extended search functionality; faster
57 Thank you for your attention!
