Using German corpora for linguistic purposes. Dr. Kathrin Steyer Institut für Deutsche Sprache, Mannheim
2 Introduction: This talk will give a first impression of the complex field of German corpora and methods of corpus analysis. Before you start working with corpora, be aware of what a method can and cannot accomplish.
3 Introduction: I often notice that overly complicated methods are used where simply collecting and counting instances would have been enough. Large collections of data and powerful automatic tools sometimes lead to an overvaluation of quantitative data.
4 Introduction: Sometimes the allure of numbers and frequencies leads to methodological laziness. Even today, the quality of the linguistic interpretation is the most important factor determining the informative value of an analysis. Corpus linguistics has not diminished the importance of the old cultural technique of reading and interpreting texts.
5 Introduction: Today, I will highlight some ways in which corpora and tools can help us linguists obtain a high-quality prestructuring of data. This is particularly useful for examining high-frequency phenomena that are important for language use, and for identifying phenomena that are not obvious to us, e.g. hidden structures and patterns.
6 Introduction: The focus is not on corpora or tools that require expert knowledge or have to be downloaded; those are primarily used for automatic natural language processing, e.g. Wortschatz Leipzig, the IMS Open Corpus Workbench (Stuttgart) or TIGER (Berlin). Instead, the focus is on corpora that are available online and free of charge for the "common linguist".
7 German Introductions to Corpus Linguistics: Lemnitzer, Lothar/Zinsmeister, Heike (2010): Korpuslinguistik. Eine Einführung. 2nd, revised and updated ed. (= Narr Studienbücher). Tübingen. Perkuhn, Rainer/Keibel, Holger/Kupietz, Marc (2012): Korpuslinguistik. (= UTB 3433). Paderborn.
8 German Corpus Linguistics Website Noah Bubenhofer ( ): Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge.
9 A Short History of German Corpora: The Institute for the German Language (IDS) has been a pioneer in the German-speaking area since the mid-1960s (!). Its work includes the compilation of electronic text databases (-> today: the German Reference Corpus DeReKo) and the development of COSMAS I, the first platform for corpus analysis in the German-speaking area (early 1990s to 2003).
10 A Short History of German Corpora: Core corpus of the Digital Dictionary of the 20th Century (Digitales Wörterbuch des 20. Jahrhunderts) at the Berlin-Brandenburgische Akademie der Wissenschaften, sponsored by the Deutsche Forschungsgemeinschaft (DFG). Since 2009 merged into the C4 corpus: DWDS; Schweizer Textkorpus (Switzerland); Austrian Academic Corpus; Korpus Südtirol (South Tyrol); 80 million word tokens.
11 Overview: 1. German specialized corpora: examples; 2. German general reference corpora: 2.1 DWDS, 2.2 DeReKo; 3. Methodological approaches: 3.1 consulting the corpus, 3.2 analysing the corpus (statistical collocation analysis); 4. Corpora and lexical resources
12 German Specialized Corpora: Spoken language: Database for Spoken German (DGD2), Archiv für Gesprochenes Deutsch (Archive for Spoken German, IDS); discourse analysis, dialectology; Dortmunder Chatkorpus
13 German Specialized Corpora: Annotation: e.g. morpho-syntactically annotated corpora; example: TIGER-Korpus (IMS Stuttgart). Language learning: learner corpora, error-annotated corpora; example: FALKO (HU Berlin). Literature: Project Gutenberg (free e-books, online).
14 Specialized Corpora at the IDS: Author corpora: Goethe corpus. Dialects: Zwirner corpus, including a corpus of vernaculars of the former Eastern territories. Genre: parliamentary debates, biographical fiction. Historical period: Wendekorpus (1989/90), about 3.3 million word tokens (articles, leaflets, flyers, parliamentary proceedings, speeches, declarations etc.). Medium: Wikipedia corpus.
15 German General Reference Corpora
16 German General Reference Corpora: Not compiled for a specific use or to answer specific research questions; as general as possible in order to be useful for various kinds of language study; examples: DWDS and DeReKo
17 DWDS corpus: 2.5 billion word tokens in total, of which 1.8 billion are publicly accessible (online and free) in several corpora. Core corpus: approx. 100 million word tokens, balanced with respect to time and genre (literature, journalistic prose, scientific texts, specialized texts such as adverts and manuals, spoken language); spans the 20th century; integrated into the DWDS portal (dictionaries etc.)
18 The German Reference Corpus (DeReKo) and COSMAS II Institut für Deutsche Sprache, Mannheim (IDS)
19 The German Reference Corpus DeReKo: 6.1 billion word tokens (status as of ); contains written German-language texts of the present and the recent past; the largest "primordial sample of contemporary German" worldwide; online and free, registration required (copyright); list of corpora
20 The German Reference Corpus DeReKo: Contains only copyrighted material; dynamic corpus (continually updated); option to create personal subcorpora with COSMAS II, tailored to specific research questions
21 German Reference Corpus (Deutsches Referenzkorpus) at the IDS, with over 5.4 billion words (status as of ): the world's largest linguistically motivated collection of electronic corpora of written German-language texts from the present and the recent past. It includes fiction, scientific and popular-science texts, a large number of newspaper texts and a wide range of other text types -> analysis system COSMAS II
22 COSMAS II: Corpus Search, Management and Analysis System; not a web search engine; language-independent; free online access since 1993; registered users (ca. ) from over 100 countries
23 Search window; result views: KWIC (keyword in context), full text
24 Result presentation: by sources/corpora; chronological; alphabetical (by the successor or predecessor of the search object); randomized sorting; by text genres; by topics; collocations; export of results
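To illustrate the KWIC view and alphabetical sorting by the successor or predecessor of the search object, the following Python sketch builds concordance lines from a pre-tokenized toy text. It is not how COSMAS II works internally. The helpers `kwic`, `sort_by_successor` and `sort_by_predecessor` are hypothetical, and matching uses exact word forms only, with no lemmatization.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for every exact occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sort_by_successor(lines):
    """Alphabetical sort on the right context (the words following the node)."""
    return sorted(lines, key=lambda line: line[2].lower())


def sort_by_predecessor(lines):
    """Alphabetical sort on the left context, read from the node outwards."""
    return sorted(lines, key=lambda line: line[0].lower().split()[::-1])


if __name__ == "__main__":
    text = "mit kühlem Kopf handeln . sie schüttelte den Kopf . er zog den Kopf ein"
    for left, node, right in sort_by_successor(kwic(text.split(), "Kopf", width=2)):
        print(f"{left:>20}  [{node}]  {right}")
```

Plain Python string sorting compares code points, so umlauts sort after "z". A locale-aware collation would be needed to get proper German alphabetical order.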
25 Analytical approaches
26 Paradigms of corpus analysis: Looking for answers to my questions in the corpus -> validation of a priori knowledge ('consulting'); finding new research questions in the corpus and interpreting them -> best case: generating new knowledge ('analysing')
27 Consulting the corpus: Do specific language elements (e.g. morphemes, lexemes, multi-word units) occur at all, and if they do, how often? Which usage-based aspects of meaning can be identified? In which situations are they used? What is the typical base form in the corpus? Which variants can be found?
28 Consulting the corpus: Discourse: Globalisierung bedeutet ('globalization means') (Teubert 2006); text type (birthday texts, advertisements): geistige Frische ('mental freshness'); regional differences: Samstag vs. Sonnabend ('Saturday') in Germany, Austria, Switzerland
29 Consulting the corpus (corpus-based): Regional particularities: Grumbeere ('potato')? auf freiem Fuß anzeigen? Spelling: Blind date or Blind Date or ...? Discourse: Besserwessi ('know-it-all West German')?
31 Consulting the corpus: Samstag is used in all German-speaking areas; Sonnabend is used almost exclusively in Germany. Chronological aspects (e.g. new lexemes, multi-word units): voll krass
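Regional subcorpora differ in size, so claims such as "Sonnabend is used almost exclusively in Germany" rest on normalized figures rather than raw hit counts. The Python sketch below assumes you have already exported raw hits and subcorpus sizes per region. The function names are hypothetical, and the numbers are invented placeholders, not DeReKo results.

```python
def per_million(counts, sizes):
    """Convert raw hits {region: {variant: hits}} into hits per million tokens,
    using the token count of each regional subcorpus in `sizes`."""
    return {
        region: {variant: hits * 1_000_000 / sizes[region] for variant, hits in variants.items()}
        for region, variants in counts.items()
    }


def variant_share(variants, variant):
    """Share of one variant among all competing variants in a region (0.0 to 1.0)."""
    total = sum(variants.values())
    return variants[variant] / total if total else 0.0


# Invented placeholder counts, for illustration only.
counts = {"D": {"Samstag": 300, "Sonnabend": 100}, "CH": {"Samstag": 50, "Sonnabend": 0}}
sizes = {"D": 4_000_000, "CH": 1_000_000}
print(per_million(counts, sizes))
print(variant_share(counts["D"], "Sonnabend"))
```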
32 Search strategies, example: exclusionary searches (excluding hits that are not relevant; verifying stability and variance). Search: Übung macht ART WITHOUT Meister ('practice makes perfect', lit. 'practice makes the master'). Query: ((&Übung /+w1 &machen) /+w1 (den ODER die ODER das)) %s0 &Meister. Search: macht den Meister WITHOUT Übung. Query: (&machen /+w2 &Meister) %s0 &Übung
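An exclusionary search keeps the sentences that match pattern A and drops those that also contain B. That logic can be approximated on plain sentence strings with regular expressions. This Python sketch is only an analogy to the COSMAS II queries, not their implementation. The regexes stand in for lemma search (`&`) by listing a few inflected forms by hand, which is an assumption. The test sentences are modelled on the attested patterns on the following slides.

```python
import re

# Hand-listed forms standing in for lemma search (&machen, &Meister); an assumption, not COSMAS II syntax.
MACHEN = r"(?:mache|machst|macht|machen|machte|machten)"
MEISTER = r"Meister(?:s|n)?"

# "Übung macht ART ...": Übung, then a form of machen, then den/die/das.
UEBUNG_MACHT_ART = re.compile(rf"\bÜbung\s+{MACHEN}\s+(?:den|die|das)\b")
# "... macht (optional word) Meister": Meister at most two words after the verb.
MACHT_DEN_MEISTER = re.compile(rf"\b{MACHEN}\s+(?:\w+\s+)?{MEISTER}\b")
# Exclusion terms; the \b anchors keep "Kegelmeister" or "Meisterin" from counting as "Meister".
HAS_MEISTER = re.compile(rf"\b{MEISTER}\b")
HAS_UEBUNG = re.compile(r"\bÜbung\b")


def exclusionary_search(sentences, include, exclude):
    """Sentences matching `include` that do not contain `exclude` anywhere in the same sentence."""
    return [s for s in sentences if include.search(s) and not exclude.search(s)]


SENTENCES = [
    "Übung macht den Meister.",
    "Übung macht den Kegelmeister.",
    "Übung macht die Meisterin.",
    "Übung macht den Schützen.",
    "Technik macht den Meister.",
    "Training macht den Meister.",
]

print(exclusionary_search(SENTENCES, UEBUNG_MACHT_ART, HAS_MEISTER))
print(exclusionary_search(SENTENCES, MACHT_DEN_MEISTER, HAS_UEBUNG))
```

Because each input string is one sentence, "not anywhere in the string" plays the role of "not in the same sentence".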
33 Patterns: Übung macht den X. M11 Übung macht den Kegelmeister; M99 Übung macht den Handball-Meister; M99 Übung macht auch hier den Zaubermeister; RHZ11, A97, A00, A09, F99: Übung macht die Meisterin; Übung macht Radioprediger; Übung macht den Schützen; Übung macht den Feuerwehrmann; Übung macht den Gourmet
34 Patterns: X macht den Meister. B06 Technik macht den Meister Tipps für Anfänger; B07 Energie macht den Meister; B07 Vorsicht macht den Meister; BVZ07 Die Praxis macht den Meister zu Schulbeginn; E99 Doch erst Playoff macht den Meister; M00 Ob Profi oder Schnuppersportler - Training macht den Meister
35 Other phenomena: word formation. Productivity in word formation: *mentalität (all formations ending in -mentalität 'mentality')
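A wildcard search such as *mentalität returns every word form ending in -mentalität. One common productivity measure is Baayen's hapax-based productivity P = n1 / N: the number of formations occurring exactly once, divided by the number of tokens of the pattern. This measure is not mentioned in the talk and is swapped in here for illustration. The Python sketch below runs on an already tokenized toy text. It assumes case-insensitive matching and treats each word form as a type, so it does no lemmatization.

```python
from collections import Counter


def suffix_productivity(tokens, suffix):
    """Types, tokens and hapax-based productivity P = n1 / N for words ending in `suffix`.
    The bare suffix word itself (e.g. "Mentalität") is not counted as a formation."""
    suffix = suffix.lower()
    formations = Counter(
        t.lower() for t in tokens if t.lower().endswith(suffix) and len(t) > len(suffix)
    )
    n_tokens = sum(formations.values())
    n_hapax = sum(1 for count in formations.values() if count == 1)
    productivity = n_hapax / n_tokens if n_tokens else 0.0
    return len(formations), n_tokens, productivity


# Hypothetical toy tokens, for illustration only.
tokens = ["die", "Anspruchsmentalität", "und", "die", "Wegwerfmentalität", "und",
          "wieder", "die", "Anspruchsmentalität", "sowie", "Mentalität"]
print(suffix_productivity(tokens, "mentalität"))
```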
36 Other phenomena: grammar. Search in a morpho-syntactically annotated corpus, which is relatively small compared with the whole corpus archive. Example: adjectives + Kopf (in a subcorpus). Example: all dative nouns followed by a dative relative pronoun within a span of at most three tokens. Query: MORPH(NOU dat) /+w3 MORPH(PRN rel dat)
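The distance query can be pictured as follows: for every token carrying the left-hand features, look up to three tokens ahead for one carrying the right-hand features. The Python sketch below assumes a hypothetical representation of annotated text as tokens with feature sets named after the query (NOU, PRN, rel, dat). This is not the COSMAS II annotation format. Counting punctuation as a token is a further simplifying assumption.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    features: frozenset  # hypothetical morpho-syntactic features, e.g. {"NOU", "dat"}


def make_token(word, *features):
    return Token(word, frozenset(features))


def distance_query(tokens, left, right, max_dist):
    """Index pairs (i, j) where tokens[i] carries all `left` features, tokens[j] carries
    all `right` features and j follows i by 1 to max_dist positions."""
    hits = []
    for i, token in enumerate(tokens):
        if left <= token.features:
            for j in range(i + 1, min(i + max_dist + 1, len(tokens))):
                if right <= tokens[j].features:
                    hits.append((i, j))
    return hits


# "Er hilft dem Mann, dem alle vertrauen." with invented toy annotation
sentence = [
    make_token("Er", "PRN", "nom"), make_token("hilft", "VRB"),
    make_token("dem", "ART", "dat"), make_token("Mann", "NOU", "dat"),
    make_token(",", "PUN"), make_token("dem", "PRN", "rel", "dat"),
    make_token("alle", "PRN", "nom"), make_token("vertrauen", "VRB"),
    make_token(".", "PUN"),
]

# Rough analogue of MORPH(NOU dat) /+w3 MORPH(PRN rel dat)
print(distance_query(sentence, {"NOU", "dat"}, {"PRN", "rel", "dat"}, max_dist=3))
```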
37 Grammatical phenomena: A plea for searching non-annotated corpora, even for grammatical research questions. Completely abstract constructions are not searchable; a lexical anchor is necessary. BUT: the larger corpus size can lead to surprising results. Example: all when without comma
38 Drowning in a flood of mass data? BUT: the bigger the data set, the more overwhelming it is for humans. Example: Kopf
39 Collocation analysis at the IDS: Cyril Belica: Statistische Kollokationsanalyse und Clustering. Korpuslinguistische Analysemethode ('Statistical collocation analysis and clustering: a corpus-linguistic method of analysis'). Institut für Deutsche Sprache, Mannheim. Tutorial 2004: a short introduction to collocation analysis. Cf. Perkuhn/Keibel/Kupietz (2012).
40 Part 2: Practical Exercises
41 Collocation analysis at the IDS: Focuses on lexical co-occurrences; dynamically computed on the latest version of the corpus; flexible adjustment of parameters (e.g. span and position, granularity, function words yes/no); computes not only word-collocate pairs, but also hierarchical clusters and common syntagmatic patterns
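To make statistical collocation analysis concrete, here is a minimal Python sketch. It scores the co-occurrences of a node word within a fixed span using the log-likelihood ratio G², a widely used association measure. This does not claim that the IDS collocation analysis uses exactly this computation, and it does not attempt the clustering or syntagmatic patterns. The window counting, the positive-association filter and the toy text are all simplifying assumptions. Function words and punctuation are not filtered out.

```python
import math
from collections import Counter


def g2(a, b, c, d):
    """Log-likelihood ratio G2 for a 2x2 table:
    a = collocate in window, b = other tokens in window,
    c = collocate outside window, d = other tokens outside window."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d))
    total = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:
            expected = row_sum * col_sum / n
            total += observed * math.log(observed / expected)
    return 2 * total


def collocations(tokens, node, span=5):
    """Rank words co-occurring with `node` within +/- span tokens by G2,
    keeping only collocates that occur more often than expected (positive association)."""
    n = len(tokens)
    freq = Counter(tokens)
    window = set()  # token positions inside any window around the node (node positions excluded)
    for i, token in enumerate(tokens):
        if token == node:
            window.update(j for j in range(max(0, i - span), min(n, i + span + 1)) if j != i)
    in_window = Counter(tokens[j] for j in window)
    w = len(window)
    scores = {}
    for word, a in in_window.items():
        if word == node:
            continue
        if a * n > w * freq[word]:  # observed a exceeds expected w * f(word) / n
            c = freq[word] - a
            scores[word] = g2(a, w - a, c, n - w - c)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = ("sie schüttelte den Kopf . er schüttelte den Kopf . "
          "der Hund bellt laut . die Katze schläft . er lacht .").split()
for word, score in collocations(tokens, "Kopf", span=2):
    print(f"{word:12} {score:6.2f}")
```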
42 Collocation clusters: collocation analysis (CA) for Kopf
43 Interpreting Clusters: Collocation clusters are only indicators of the contexts on which they are based; the syntagmatic perspective is most important: KWIC cluster, full-text cluster
44 Collocation analysis at the IDS: Collocations; phrasemes; fixed syntagmatic structures; fixed context patterns (access to meaning and common usage)
45 You shall know a word by the company it keeps (Firth 1957)
46 Usage clusters, semantic: 'injury by external force': Kugel / gegen die Wand stoßen/geschlagen / Platzwunde am Kopf / verletzt / geschossen / an die Bande prallen / abgeschlagenen / Brustverletzung / Beule; 'body part': Hals / Nacken / Bauch / Oberkörper / Arme; 'symptoms of illness': Gliederschmerzen / heiß
47 Usage clusters, phrasemes: 'emotional state': mit hängenden Köpfen ('dejected') / mit kühlem Kopf ('level-headed') / mit hochrotem Kopf ('angry', 'embarrassed') / mit gesenktem Kopf ('abashed')
48 Collocations and collocation patterns: Mutual lexical fixedness: Hals über Kopf ('rushed') (*X über Kopf; *Hals über X). Semantically restricted usage: mit hochrotem Kopf; CA of hochrot -> hochrot only with body parts (prototypically Kopf). Productive collocation patterns: strategischer Kopf / führende / kreative / beste Köpfe ('leader', 'mastermind')
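The mutual lexical fixedness of Hals über Kopf (*X über Kopf; *Hals über X) can be checked by opening one slot at a time and counting which words fill it. A minimal Python sketch over a tokenized toy text follows. The `fixedness` score, the share of slot occurrences filled by the expected word, is a simple illustrative choice and not a measure from the talk. The helper names are hypothetical.

```python
from collections import Counter


def slot_fillers(tokens, pattern):
    """Count the words filling the single open slot (None) of a word pattern,
    e.g. (None, "über", "Kopf") for "X über Kopf"."""
    slot = pattern.index(None)
    fillers = Counter()
    for i in range(len(tokens) - len(pattern) + 1):
        window = tokens[i:i + len(pattern)]
        if all(p is None or p == w for p, w in zip(pattern, window)):
            fillers[window[slot]] += 1
    return fillers


def fixedness(fillers, expected):
    """Share of slot occurrences filled by `expected` (1.0 = the slot is completely fixed)."""
    total = sum(fillers.values())
    return fillers[expected] / total if total else 0.0


# Hypothetical toy text.
tokens = "er rannte Hals über Kopf davon . sie zog Hals über Kopf um . den Kopf über Wasser halten".split()
print(slot_fillers(tokens, (None, "über", "Kopf")))  # which words fill X in "X über Kopf"?
print(slot_fillers(tokens, ("über", None)))          # which words follow "über" at all?
```

A real check would run the same slot queries against a large corpus. If the fixedness of both slots stays near 1.0, that supports mutual fixedness.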
49 Context patterns, pragmatic: Orality/colloquial speech in the corpus: voll krass; usage of word classes, formulae, particles, sentence adverbs etc. (example: ernsthaft); discourse: Globalisierung
50 German collocation resources: Pro: fast access. Contra: no dynamic customization possible. DWDS word profiles; collocations in Wortschatz Leipzig; IDS collocation database CCDB (Kookkurrenzdatenbank): pre-analysed profiles of lemmas + KWIC; semantic proximity by comparing CA profiles (e.g. anscheinend vs. scheinbar)
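The slide does not specify how the CCDB compares profiles. As a much simpler stand-in, two collocation profiles can be compared with cosine similarity, where each profile maps a collocate to its association score. This is an assumption for illustration and not the CCDB method. Values near 1 mean the two words keep very similar company; values near 0 mean they share few collocates. Profiles are assumed to have been exported from a collocation analysis. The inputs below are abstract placeholders, not real profiles of anscheinend or scheinbar.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity of two sparse collocation profiles {collocate: score}."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[k] * profile_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Abstract placeholder profiles (collocate -> association score).
profile_x = {"c1": 3.0, "c2": 4.0}
profile_y = {"c1": 3.0, "c3": 4.0}
print(cosine_similarity(profile_x, profile_y))
```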
51 Collocation analysis, i.e. clustering typical contexts of usage, is an analytical approach that is central to all kinds of linguistic research questions if you are interested in "language in use" (this can also be "syntax in use").
52 IDS linguistic applications: Corpus-based grammar (grammis); lexicon-grammar interface: valency, argument structure and construction grammar (DeReKo, IMS Workbench, other); spoken language: "Variation des gesprochenen Deutsch: Standardsprache / Alltagssprache" ('variation in spoken German: standard language / everyday language')
53 IDS linguistic applications of CA: Corpus-based and corpus-driven lexicology and lexicography: OWID (e.g. elexiko; dictionary of modern German proverbs); Multilingual Proverb Online Platform; fields of lexical patterns and phraseme constructions -> qualitative linguistic interpretation of collocation and syntagmatic profiles
54 Outlook
55 Integrative Platforms: authentic corpus data; qualitative descriptions; lexical resources (e.g. collocation profiles and networks); web (DWDS; OWID)
56 KorAP: the next-generation corpus analysis platform of the Institute for the German Language; replaces COSMAS II (but its features will be reproduced); extends the possibilities of individual corpus design (e.g. by topic, by text type); several levels of linguistic annotation; basic and extended search functionality; faster
57 Thank you for your attention!
Multilingual and mixed-lingual TTS applications LangTech 2003 November 24, 2003 Simona Fina, Manager Linguistics Real-life texts need mixed-lingual analysis Agenda Short presentation of SVOX Challenges
More informationServices supply chain management and organisational performance
Services supply chain management and organisational performance Irène Kilubi Services supply chain management and organisational performance An exploratory mixed-method investigation of service and manufacturing
More informationANNLOR: A Naïve Notation-system for Lexical Outputs Ranking
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France annlor@limsi.fr Cyril Grouin LIMSI-CNRS rue John von Neumann 91400
More informationComprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
More informationBreatling the Meaning of Tag Sets in CMC corpora
An extended tag set for annotating parts of speech in CMC corpora Thomas Bartz 1, Michael Beißwenger 1, Eric Ehrhardt 2, Angelika Storrer 2 1) 2) International Research Days: Social Media and CMC Corpora
More informationPROMETHEUS - THE DISTRIBUTED DIGITAL IMAGE ARCHIVE FOR RESEARCH AND EDUCATION GOES INTERNATIONAL!
PROMETHEUS - THE DISTRIBUTED DIGITAL IMAGE ARCHIVE FOR RESEARCH AND EDUCATION GOES INTERNATIONAL! p r o m e t h e u s c/o Kunsthistorisches Institut University of Cologne Albertus-Magnus-Platz 50923 Cologne
More informationThe Rise of Documentary Linguistics and a New Kind of Corpus
The Rise of Documentary Linguistics and a New Kind of Corpus Gary F. Simons SIL International 5th National Natural Language Research Symposium De La Salle University, Manila, 25 Nov 2008 Milestones in
More informationDEFINING EFFECTIVENESS FOR BUSINESS AND COMPUTER ENGLISH ELECTRONIC RESOURCES
Teaching English with Technology, vol. 3, no. 1, pp. 3-12, http://www.iatefl.org.pl/call/callnl.htm 3 DEFINING EFFECTIVENESS FOR BUSINESS AND COMPUTER ENGLISH ELECTRONIC RESOURCES by Alejandro Curado University
More informationDeclarative Parsing and Annotation of Electronic Dictionaries
Declarative Parsing and Annotation of Electronic Dictionaries Christian Schneiker 1, Dietmar Seipel 1, Werner Wegstein 2, and Klaus Prätor 3 1 Department of Computer Science {schneiker seipel}@informatik.uni-wuerzburg.de
More informationCURRICULUM VITAE SILKE BRANDT
CURRICULUM VITAE SILKE BRANDT CONTACT Silke Brandt, PhD English Department Nadelberg 6 CH-4051 Basel Switzerland silke.brandt@unibas.ch POSITIONS 2011-present Postdoctoral researcher English Department
More informationAccessing the Deep Web: A Survey
VL Text Analytics Accessing the Deep Web: A Survey Marc Bux, Tobias Mühl Accessing the Deep Web: A Survey, 2007 by Bin He, Mitesh Patel, Zhen Zhang, Kevin Chen Chuan Chang Computer Science Department University
More informationA History of the «Concise Oxford Dictionary»
Lodz Studies in Language 34 A History of the «Concise Oxford Dictionary» Bearbeitet von Malgorzata Kaminska 1. Auflage 2014. Buch. 342 S. Hardcover ISBN 978 3 631 65268 8 Format (B x L): 14,8 x 21 cm Gewicht:
More informationDiaCollo: On the trail of diachronic collocations
DiaCollo: On the trail of diachronic collocations Bryan Jurish jurish@bbaw.de AG Elektronisches Publizieren Historische Semantik und Semantic Web Heidelberger Akademie der Wissenschaften 14 th 16 th September,
More informationCLARIN project DiscAn :
CLARIN project DiscAn : Towards a Discourse Annotation system for Dutch language corpora Ted Sanders Kirsten Vis Utrecht Institute of Linguistics Utrecht University Daan Broeder TLA Max-Planck Institute
More informationMaster of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University
Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University 1. Academic Program Master of Arts Program in Linguistics for Communication
More informationvernetziko: A Cross-Reference Management Tool for the Lexicographer s Workbench
vernetziko: A Cross-Reference Management Tool for the Lexicographer s Workbench Peter Meyer Institut für Deutsche Sprache Mannheim E-mail: meyer@ids-mannheim.de Abstract vernetziko is an assistive software
More informationSecurity Vendor Benchmark 2016 A Comparison of Security Vendors and Service Providers
A Comparison of Security Vendors and Service Providers Information Security and Data Protection An Die Overview digitale Welt of the wird German Realität. and Mit Swiss jedem Security Tag Competitive ein
More informationMotivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
More informationBACKUP EAGLE. Release Notes. Version: 6.1.1.16 Date: 11/25/2011
BACKUP EAGLE Release Notes Version: 6.1.1.16 Date: 11/25/2011 Schmitz RZ Consult GmbH BACKUP EAGLE Release Notes Seite 1 von 7 Date 11/29/2011 Contents 1. New Features... 3 1.1. Configurable automatically
More informationDepartment of English. University of Innsbruck
Department of English University of Innsbruck Welcome! Welcome to the Department of English at the University of Innsbruck! Founded in 1898, our department is one of the oldest English departments in Austria.
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationLevel 2 German, 2014
91123 911230 2SUPERVISOR S Level 2 German, 2014 91123 Demonstrate understanding of a variety of spoken German texts on familiar matters 9.30 am Wednesday 12 November 2014 Credits: Five Achievement Achievement
More informationBig Data Vendor Benchmark 2015 A Comparison of Hardware Vendors, Software Vendors and Service Providers
A Comparison of Hardware Vendors, Software Vendors and Service Providers The digital world is becoming a reality. Mit jedem Tag ein bisschen mehr. ECommerce, Online- Werbung, mobile Applikationen und soziale
More informationSHORT, August 2015. THE KLEINE ZEITUNG INTRODUCES ITSELF. From the two-shilling daily to a multimedia brand
SHORT, August 2015 THE KLEINE ZEITUNG INTRODUCES ITSELF. From the two-shilling daily to a multimedia brand KLEINE ZEITUNG A CONSTANT PRESENCE IN THE WORLD OF MEDIA SINCE 111 YEARS INDEPENDENCE The Kleine
More informationTesting Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
More informationReference Books. (1) English-English Dictionaries. Fiona Ross FindYourFeet.de
Reference Books This handout originated many years ago in response to requests from students, most of them at Konstanz University. Students from many different departments asked me for advice on dictionaries,
More informationFEATURES FOR AN INTERNET ACCESSIBLE CORPUS OF SPOKEN TURKISH DISCOURSE
FEATURES FOR AN INTERNET ACCESSIBLE CORPUS OF SPOKEN TURKISH DISCOURSE Şükriye RUHİ sukruh@metu.edu.tr Derya ÇOKAL KARADAŞ cokal@metu.edu.tr Middle East Technical University THE METU SPOKEN TURKISH DISCOURSE
More informationWorking Paper Series des Rates für Sozial- und Wirtschaftsdaten, No. 163
econstor www.econstor.eu Der Open-Access-Publikationsserver der ZBW Leibniz-Informationszentrum Wirtschaft The Open Access Publication Server of the ZBW Leibniz Information Centre for Economics Wilkinson,
More information