Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]




Contents

1. Empirical linguistics
2. Text corpora and corpus linguistics
3. Concordances
   3.1 Corpus analysis software I: AntConc
   3.2 KWICs and concordances
   3.3 Corpus analysis software II: COSMAS II
4. Application I: The German progressive
5. Part-of-speech tagging
6. Frequency analysis
7. Application II: Compounds
8. Co-occurrence analysis
9. Application III: Word senses in lexicography
10. Keyword analysis

3.1 Corpus analysis software I: AntConc

AntConc
Developer: Laurence Anthony, Faculty of Science and Engineering, Waseda University, Japan.
Version: 3.2.1w (Windows), released March 10, 2007.
Search: offline.
Software: installed on a local computer.
Access: free download.
Corpora: the user's own (txt files).
Languages: all (Unicode), e.g., German, English, Romanian, Mongolian.
URL: http://www.antlab.sci.waseda.ac.jp/antconc_index.html

Functions of AntConc: concordances, cluster analysis, co-occurrence analysis, frequencies / word list, key word analysis.

Assessment of AntConc:
- can be recommended for smaller corpora (up to 20 million running words)
- strengths: sorted concordances, word lists, cluster analyses, key word analyses
- less useful for co-occurrence analyses (too slow; larger corpora are needed)

3.2 KWICs and concordances

Concordance: A concordance is a collection of cotexts of a particular key word. Cotexts of a specified length (in letters, words, or sentences) around a key word are extracted from a corpus and ordered with the key word in the center. (Lemnitzer, Lothar and Heike Zinsmeister. Korpuslinguistik. Eine Einführung. Tübingen: Narr, 2006, pp. 196f.)

KWIC: A KWIC ("key word in context") is a single cotext of a particular key word.

Search: concordances for helps in part of the English corpus of the Leipzig Corpus Collection (newspapers).
Search term (here: helps)
Sort (here: alphabetically by the word to the right of the search term)
Cotext (here: 200 characters)
Hits (here: 56)
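The definitions above can be illustrated with a minimal Python sketch. It is not AntConc's implementation; the function names, the character-based cotext width, and the sample sentence are assumptions made for illustration. It collects one KWIC per occurrence of the key word and sorts the resulting concordance by the word to the right of the key word, as in the search for helps.

```python
import re

def first_word(text):
    """Return the first word in `text`, lowercased, or "" if there is none."""
    match = re.search(r"\w+", text)
    return match.group(0).lower() if match else ""

def concordance(text, keyword, width=30):
    """Collect (left cotext, key word, right cotext) for each occurrence."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    # Sort alphabetically by the word to the right of the key word.
    return sorted(hits, key=lambda hit: first_word(hit[2]))

def format_kwic(hits, width=30):
    """Render each hit as one line with the key word at a fixed column."""
    return ["{} [{}] {}".format(left.rjust(width), key, right)
            for left, key, right in hits]

# Invented sample text, not taken from the Leipzig Corpus Collection.
sample = "Music helps people relax. Exercise helps a lot. Nothing helps."
hits = concordance(sample, "helps", width=12)
for line in format_kwic(hits, width=12):
    print(line)
```

Each tuple in hits is a single KWIC; the sorted list as a whole is the concordance.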

Export of results as a txt file.

Search: concordances for depăşeşte in a small collection of Romanian texts (Unicode).
Reset the language settings to Unicode (utf8) in Global Settings / Language Encoding.
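Why the encoding setting matters can be shown outside AntConc with a short Python sketch. The file names, the helper functions, and the two-line Romanian sample are invented for illustration. The same file is searched once decoded as UTF-8 and once decoded as Latin-1, and the hits are then exported to a txt file.

```python
import os
import tempfile

def find_lines(path, word, encoding="utf-8"):
    """Return the stripped lines of the file at `path` that contain `word`."""
    with open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if word in line]

def export_hits(hits, path):
    """Write one hit per line to a UTF-8 encoded txt file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(hits) + "\n")

# Invented sample text, stored as UTF-8 in a temporary directory.
workdir = tempfile.mkdtemp()
corpus_path = os.path.join(workdir, "sample_ro.txt")
with open(corpus_path, "w", encoding="utf-8") as handle:
    handle.write("Costul depăşeşte bugetul.\nAcesta este un alt rând.\n")

# Decoded as UTF-8, the word is found; decoded as Latin-1, the bytes of
# ă and ş turn into other characters and the search finds nothing.
hits_utf8 = find_lines(corpus_path, "depăşeşte", encoding="utf-8")
hits_latin1 = find_lines(corpus_path, "depăşeşte", encoding="latin-1")

export_path = os.path.join(workdir, "hits.txt")
export_hits(hits_utf8, export_path)
```

With the wrong encoding the search fails silently rather than raising an error, which is why the setting is worth checking before searching non-ASCII word forms.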

3.3 Corpus analysis software II: COSMAS II

COSMAS II is the corpus analysis system of the Institut für Deutsche Sprache. It comes in two versions:
- COSMAS II Client for Windows
- COSMAS II WWW interface
The WWW interface has fewer functions than the client. Both access the same corpora, and in both versions the search is carried out online.

COSMAS II (Windows Client)
Developer: Institut für Deutsche Sprache.
Version: 3.61 (Windows).
Search: online.
Software: local installation.
Access: free download of the analysis software; registration required.
Corpora: DeReKo (corpora of the IDS).
Languages: German (3.4 bn. running words).
URL: http://www.ids-mannheim.de/cosmas2/install/

After program start: load corpora.

Search option I: line-based

Step 1: Formulate the search request
Search expression, here: &behaupten /+w2 (dass oder daß)
[Search for records of the lexeme behaupten (&behaupten) at a distance of up to 2 words (/+w2) from the word form dass or the word form daß (dass oder daß; oder means "or").]
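What the search expression asks for can be approximated in a few lines of Python. This is only a sketch and not how COSMAS II works internally: the lexeme is expanded by a hand-written, incomplete list of word forms (COSMAS II uses its own lemmatization), /+w2 is read as "followed within at most 2 words", and the test sentences are invented.

```python
import re

# Hand-picked, incomplete set of word forms standing in for &behaupten.
BEHAUPTEN_FORMS = {"behaupten", "behaupte", "behauptest", "behauptet",
                   "behauptete", "behaupteten"}
DASS_FORMS = {"dass", "daß"}  # stands in for (dass oder daß)

def tokenize(text):
    """Split text into lowercased word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def followed_within(tokens, first, second, max_dist):
    """Return index pairs (i, j) where a token from `first` at position i
    is followed by a token from `second` at position j, 1 <= j - i <= max_dist."""
    hits = []
    for i, token in enumerate(tokens):
        if token not in first:
            continue
        last = min(i + max_dist, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if tokens[j] in second:
                hits.append((i, j))
                break
    return hits

# Invented test sentences.
near = tokenize("Sie behauptete gestern, daß alles stimmt.")
far = tokenize("Er behauptet das Gegenteil davon, dass es regnet.")
near_hits = followed_within(near, BEHAUPTEN_FORMS, DASS_FORMS, 2)
far_hits = followed_within(far, BEHAUPTEN_FORMS, DASS_FORMS, 2)
```

In the first sentence daß is the second word after behauptete and counts as a hit; in the second sentence dass is four words away and is not matched.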

Step 2: Determine search and lemmatization options
Search options: handling of upper case, frequency information, sort options, limit on the number of hits.
Lemmatization options: the Grundformenoperator (base form operator) supports the search for inflected forms, compounds, etc.

Step 3: Choose word forms from the expansion list
Selection of word forms.

Step 4: Confirm the intermediate statistics of the search request
Number of hits for the search expression (here: 15904).
Move to the display of records.

Step 5: Request KWICs (menu: Ansicht [View])
Display (here: Korpusansicht [corpus view]).
Change display (here: request KWICs).

Step 6: Request full text
Full text option.

Result

Search option II: template-based

Step 1: Formulate the search request
Search expression, here: &behaupten /+w2 (dass oder daß)
[Templates can be moved from the left column into the center.]
Further steps: as with the line-based request.

COSMAS II (WWW interface)
Developer: Institut für Deutsche Sprache.
Version: 1.21.
Search: online.
Software: online.
Access: free; registration required.
Corpora: Deutsches Referenzkorpus (IDS corpora).
Languages: German (3.4 bn. running words).
URL: https://cosmas2.ids-mannheim.de/cosmas2-web/

After program start: load corpora.

Only search option: line-based

Step 1: Formulate the search request
Search expression, here: &behaupten /+w2 (dass oder daß)

Step 2 (optional): Determine search and lemmatization options (as with the client)
Options.

Step 3 (optional): Choose word forms from the expansion list

Step 4: Display results
Results.
Open expansion lists.

Step 5: Choose the type of KWIC display
Number of hits for the search expression (here: 15904).
Options for the display of results (by month, by year, by decade, ...).
KWIC display.

Step 6: Request full text
Full text option.

Result

Syntax of the search language: some examples

Function | Example | Search target: records with ...
Lemma search | &spielen | any word form of the lexeme spielen
Word form search | spielte | the word form spielte
Word chain search | &spielen /+w1 &Domino | word chains consisting of any word form of spielen followed by any word form of Domino
Word chain search | spiele /+w1 &Domino | word chains consisting of the word form spiele followed by any word form of Domino
Word part search | *spiel | a word form ending in spiel
Distance search | &spielen /+w3 &Domino | word chains consisting of any word form of spielen followed, at a distance of up to 3 words, by any word form of Domino
And search | Domino /s0 Schach | both the word form Domino and the word form Schach
Search with tags | (example missing in the transcript) | word chains consisting of any word form of haben followed by an infinitive and the word form können
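The word part search *spiel can be mimicked with Python's standard fnmatch module, in which * likewise matches any sequence of characters. This is a sketch for illustration only, not the COSMAS II matcher: the function name, the ignore_case switch, and the token list are assumptions, and whether a real search distinguishes upper and lower case depends on the search options chosen.

```python
from fnmatch import fnmatchcase

def word_part_search(tokens, pattern, ignore_case=False):
    """Return the tokens matching a wildcard pattern such as "*spiel".

    Note: fnmatch also gives special meaning to "?" and "[...]"; only
    "*" is relied on in this sketch.
    """
    if ignore_case:
        return [t for t in tokens if fnmatchcase(t.lower(), pattern.lower())]
    return [t for t in tokens if fnmatchcase(t, pattern)]

# Invented token list.
tokens = ["Schachspiel", "spielen", "Spiel", "Dominospiel", "spielte"]
case_sensitive = word_part_search(tokens, "*spiel")
case_insensitive = word_part_search(tokens, "*spiel", ignore_case=True)
```

Because * may also match the empty string, Spiel itself is found once case is ignored, while spielen and spielte are excluded because they do not end in spiel.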

Example of a search in COSMAS II

Looking for: dass-clauses as sentential subjects of the verb helfen ("to help").
Assumption: Sentential subjects with helfen often occur in constructions like <[ ] es [ ] hilft, dass/daß>.
Search: (es /+w3 &helfen) /+w1 (dass oder daß)

Examples
T04 Der SPD hat es nicht geholfen, dass der Sympathieträger und
B99 Uns könne es nur helfen, dass wir so früh den Weg zu
B02 Vielleicht hat es Metzelder geholfen, dass die Kollegen seinen
E96 Da wird es auch nicht helfen, dass der Publikumsrat
E99 Mir hat es viel geholfen, dass ich Kabuki-Theater
N98 "Uns könnte es helfen, daß gleichzeitig Landtagswahl ist",
P93 Saddam Hussein könnte es helfen, daß Zulieferstaaten... eine volle
P98 "Wenn es Saddam hilft, daß Unscom von Diplomaten
R99 Was kann es nun helfen, daß inzwischen 13 der 15
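The nested search expression can be traced against some of the records above with a small Python sketch. It rests on assumptions and is not the COSMAS II engine: &helfen is replaced by a hand-written, incomplete list of word forms, and the outer distance /+w1 is measured from the helfen form to dass or daß.

```python
import re

# Hand-picked, incomplete set of word forms standing in for &helfen.
HELFEN_FORMS = {"helfen", "helfe", "hilfst", "hilft", "helft",
                "half", "halfen", "geholfen"}
DASS_FORMS = {"dass", "daß"}  # stands in for (dass oder daß)

def tokenize(text):
    """Split text into lowercased word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def matches(text, max_es_gap=3, max_dass_gap=1):
    """True if es is followed within max_es_gap words by a form of helfen,
    which in turn is followed within max_dass_gap words by dass or daß."""
    tokens = tokenize(text)
    last_index = len(tokens) - 1
    for i, token in enumerate(tokens):
        if token != "es":
            continue
        for j in range(i + 1, min(i + max_es_gap, last_index) + 1):
            if tokens[j] not in HELFEN_FORMS:
                continue
            for k in range(j + 1, min(j + max_dass_gap, last_index) + 1):
                if tokens[k] in DASS_FORMS:
                    return True
    return False

# Three of the records listed above, plus one invented counterexample.
records = [
    "Der SPD hat es nicht geholfen, dass der Sympathieträger und",
    "Vielleicht hat es Metzelder geholfen, dass die Kollegen seinen",
    "\"Wenn es Saddam hilft, daß Unscom von Diplomaten",
]
results = [matches(record) for record in records]
counterexample = matches("Es hilft mir sehr, dass du da bist.")
```

The invented counterexample is rejected because dass stands three words after hilft, which exceeds the distance of 1 required by the outer part of the expression.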