Extracting translation relations for human-readable dictionaries from bilingual text
Overview
1. Company
2. Translate pro 12.1 and AutoLearn<word>
3. Translation workflow
4. Extraction method
5. Extended AutoLearn with selection restrictions
6. Improving accuracy, coverage and availability
7. On-the-fly extraction
06.11.2013 TeKom 2013 - Lingenio 2013
Company
- Founded in 1999 as a spin-off of the IBM research center in Germany
- Located in Heidelberg
- Develops and markets language technology software and services
- Core competence: machine translation, electronic dictionaries, text analysis (morphology, syntax, semantics)
- Several research projects
Translate pro 12.1
- Single-user versions for professional translators and private use
- Including dictionaries with context-sensitive search functions
Translate pro 12.1
Corporate solutions
- Client/server networks for workgroups
- Lingenio Translation Server: web-based solutions for company-wide intranets
Translate pro 12.1
Integration via plug-ins into
- publishing tools: WordPress, ...
- CAT tools: Trados, OmegaT, ...
Translate pro 12.1
- Translation Center
- MS Office plug-ins
- Browser plug-ins (IE, Firefox)
- PDF translation
Translation Center
- User dictionaries: editing, settings
- Translation memories: selection, settings
- Automatic extraction of dictionary entries
- Postediting: alternative translations
- Assistant: unknown words, statistics, settings, ...
AutoLearn<word> extracts suggestions for dictionary entries from
- postedited MT output
- translation memories
AutoLearn<word> creates suggestions from postedited text
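To see what a posteditor changed, one simple option is a token-level diff between the raw MT output and the postedited sentence; the replaced spans mark places where a dictionary suggestion may be useful. The sketch below is only an illustration built on Python's standard `difflib`, not a description of how AutoLearn<word> works internally. The function name `postedit_changes`, the whitespace tokenization and the example sentences are assumptions.

```python
from difflib import SequenceMatcher


def postedit_changes(mt_output: str, postedited: str) -> list[tuple[str, str]]:
    """Return (MT span, postedited span) pairs for every token span the posteditor replaced.

    Hypothetical helper: tokenizes on whitespace only.
    """
    mt_tokens = mt_output.split()
    pe_tokens = postedited.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only 'replace' opcodes mark MT wording that was overwritten;
        # pure insertions and deletions are ignored in this sketch.
        if tag == "replace":
            changes.append((" ".join(mt_tokens[i1:i2]), " ".join(pe_tokens[j1:j2])))
    return changes


# Invented example: raw MT output versus a postedited version.
print(postedit_changes(
    "the analysis yielded a number of deposits",
    "the analysis unravelled a number of depositional environments",
))
# [('yielded', 'unravelled'), ('deposits', 'depositional environments')]
```

The diff only locates target-side corrections; relating them to the source words that triggered them requires an additional alignment step.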
AutoLearn<word> creates suggestions from translation memory sentence pairs
AutoLearn<word> suggestions extracted from translation memory sentence pairs
AutoLearn<word> suggestions
- can relate to all parts of speech (nouns, verbs, adjectives, ...)
- include multiword expressions
- can be selected for integration into the active user dictionary
AutoLearn<word> suggestions can be added to the dictionary one relation at a time or all at once
Dictionary entries assigned to suggestions
- make use of the morpho-syntactic and semantic information and the defaults of the MT system
- can be edited
AutoLearn entries adapt the translation to the extracted references
AutoLearn<word> extracts suggestions for dictionary entries from
- postedited MT output
- translation memories: single sentence pairs or complete TMs
- bilingual text, via the Lingenio sentence aligner workflow
AutoLearn<word>: bilingual texts, e.g. European insurance regulation
AutoLearn<word>: align the texts and import them into a translation memory
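The slides do not describe how the Lingenio sentence aligner works. As a stand-in, here is a minimal length-based aligner in the spirit of the Gale-Church method: it assumes that translated sentences have roughly proportional character lengths and finds the cheapest sequence of 1-1, 1-0, 0-1, 2-1 and 1-2 "beads" by dynamic programming. The cost function and the penalty values in `BEADS` are simplified assumptions, not the published Gale-Church probabilities, and all names are hypothetical.

```python
import math

# Allowed beads: (source sentences, target sentences, fixed penalty).
# Penalty values are hypothetical and would need tuning on real data.
BEADS = [(1, 1, 0.0), (1, 0, 1.0), (0, 1, 1.0), (2, 1, 0.5), (1, 2, 0.5)]


def length_mismatch(src_len: int, tgt_len: int) -> float:
    """Relative length difference in [0, 1]; 0 means equal character lengths."""
    total = src_len + tgt_len
    return abs(src_len - tgt_len) / total if total else 0.0


def align_sentences(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Align two sentence lists by dynamic programming over character lengths."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: dict[tuple[int, int], tuple[int, int]] = {}
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if math.isinf(cost[i][j]):
                continue
            for di, dj, penalty in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_mismatch(src_len, tgt_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[(ni, nj)] = (i, j)
    # Walk back from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[(i, j)]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


print(align_sentences(
    ["Short one.", "Another short one."],
    ["Short one and another short one."],
))
# [([0, 1], [0])]
```

Each bead lists source and target sentence indices, so a 2-1 bead (two source sentences rendered as one target sentence) can be imported as a single translation-memory unit.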
AutoLearn<word> extract translation suggestions from single sentence pairs
AutoLearn<word> or from complete translation memories
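Extracting from a complete translation memory amounts to running the per-pair extraction over every unit and pooling the results. The sketch below shows only that pooling step, assuming a per-pair extractor is available as a function; counting how often each candidate relation recurs gives a simple ranking for the suggestion list. `collect_suggestions`, `min_count` and `toy_extract` are hypothetical, and `toy_extract` merely pairs tokens by position so the example runs.

```python
from collections import Counter
from typing import Callable, Iterable

Pair = tuple[str, str]  # (source expression, target expression)


def collect_suggestions(
    tm: Iterable[Pair],
    extract: Callable[[str, str], list[Pair]],
    min_count: int = 1,
) -> list[tuple[Pair, int]]:
    """Run a per-sentence-pair extractor over a whole TM and rank the results.

    Suggestions are returned most frequent first; ties keep first-seen order.
    """
    counts: Counter = Counter()
    for source, target in tm:
        counts.update(extract(source, target))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


def toy_extract(source: str, target: str) -> list[Pair]:
    """Purely illustrative stand-in for a real extractor: pairs tokens by position."""
    return list(zip(source.split(), target.split()))


TM = [("Haus", "house"), ("Haus", "house"), ("Baum", "tree")]
print(collect_suggestions(TM, toy_extract))
# [(('Haus', 'house'), 2), (('Baum', 'tree'), 1)]
```

Raising `min_count` trades coverage for reliability: relations seen only once are dropped from the suggestion list.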
AutoLearn<word>: extraction method
1. Translation relations from the system dictionaries
2. Structures assigned to the source and target sentences by the analysis components of the MT system
Example
Die Lithofazien-Analyse des oberen Teils der Pliozän-Schicht im Valdelsa-Becken (Mittelitalien) hat eine gewisse Anzahl von Umweltablagerungen ergeben, von der Schwemm- zur Küsten- und zur Meeresebene.
Lithofacies analysis of the upper part of the Pliocene succession of the Valdelsa basin (central Italy) unravelled a number of depositional environments, ranging from alluvial plain to coastal, to marine
Example (simplified)
Die Lithofazien-Analyse der Pliozän-Schicht hat eine gewisse Anzahl von Umweltablagerungen ergeben.
Lithofacies analysis of the Pliocene succession unravelled a number of depositional environments.
Dependency grammar structures
+ transfer knowledge
(+ statistics)
Derive new relations: AutoLearn<word>
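How new relations are derived is only outlined on the slides. One plausible reading, sketched here under explicit assumptions, treats known dictionary relations as anchors between the two dependency trees and proposes a new pair for two unanchored nodes when enough of their neighbours (dependents or head) are anchored to each other under the same relation label. The `Node` layout, the shared label set (`subj`, `obj`, `mod`), the simplified analyses of the example sentence and the `min_support` threshold are all hypothetical; the support count merely stands in for the "(+ statistics)" component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One token of a dependency analysis (hypothetical, simplified layout)."""
    lemma: str
    head: Optional[int]  # index of the governing node; None for the root
    rel: str             # dependency label, assumed to be shared by both languages


def derive_relations(src: list[Node], tgt: list[Node], known: set[tuple[str, str]],
                     min_support: int = 2) -> list[tuple[str, str]]:
    """Propose (source lemma, target lemma) pairs for unanchored nodes."""
    # A node is anchored if the dictionary already pairs it with a node on the other side.
    anchored_src = {i for i, s in enumerate(src) if any((s.lemma, t.lemma) in known for t in tgt)}
    anchored_tgt = {j for j, t in enumerate(tgt) if any((s.lemma, t.lemma) in known for s in src)}
    proposals = []
    for i, s in enumerate(src):
        if i in anchored_src:
            continue
        for j, t in enumerate(tgt):
            if j in anchored_tgt:
                continue
            support = 0
            # Dependents of both nodes that are known translations under the same label.
            for sd in src:
                for td in tgt:
                    if sd.head == i and td.head == j and sd.rel == td.rel and (sd.lemma, td.lemma) in known:
                        support += 1
            # Heads that are known translations, reached via the same label.
            if s.head is not None and t.head is not None and s.rel == t.rel:
                if (src[s.head].lemma, tgt[t.head].lemma) in known:
                    support += 1
            if support >= min_support:
                proposals.append((s.lemma, t.lemma))
    return proposals


# Simplified, hypothetical analyses of the example sentence pair.
SRC = [
    Node("ergeben", None, "root"),
    Node("Lithofazien-Analyse", 0, "subj"),
    Node("Anzahl", 0, "obj"),
    Node("Umweltablagerung", 2, "mod"),
    Node("Pliozän-Schicht", 1, "mod"),
]
TGT = [
    Node("unravel", None, "root"),
    Node("lithofacies analysis", 0, "subj"),
    Node("number", 0, "obj"),
    Node("depositional environment", 2, "mod"),
    Node("Pliocene succession", 1, "mod"),
]
KNOWN = {("Lithofazien-Analyse", "lithofacies analysis"), ("Anzahl", "number")}
print(derive_relations(SRC, TGT, KNOWN))  # [('ergeben', 'unravel')]
```

With `min_support=2` only the verb pair is proposed, since both its subject and its object are anchored; lowering the threshold to 1 also proposes the two modifier pairs, which are supported only by their anchored heads.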
Do more! Use analysis constraints:
- syntactic constraints
- semantic constraints
- morphological constraints
Extended AutoLearn with selection restrictions
- genitive object constraint
- direct object constraints
Extended AutoLearn with selection restrictions
1. Extract restrictions: Lithofazien-Analyse ergibt Anzahl Umweltablagerungen ~ unravel
2. Weaken conditions: Analyse ergibt Ablagerung; Vorgang ergibt Ergebnis
3. Select conditions by evaluating occurrences in corpora: Analyse/Vorgang ergibt Rückstand/Ergebnis ~ unravel, yield?
Glosses: Analyse "analysis", Vorgang "process", ergibt "yields", Ablagerung "deposit", Rückstand "residue", Ergebnis "result"
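The three steps can be illustrated with a small sketch. It assumes a hand-made semantic hierarchy (`HYPERNYMS`, hypothetical, mirroring Lithofazien-Analyse → Analyse → Vorgang and Umweltablagerung → Ablagerung → Ergebnis; placing Rückstand under Ergebnis is an extra assumption) and a few invented (subject, verb, object, translation) tuples standing in for corpus occurrences. Candidate conditions are all combinations of the subject's and object's generalizations; the sketch keeps the most general one whose observed translation precision reaches a threshold. The threshold, the tie-break on match count and all names are assumptions, not the product's actual selection procedure.

```python
from typing import Optional

# Hypothetical semantic hierarchy: child -> more general class (must be acyclic).
HYPERNYMS = {
    "Lithofazien-Analyse": "Analyse",
    "Analyse": "Vorgang",
    "Umweltablagerung": "Ablagerung",
    "Ablagerung": "Ergebnis",
    "Rückstand": "Ergebnis",
}

Observation = tuple[str, str, str, str]  # (subject, verb, object, translation of the verb)


def generalizations(lemma: str) -> list[str]:
    """The lemma itself followed by ever more general classes."""
    chain = [lemma]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def select_restriction(subj: str, verb: str, obj: str, translation: str,
                       corpus: list[Observation], min_precision: float = 0.8,
                       min_matches: int = 1) -> Optional[tuple[str, str]]:
    """Return the most general (subject class, object class) condition that the corpus supports."""
    best, best_key = None, None
    for gen_s, s_cls in enumerate(generalizations(subj)):
        for gen_o, o_cls in enumerate(generalizations(obj)):
            # Corpus occurrences covered by this weakened condition.
            matches = [o for o in corpus if o[1] == verb
                       and s_cls in generalizations(o[0]) and o_cls in generalizations(o[2])]
            if not matches or len(matches) < min_matches:
                continue
            precision = sum(o[3] == translation for o in matches) / len(matches)
            if precision < min_precision:
                continue
            # Prefer more general conditions, then conditions with more evidence.
            key = (gen_s + gen_o, len(matches))
            if best_key is None or key > best_key:
                best, best_key = (s_cls, o_cls), key
    return best


# Invented toy observations, not real corpus data.
TOY_CORPUS = [
    ("Lithofazien-Analyse", "ergeben", "Umweltablagerung", "unravel"),
    ("Analyse", "ergeben", "Ablagerung", "unravel"),
    ("Analyse", "ergeben", "Rückstand", "unravel"),
    ("Vorgang", "ergeben", "Ergebnis", "yield"),
    ("Vorgang", "ergeben", "Ergebnis", "yield"),
]
print(select_restriction("Lithofazien-Analyse", "ergeben", "Umweltablagerung", "unravel", TOY_CORPUS))
# ('Analyse', 'Ergebnis')
```

The fully weakened condition "Vorgang ergibt Ergebnis" is rejected because the toy data translate it both as unravel and as yield, which is exactly the ambiguity the slide flags with "unravel, yield?".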
Extended AutoLearn with selection restrictions: coming soon in version 12.5
Supporting research: improving accuracy and coverage
Improving accuracy and coverage
- EU Marie Curie project (hybrid high-quality machine translation)
- BMWi project FlexNeuroTrans (flexible MT for medium-sized businesses using neural nets)
- Combination of rule-based and statistical methods
- Extraction of information from the internet, ...
AutoLearn<word>: information
Example: European insurance regulation
- Search (on the fly) for bilingual text that suits the information requirement, for example via Wikipedia, ...
AutoLearn<word>: information
- Store and examine the extracted texts
Availability of AutoLearn<word>
- via the translation service (Lingenio Translation Server, LTS)
- for CAT tools
- for publishing tools (WordPress, ...)
- for intranet solutions
Improving availability for multilingual platforms
On-the-fly extraction
- Text to be processed
Summary: products & research
1. Version 12.1: AutoLearn<word> (for several parts of speech and multiword expressions)
2. Version 12.5 (soon): with selection restrictions
3. Learning, user dictionaries and memories available for Lingenio Translation Server (for CAT tools, publishing and intranet)
4. Supporting research on improving accuracy, coverage and on-the-fly extraction of translation information
Thank you for your attention! Questions? (Please visit us at stand 420.)