HLT in Hungary - 2009
Gábor Prószéky
MorphoLogic, http://
Pázmány Péter Catholic University, Faculty of Information Technology, http://
Basics of Hungarian
- 15 million speakers worldwide, 10 million of them in Hungary
- Agglutinative language with Finno-Ugric roots (some points remain uncertain) and only a few small related languages
- In Central Europe since 896, with Turkish, Slavic, Romance and German areal influences
- Complex formal descriptions are needed: simple CL methods that work for English do not work for Hungarian
- The first detailed and computationally usable morphosyntactic description of Hungarian was made in 1991
History of Hungarian HLT
- 1960s: Russian-Hungarian MT Group; the periodical Computational Linguistics (Prof. Kiefer)
- 1970s: a tergo (reverse) dictionary, basic language statistics (Debrecen University - Prof. Papp)
- 1980s-: Speech applications (Technical University - Prof. Gordos, Németh, Olaszy, Vicsi); AI applications (ALL - Gergely et al.)
- 1991-: Marketable NLP products (MorphoLogic - Prószéky et al.)
- 1990s: Historical dictionary, corpus linguistics (Linguistics Institute of HAS - Váradi et al.)
- 2000s-: Learning methods in NLP (Szeged University - Prof. Csirik); services combined with speech applications (AITIA - Tatai et al.)
- 2002-: Courses and PhD programmes in HLT (Pázmány University - Prof. Prószéky, Prof. Takács)
- 2003-: Series of annual national HLT conferences (Szeged)
- 2008-: HLT Platform: 4 academic institutions, 4 enterprises
Hungarian HLT Research
- MorphoLogic (Gábor Prószéky), staff: 15
  Proofing tools, intelligent dictionaries, machine translation, a wide range of linguistic resources for various languages (incl. Hungarian WordNet), text processing tools, lexicographic work
- Linguistics Institute (Tamás Váradi), staff: 8
  Hungarian National Corpus, research in various CL projects (incl. Hungarian WordNet)
- Szeged University (János Csirik), staff: 6
  Machine learning tools for NLP, speech research, work in various CL research projects (incl. Hungarian WordNet)
Hungarian HLT Research (cont'd)
- Technical University of Budapest (TMIT), staff: 28 (in 9 laboratories)
  - Speech Technology Lab: speech information systems, e-mail/SMS reader, tools for blind people (Géza Németh)
  - Speech Acoustics Lab: speech databases, medical applications, speech correction, acoustic-phonetic research (Klára Vicsi)
  - Speech Recognition Lab: speaker recognition, speech recognition, statistical modeling, multimedia indexing (Tibor Fegyó, Péter Mihajlik)
- Technical University of Budapest (MOKK), staff: 5
  Corpus collection (monolingual and bilingual), text alignment, audio/video archives, ontology modeling, POS tagging (Péter Halácsy)
Hungarian HLT Research (cont'd)
- Pázmány Péter Catholic University, Faculty of Information Technology (Gábor Prószéky, György Takács): 4 researchers, 7 PhD students
  - Language: word sense disambiguation (WSD), semantic representation, anaphora resolution, text mining
  - Speech: mobile applications (including applications for the deaf)
- Pécs University (Gábor Alberti): 4 researchers, 2 PhD students
  Computational semantics, machine translation, Prolog
- Other universities (with 1-2 researchers each): Debrecen (literary computing), Miskolc (face modeling), Szombathely (terminology)
Hungarian HLT Research (cont'd)
- Applied Logic Laboratory (Tamás Gergely): 4 researchers, 5 PhD students
  AI tools for medical and pharmacological applications, cognitive systems
- AITIA (Gábor Tatai): 48 co-workers (a few of them in HLT)
  Speech technology applications, text mining, chat robots
- Kilgray (Balázs Kis): 3 full-time employees
  Translation memory development
International Cooperation in HLT
- Earlier, in the 1990s: MULTEXT-East, GLOSSER, GRAMLEX, ELSNET Goes East, SPECO, BABEL, TELRI, TRACTOR
- EuroTermBank (MorphoLogic): common EU terminology
- ImportNet (ALL): ontology generation
- EASAIER (ALL): multimedia search
- CACAO (Linguistics Institute): library applications with HLT
- EuroMatrix (MorphoLogic): statistical MT for Europe
- CLARIN (Linguistics Institute & others): resources
- FLaReNeT (MorphoLogic): resources
Hungarian HLT Platform (2008-2010)
Founders of the Platform:
- 4 industrial partners: AITIA, Applied Logic Laboratory, Kilgray, MorphoLogic
- 4 academic partners:
  - Linguistics Institute, HAS
  - Technical University, Telecommunications & Media Informatics (TMIT)
  - Technical University, Center for Media Research & Education (MOKK)
  - Szeged University, Research Group on AI (RGAI)
New member: Pázmány University, Faculty of Information Technology
Hungarian Education in NLP
Courses in CL/HLT/NLP:
- Pázmány University: HLT (Prószéky + 5 PhD), speech (Takács + 2 PhD)
- Szeged University: machine learning (Csirik, Alexin + 3 PhD)
- Technical University: speech (Gordos, Németh, Olaszy, Vicsi + 3 PhD), artificial intelligence (Prószéky)
Others:
- Debrecen University: general linguistics programme (Hunyadi)
- ELTE University: theoretical linguistics programme (Kálmán, Oravecz); Dept. of Translation Theory (Prószéky + 3 PhD)
- Pécs University: semantic representation (Alberti + 3 PhD)
- Pannon University, Szombathely: terminology (Fóris)
Annual National Conferences in Computational Linguistics
2-day conferences, always in December:
- 2003 (1st): 39 long and 20 short papers
- 2004 (2nd): 46 papers (in 8 sections)
- 2005 (3rd): 40 papers (in 7 sections), 13 posters & demos
- 2006 (4th): 34 papers (in 7 sections), 16 posters & demos
- 2007 (5th): 30 long papers (in 7 sections), 8 posters & demos
- 2008: Kick-off Conference of the Platform: 6 plenary presentations, 9 posters & demos
Hungary's No. 1 HLT website: www.webforditas.hu
- Website for various HLT applications: text and web translation, dictionaries, spell-checking, search with linguistic support
- For the query fordítás ("translation"), it is the top Google result (among nearly 20 million hits)
- 60,000 visitors/day
- In 2008: 91 million pages translated (in 2007: 43 million pages)
  81 million text translations + 2 million web page translations + 7 million dictionary lookups
- 13.3 GB of data traffic/year (at 1,800 characters/page, about 7.2 million A4 pages translated)
- The human translation market felt no impact (no translators complained about losing jobs)
Translation between Hungarian and 33 other languages
Technically, it is rather easy to combine two existing web translation services: HU-EN + EN-X and X-EN + EN-HU.
Languages X for which commercial EN-X and X-EN translation services are currently available:
Official EU languages to and from Hungarian:
1. Bulgarian-Hungarian/Hungarian-Bulgarian (Magyar/български): MorphoLogic & SkyCode
2. Czech-Hungarian/Hungarian-Czech (Magyar/Čeština)
3. Danish-Hungarian/Hungarian-Danish (Magyar/Dansk): MorphoLogic & GrammarSoft
4. Dutch-Hungarian/Hungarian-Dutch (Magyar/Nederlands)
5. English-Hungarian/Hungarian-English (Magyar/English): MorphoLogic (HU-EN: with LI & SU)
6. Finnish-Hungarian/Hungarian-Finnish (Magyar/Suomi)
7. French-Hungarian/Hungarian-French (Magyar/Français): MorphoLogic & ProMT
8. German-Hungarian/Hungarian-German (Magyar/Deutsch): MorphoLogic & ProMT
9. Greek-Hungarian/Hungarian-Greek (Magyar/Ελληνικά)
10. Italian-Hungarian/Hungarian-Italian (Magyar/Italiano): MorphoLogic & ProMT
11. Latvian-Hungarian/Hungarian-Latvian (Magyar/Latviešu valoda): MorphoLogic & Trident
12. Lithuanian-Hungarian/Hungarian-Lithuanian (Magyar/Lietuvių kalba)
13. Polish-Hungarian/Hungarian-Polish (Magyar/Polski): MorphoLogic & pwn.pl
14. Portuguese-Hungarian/Hungarian-Portuguese (Magyar/Português): MorphoLogic & ProMT
15. Romanian-Hungarian/Hungarian-Romanian (Magyar/Română)
16. Slovak-Hungarian/Hungarian-Slovak (Magyar/Slovenčina)
17. Slovene-Hungarian/Hungarian-Slovene (Magyar/Slovenščina)
18. Spanish-Hungarian/Hungarian-Spanish (Magyar/Español): MorphoLogic & ProMT
19. Swedish-Hungarian/Hungarian-Swedish (Magyar/Svenska)
Other European languages to and from Hungarian:
20-25. HU/Catalan, HU/Croatian, HU/Norwegian (GrammarSoft), HU/Russian (ProMT), HU/Serbian, HU/Ukrainian (Trident)
Important non-European languages to and from Hungarian:
26-33. HU/Arabic, HU/Chinese, HU/Hebrew, HU/Hindi, HU/Indonesian, HU/Japanese, HU/Korean, HU/Vietnamese
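The pivot scheme above (X-EN followed by EN-HU, or HU-EN followed by EN-X) amounts to composing two translation services. A minimal sketch, assuming each service can be treated as a plain text-to-text function; the `Translator` type, `make_word_translator`, `pivot` and the toy lexicons are hypothetical stand-ins for illustration, not the webforditas.hu interface or any partner's real engine.

```python
from typing import Callable, Dict

# Assumption: a translation service is modelled as a text -> text function.
# This is illustrative only, not the real webforditas.hu interface.
Translator = Callable[[str], str]


def make_word_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a hypothetical lexicon.

    Unknown words are passed through unchanged.
    """
    def translate(text: str) -> str:
        return " ".join(lexicon.get(word, word) for word in text.split())
    return translate


def pivot(source_to_en: Translator, en_to_target: Translator) -> Translator:
    """Compose X->EN and EN->Y services into an X->Y service via English."""
    def translate(text: str) -> str:
        english = source_to_en(text)   # first leg: source language -> English
        return en_to_target(english)   # second leg: English -> target language
    return translate


# Hypothetical toy lexicons standing in for real commercial services.
hu_en = make_word_translator({"jó": "good", "reggelt": "morning"})
en_de = make_word_translator({"good": "guten", "morning": "Morgen"})

hu_de = pivot(hu_en, en_de)
print(hu_de("jó reggelt"))  # prints "guten Morgen"
```

Because each leg is a separate service, errors made in the first leg are carried into the second; this is the main trade-off of pivoting through English compared with a direct HU-X system, in exchange for reusing services that already exist.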
Features of a general web translation service
- Text translation for any language X, provided X-EN and EN-X services are available
- Translation of entire websites
- Combination with various dictionaries (Web 2.0, AJAX)
- Virtual keyboard for all languages
- Spell-checking for all languages
- Integrated text-to-speech tools (speech recognition coming soon)
- Integrated language guesser tools
- Translation options combined with internet search
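The slide lists an integrated language guesser but does not say how it works. One common technique is to compare character n-gram frequency profiles; the sketch below uses character trigram counts and cosine similarity. The functions `trigram_profile`, `cosine` and `guess_language` and the tiny training snippets are hypothetical illustrations, not the service's actual method; a real guesser would be trained on far larger samples per language.

```python
import math
from collections import Counter
from typing import Dict


def trigram_profile(text: str) -> Counter:
    """Count character trigrams, padding with spaces to capture word boundaries."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_language(text: str, profiles: Dict[str, Counter]) -> str:
    """Return the language whose trigram profile is most similar to the text."""
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Tiny hypothetical training samples; real profiles need much more text.
training = {
    "hu": "köszönöm a figyelmüket ez egy magyar mondat a nyelvről",
    "en": "thanks for your attention this is an english sentence about the language",
    "de": "danke für ihre aufmerksamkeit das ist ein deutscher satz über die sprache",
}
profiles = {lang: trigram_profile(text) for lang, text in training.items()}

print(guess_language("magyar nyelv", profiles))  # prints "hu"
```

Trigram profiles are cheap to build for every supported language, which suits a service that already covers dozens of languages; the trade-off is lower reliability on very short inputs.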
MT Service for All European Languages
Proposal for a new pan-European cooperation
Remark: we have not lost interest in finding new approaches to MT (e.g. we are partners in EuroMatrix), and we are still working on new scientific methods. BUT THIS PROPOSAL IS DIFFERENT: it guarantees a usable translation service for a wide range of end users.
- It builds on the existing service www.webforditas.hu: this pivot application is already running and anybody can use it (currently 60,000 users/day)
- Extending the existing application to other local languages requires only software development
- Usable final results can be guaranteed
- Service providers for many languages are already available on the market
- Cooperation has already started, with partners both from EU countries (HU, BG, DK, PL) and from non-EU countries (RU, UKR)
- Partners' R&D activity consists basically of improving their own services to produce better translations
- Based on its experience, MorphoLogic is in a position to propose an initiative for combining the efforts of potential partners
Köszönöm figyelmüket! Thanks for your attention!