The translation of examples, citations, definitions and glosses in the Papillon project


Christian BOITET
GETA, CLIPS, IMAG, 385 rue de la Bibliothèque, BP Grenoble cedex 9, France

Abstract

The Papillon lexical data base comprises a set of detailed monolingual dictionaries of «lexies» (word senses) interlinked through «axies» (interlingual links), which can also refer to external semantics-oriented systems such as UNL «universal words», WordNet «synsets», Ontos «concepts» or NTT ALT/JE system «semantic classes». The basic idea is that bilingual or multitarget usage dictionaries can be generated ad libitum from the data base. This implies that examples, citations, definitions and glosses expressed in each language be translated into all other languages and stored in the data base. Storing can be achieved in a simple and «seamless» way by introducing «auxiliary» lexies and axies for these «free language elements». But translating all these elements into all languages is a major subproject of the Papillon project. We propose to use the mutualization feature of the Papillon server and help voluntary contributors perform or postedit translations using a shared, web-oriented translation workbench based on the «Montaigne» architecture. We also propose to use freely available UNL web sites to get first drafts of translations, thereby attaching full UNL graphs to auxiliary axies. In the future, a «coedition» technique, still at the research stage, could be used to improve the UNL graphs a posteriori and transparently from any language, and get improved translations in all target languages.
Keywords: Papillon multilingual data base, N-N translation of dictionary information, Montaigne architecture, interlingual representation, coedition of text & UNL graph

Résumé

La base de données lexicale multilingue Papillon comprend un ensemble de dictionnaires monolingues détaillés de «lexies» (sens de mots) reliés par des «axies» (liens interlingues) qui peuvent aussi renvoyer à des systèmes sémantiques externes comme les «mots universels» (UWs) UNL, les «synsets» de WordNet, les «concepts» d'Ontos, ou les «classes sémantiques» du système ALT/JE de NTT. L'idée de base est que des dictionnaires d'usage bilingues ou multicibles puissent être générés ad libitum à partir de la base de données. Cela implique que les exemples, citations, définitions et gloses exprimés dans chaque langue soient traduits dans toutes les autres langues et stockés dans la base. Le stockage peut être obtenu simplement et «sans couture» en introduisant des lexies et des axies «auxiliaires» pour ces «éléments libres de langue». Mais la traduction de tous ces éléments dans toutes les langues est un sous-projet majeur du projet Papillon. Nous proposons d'utiliser la caractéristique de mutualisation du serveur Papillon en aidant les contributeurs volontaires à effectuer ou à postéditer des traductions en utilisant un poste de traduction partagé, utilisable par réseau, et suivant l'architecture «Montaigne». Nous proposons aussi d'utiliser les sites gratuits UNL pour obtenir de premiers jets de traductions, en attachant ce faisant des graphes UNL complets aux axies auxiliaires correspondantes. Dans le futur, on pourrait utiliser une technique de «coédition», actuellement au stade de la recherche, pour améliorer les graphes UNL a posteriori et de façon transparente à partir de toute langue, et obtenir des traductions améliorées dans toutes les langues cibles.
Mots-clés : Base de données lexicale multilingue Papillon, traduction N-N d'informations dictionnairiques, architecture Montaigne, représentation interlingue, coédition de texte & graphe UNL

Introduction

The Papillon lexical data base comprises a set of detailed monolingual DiCo [26] dictionaries of «lexies» (word senses) interlinked through «axies» (interlingual links), which in turn can also refer to external semantics-oriented systems such as UNL «universal words», WordNet «synsets», Ontos «concepts» or NTT ALT/JE system «semantic classes». The basic idea is that bilingual or multitarget usage dictionaries can be generated ad libitum from the data base. Figure 1 gives a simplified example.

Figure 1: Axies link monolingual lexies and external semantic symbols such as UWs

This implies that all «language elements» in monolingual entries, that is, examples, citations, definitions, glosses and labels, are translated into all other languages and stored in the data base. For example, in a French-Thai dictionary for Thai readers, French definitions such as «carte à jouer» or «carte géographique» should appear both in French and in Thai.

We first analyze this translation problem in more detail and show that, because the situation is asymmetric, the number of binary translations to perform is not linear but quadratic in the number of languages, whatever the translation technique. However, storing these translations can be achieved in a simple and «seamless» way by introducing auxiliary lexies and axies, with a cost linear in the number of languages.

In the second part, we show how to use the «mutualization» spirit of the Papillon project and the associated characteristics of the Papillon server to link Papillon with a similar web-oriented, cooperative environment for human translation, which offers free on-line translation aids in exchange for contributions to its translation memory («Montaigne» architecture).

In the third and last part, we show how to produce some (and perhaps ultimately all) translations of free language elements using UNL-graphs as intermediate «pivot» representations linked to complex axies, and a mixture of automatic processing, interactive disambiguation and coedition, requiring only monolingual competence from contributors to reach the desired quality level.
Although we concentrate on translation in this presentation, we should also address the problem of creating free language elements, if possible in such a way that they are immediately available in all languages. Possible solutions are (1) to produce examples and perhaps definitions in several languages by extracting them from existing translation memories and (2) to use UNL again to generate parallel language elements from UNL graphs. But the latter cannot work for citations.

Handling the free language elements in a rich multilingual lexical data base such as Papillon is challenging not only from the scientific and technical points of view, but also from the organizational, sociological and intercultural points of view, because of the variety of contributors and techniques.

1 The problem: translating free language elements & storing the results

1.1 Translating (and creating) language elements

Preliminary remarks

As labels are elements of finite lists such as domains (geogr., phys., chem., etc.) and appear in the Papillon-specific schema for each language, we may consider that their translations into all languages are stored once and for all at creation time. The «free language parts» to be translated are then examples, citations, definitions and glosses.

By «gloss», we understand any word or phrase attached to a vocable to let human readers guess the intended lexie (word sense). For example, «ice (food)» = «ice (desert)» as opposed to «ice (cake crust)» and to «ice (water)». Glosses are not definitions, and certainly not DiCo semantic formulas, but serve as abbreviated explanations or hints. Translating them seems trivial but is not. For example, «desert» has to be understood as part of a meal and not as a geographical desert, or else Japanese readers of an English-Japanese dictionary would be misled. Various techniques such as «conceptual vectors» or network activations (as done by MSR on the Longman and American Heritage dictionaries) might be used to disambiguate glosses, and more generally words appearing in language elements.

Examples, citations and definitions are obviously more difficult to translate than glosses. To make matters worse, the whole translation situation is asymmetric. Suppose for example that the French DiCo contains «Il utilise toujours des cartes IGN»¹ as an example for «carte.2». The English translation «He always uses IGN maps» is good as a translation of this example, but certainly not as an example of use for «map» in English.

Quadratic size of the problem

As a consequence, each language needs its own native elements, and each of them must be translated into every other language. Supposing we have L languages, M lexies in each language, and an average of F free language elements for each lexie, the number of translations needed is therefore not F*M*(L-1), but F*M*L*(L-1).
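The two counts can be checked with a short Python sketch. The function names and the three-language toy list are illustrative assumptions, not part of the Papillon software. The sketch only encodes the formulas stated above and enumerates the ordered (source, target) language pairs that make the count quadratic.

```python
from itertools import permutations


def native_elements(L, M, F):
    """Free language elements to author natively: F per lexie, M lexies, L languages."""
    return F * M * L


def translations_needed(L, M, F):
    """Each native element must be translated into the L-1 other languages."""
    return F * M * L * (L - 1)


def ordered_pairs(languages):
    """All (source, target) directions; there are L*(L-1) of them."""
    return list(permutations(languages, 2))


if __name__ == "__main__":
    # Toy setting (hypothetical): 3 languages, 2 lexies each, 1 element per lexie.
    print(ordered_pairs(["fra", "eng", "jpn"]))
    print(native_elements(3, 2, 1), translations_needed(3, 2, 1))
```

With the figures quoted in the text (L=7, M=100,000, F=3), the same functions give 2,100,000 native elements and 12,600,000 translations.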
It is also necessary to build the natural or «native» F*M*L free language elements. If L=7, M=100,000 and F=3 (1 or 2 examples, 1 gloss, 0 or 1 citation), there are more than 2M elements to build, giving rise to more than 12M translations. Translation of free language elements may of course help in building original free language elements in other languages. For example, «He always uses IGN maps» might induce a contributor to propose «He often uses AA² maps» as a «native» example. But let us concentrate on the translation problem.

1.2 Storing the translation results

Problems of storing native and translated elements in lexies

Where should the translation results be stored? Of course, in the data base itself, and it would seem natural to store them in the corresponding lexies, alongside the original elements. But we cannot store translations of French elements into English, Japanese, etc. in French lexies, because that would violate the principle of strict monolinguality of the DiCo volumes, and give a very messy data structure.

We could envisage storing native and translated elements expressed in French in French lexies. For example, in the current Papillon microstructure for the French DiCo, the original French example for «carte.1» («Il utilise toujours des cartes IGN») is stored in the entry for that lexie. Adding appropriate XML tags or attributes³, we could store next to it the French translations of «native» examples attached to lexies in other languages, such as «Il utilise souvent des cartes AA». This scheme has an important drawback: translations of examples have to be linked to the corresponding «native» examples.
Implementing that linking would lead to important changes in the macrostructure of the data base: either introduce special identification attributes, or introduce new types of axies and links to them from inside lexies.

¹ Institut Géographique National
² Automobile Association (for the sake of the example)
³ Recall that each monolingual DiCo volume may be considered as one large XML file, although it is broken down into small pieces stored in a relational database such as Postgres at the physical level.

Introducing auxiliary lexies and axies

A better solution is to create a new type of entry to store all free language elements. We will call them «auxiliary lexies» as opposed to the normal lexies. Auxiliary lexies will be linked by auxiliary axies. We may abbreviate these as «x-lexies» and «x-axies», and refine «x» when necessary as «def», «cit», «ex» and «glo» (definition, citation, example, gloss).

As it is, glo-lexies will be quite simple. Cit-lexies and ex-lexies would be simpler than normal lexies, having no semantic definition, no logico-syntactic «regime», no examples and no collocations, but perhaps attached morphosyntactic information and sense disambiguation information such as the sense number (1 in «carte.1»).

The x-axies will only be slightly different from axies. Links will remain the same, the only differences being that an x-axie will contain a list of x-lexies (instead of a list of lexies) for each language L, and that an x-axie with x ≠ glo will contain a list of UNL-graphs (instead of a list of UWs). This way, storing free language elements and their translations can be achieved in a simple and «seamless» way.

2 A «Montaigne» environment for human translation

We propose to use the mutualization feature of the Papillon server and help voluntary contributors perform or postedit translations using a shared, web-oriented translation workbench based on the «Montaigne» architecture.

2.1 The Montaigne architecture

Rationale and evolution of the basic concept

The Montaigne⁴ architecture was first defined in 1995, when it was realized that the EuroLang Optimizer (EO) TSS (translation support system) developed by Site/Eurolang could only be used by sizable groups of professional translators.
Measurements had shown that at least 800 pages of the same kind were needed in the translation memory to get real improvements from its usage. But an isolated translator, and even more an occasional translator, rarely translates more than [...] pages a year in a given format, domain, and grammatical sublanguage. Also, the pricing scheme and the complexity were dissuasive: about 1500 for a client licence and the same for the corresponding server token, the need to buy Windows NT and SQL Server for the server, and a difficult installation.

The basic idea of Montaigne is to let users share a common translation memory and other support tools such as a bilingual editor and online dictionaries, freely, through the network, in exchange for their agreement to share their data «products» with others. These data are the aligned sentences and dictionary entries produced by their translation activity. The pricing model is that of IE or Netscape: free clients and paying server, with the idea that servers should be funded by institutions wanting their members to publish both in their native tongue and in English.

At the time, we tried to transform the EO software to meet these new requirements, but it proved too costly because the client was tightly integrated into Word and Windows 95. Actually, the client software was far too complex. Since then, the development of web-oriented applications and tools has made it possible to modify this architecture so that what runs on the client is very light. In the limit, the server may send HTML pages including JavaScript programs to implement all functions, including the bilingual editor. Progress has also been made on translation memory matching algorithms, which now give better recall and precision [28-30].

A limited version has been prototyped by V. Berment for Lao-French translation and is available online. OKI Electric also supports a site built around similar ideas.

⁴ Model Of New Translation AIds Generalized to the NEt

Scenario for using a «Montaigne» TSS site

The scenario envisaged for a full version is as follows. Using any web browser, you enter the server, and register if not already done. At that point, you can access the common resources (translation memories, dictionaries, bilingual editor and other tools) in read-only mode, and your private space on the server disks, where you may keep private (or not yet sharable) lexicons and translated segments or documents.

From the interface, you upload a document you want to translate, indicating its source language and format, and the desired target languages and formats. The server preprocesses the document: normalization into a common XML format and in Unicode (UTF-8 encoding), segmentation into translation units (normally sentences), computation of several layers of representation (such as text only without formatting tags, lemmatized forms, chunks, etc.), and search in the translation memory. It then opens a page containing a 2-column table with one line for each segment and a frame for suggestions:

    source segment N-2 | translated segment (done)
    source segment N-1 | translated segment (done)
    source segment N   | translated segment (currently being created)
    source segment N+1 |
    source segment N+2 |

    Suggestion frame: suggestion(s) from the TM; dictionary suggestions

Figure 2: typical layout of a bilingual editor in a TSS

When you click in the first source segment, suggestions for translations extracted from the translation memory appear to the right, and under them the lexical information relative to the segment, if any. Using normal editing functions and some specific shortcuts, you build the translated segment. When you click in the next source segment or quit, the server updates your private memory. At the end, you decide whether or not to allow sharing of the results of your work, and download the translated file in the requested format from the server.

This was only the «bare bones» of the TSS.
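The segmentation and memory lookup steps of this scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Montaigne implementation: the sentence splitter is a naive punctuation rule, the similarity measure is the generic `difflib.SequenceMatcher` ratio from the standard library (a stand-in for the matching algorithms of [28-30]), and the names `segment`, `suggest`, `memory` and the 0.7 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def segment(text):
    """Split a text into sentence-like translation units (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def normalize(segment_text):
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(segment_text.lower().split())


def suggest(source_segment, memory, threshold=0.7, limit=3):
    """Return up to `limit` (score, source, target) entries of the
    translation memory whose source side resembles the segment."""
    wanted = normalize(source_segment)
    scored = []
    for stored_source, stored_target in memory:
        score = SequenceMatcher(None, wanted, normalize(stored_source)).ratio()
        if score >= threshold:
            scored.append((score, stored_source, stored_target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical shared memory of aligned (source, target) segments.
memory = [
    ("He always uses IGN maps.", "Il utilise toujours des cartes IGN."),
    ("The weather is nice.", "Il fait beau."),
]

if __name__ == "__main__":
    for unit in segment("He always uses IGN maps. He often uses AA maps."):
        print(unit, suggest(unit, memory))
```

In a real server the lookup would run against the several layers of representation mentioned above (plain text, lemmatized forms, chunks) and not only against the surface string.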
Some more functions are necessary, such as the possibility of modifying the segmentation of the source text, and of correcting the source text. Many other functions can be envisaged, such as voice input, links with a spell checker, a grammar checker, etc.

2.2 Peer-to-peer architecture: Papillon-Montaigne

The Papillon data base and server architectures are already quite complex, so that it does not seem a good idea to try to integrate Papillon and Montaigne in a classical «client-server» architecture. Also, in the context of usual translation, Papillon would be a server and Montaigne a client sending requests concerning words, while, in the context of the Papillon translation subproject, Montaigne would appear as a server.

A peer-to-peer integration seems preferable. As the server organization is the same in both cases (common shared resources and private spaces, freely settable user groups), it could and should be shared at the upper level, so that one would enter Montaigne without a login procedure when consulting a Papillon dictionary and wanting to contribute by translating some examples or citations, or by revising existing translations.

3 Introducing automaticity through UNL

We also propose to use freely available UNL web sites to get translations at various quality levels. These translations will then be available as suggestions in the Montaigne environment, exactly like suggestions coming from the translation memory. For this, we attach full UNL graphs to def-axies, cit-axies, and ex-axies.

3.1 Brief introduction to UNL

UNL (Universal Networking Language) is the name of a project, of a meaning representation language, and of a format for "perfectly aligned" multilingual documents (see also [11, 41]). The UNL language is a good interlingua for automated translation, ranging from fully automatic MT to interactive MT of several kinds through translation of non task-oriented spoken dialogues. It is also more than that, due to the associated "knowledge base", and has great potential in textual information processing applications.

The UNL representation is made of "semantic graphs", where a graph expresses the meaning of some natural language utterance. Nodes contain lexical units and attributes; arcs bear semantic relations. Connected subgraphs may be defined as "scopes", so that a UNL graph may be a hypergraph. Figure 3 shows a graphic representation of a UNL graph. A linear UNL-xml writing appears in Figure 7.

The lexical units, called Universal Words (UW)⁵, represent word meanings, something less ambitious than concepts. Their denotations are built to be intuitively understood by developers knowing English, that is, by all developers in NLP. A UW is an English term or pseudo-term possibly completed by semantic restrictions.

Figure 3: a possible UNL graph for «Ronaldo has headed the ball into the left corner of the goal». Nodes recoverable from the figure: Ronaldo, score(icl>event,agt>human,fld>sport).@entry.@past.@complete, goal(icl>thing), head(pof>body), corner, left; arc labels: agt, pos, obj, ins, plt, mod.

A UW such as "process" represents all word meanings of that lemma, seen as a citation form (verb or noun here). The UW "process(icl>do, agt>person)" covers the verbal meanings of processing, working on, etc. The attributes are the (semantic) number, gender, time, aspect, modality, etc. The 40 or so semantic relations are traditional "deep cases" such as agent, (deep) object, location, goal, time, etc.
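The textual notation of a node (a headword, optional restrictions in parentheses, then attributes introduced by ".@") can be made concrete with a small parser. The sketch below is a simplification written for illustration: the class name `UniversalWord` and the function `parse_uw` are hypothetical, and nested restrictions, which the full UNL notation allows, are deliberately not handled.

```python
import re
from dataclasses import dataclass

# headword, optional "(rel>value, ...)" restrictions, then ".@attr" attributes
UW_PATTERN = re.compile(
    r"(?P<head>[^().@]+)"
    r"(?:\((?P<restr>[^()]*)\))?"
    r"(?P<attrs>(?:\.@\w+)*)"
)


@dataclass(frozen=True)
class UniversalWord:
    headword: str
    restrictions: tuple = ()   # pairs (relation, value), e.g. ("icl", "do")
    attributes: tuple = ()     # e.g. ("entry", "past", "complete")


def parse_uw(text):
    """Parse a flat node string such as 'process(icl>do, agt>person)'."""
    match = UW_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple UW node: {text!r}")
    restrictions = []
    for item in (match.group("restr") or "").split(","):
        if item.strip():
            relation, _, value = item.strip().partition(">")
            restrictions.append((relation, value))
    attributes = tuple(re.findall(r"\.@(\w+)", match.group("attrs")))
    return UniversalWord(match.group("head").strip(), tuple(restrictions), attributes)


if __name__ == "__main__":
    print(parse_uw("score(icl>event,agt>human,fld>sport).@entry.@past.@complete"))
    print(parse_uw("process(icl>do, agt>person)"))
```

Keeping restrictions and attributes apart mirrors the distinction made in the text: restrictions narrow the word meaning, while attributes carry number, time, aspect and similar information about the utterance.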
One way of looking at a UNL graph corresponding to an utterance U-L in language L is to say that it represents the abstract structure of an equivalent English utterance U-E as "seen from L", meaning that semantic attributes not necessarily expressed in L may be absent (e.g., aspect coming from French, determination or number coming from Japanese, etc.).

The UNL language of semantic graphs may be called a "semantico-linguistic" interlingua. As a successor of the technically and commercially successful ATLAS-II and PIVOT interlinguas, its potential to support various kinds of text MT is certain, even if some improvements would be welcome, as always. It is also a strong candidate for use in spoken dialogue translation systems when the utterances to be handled are not only task-oriented and of limited variety, but become freer and truly spontaneous. Finally, although it is not a true representation language such as KRL and its frame-based and logic-based successors, and although its associated "knowledge base" is not a true ontology, but rather a kind of immense thesaurus of (interlingual) sets of word senses, it seems particularly well suited to the processing of multilingual information in natural language (information retrieval, abstracting, gisting, etc.).

The UNL format of multilingual documents aligned at the level of utterances is currently embedded in html (call it UNL-html), and used by various tools such as the UNL viewer. By using a simple transformation, one obtains the UNL-xml format, and profits from all tools currently developed around XML (see Figure 7 below).

3.2 Interactive disambiguation at analysis time

A first scenario could be as follows. You consult Papillon and the interface tells you your help would be welcome in translating some free language element, such as an example, from French into all other Papillon languages. You agree, and the interface changes to include interactive disambiguation functionalities «à la LIDIA».
A frame appears with the example in it, and soon a question mark next to it. That means that ambiguities have been encountered during analysis. You click on the button to start the disambiguation dialogue, which is a succession of simple questions, each with a few menu items from which to choose one.

Figure 4: questions on an example are waiting

⁵ In French, "mot universel" sounds strange, but we may use "Unité de Vocabulaire Virtuel", again UW.

A first question appears (Figure 5). In the context of this story, the user should choose to attach «de Chine» to «vase» (Chinese vase). A second dialogue appears (Figure 6) to ask about the word sense of «capitaine».

Figure 5: attachment problem. The sentence «Le capitaine a rapporté un vase de chine.» is shown with two bracketings to choose from: «capitaine de Chine, le capitaine a rapporté un vase.» and «Le capitaine a rapporté (un vase de chine).»

Figure 6: word sense disambiguation. The choices offered for «capitaine» are: «Officier qui commande une compagnie d'infanterie, un escadron de cavalerie, une batterie d'artillerie», «Officier qui commande un navire de commerce», «Chef d'une équipe sportive».

Internally, a unique multilevel concrete tree (UMC-structure) is obtained. The normal automatic analysis continues and produces a more abstract tree (UMA-structure). Using the French-UNL dictionary (deducible from Papillon if UWs have been linked to axies) and a few transformation rules, a transfer phase produces a «UNL-tree», and then a standard algorithm transforms it into a UNL-graph (where reentrancy, cycles, and recursion by «scopes» are possible).

At that point, a UNL document containing only the example in French and its enconversion into a UNL-graph is built and sent for deconversion to UNL servers for all desired target languages. When this is finished, the translations obtained are put in the «translation space» of the Papillon server as usual contributions, to be validated by the central group. This ends the scenario, and the user continues browsing Papillon. Users with other native tongues will of course annotate and improve the translations obtained in their languages.

3.3 Text-UNL graph coedition at reading time

In the future, the preceding scenario could be extended to include a «coedition» technique, still at the research stage, to improve the UNL graphs a posteriori and transparently from any language, and get improved translations in all target languages. Let us illustrate this by an example.
Suppose we have an example taken from the FB2004 corpus, initially in Spanish, but enconverted from a Chinese version produced by a Chinese contributor to Papillon, and then deconverted into English, Spanish, French, and Italian. As the UNL graph does not contain definiteness and aspectual information, the deconversion results have many wrong articles, and some errors on aspects.

    <unl:s num="1">
      <unl:org lg="cn"> </unl:org>
      <unl:unl>
        <unl:arc> agt(retrieve(icl>do).@entry.@future, city) </unl:arc>
        <unl:arc> tim(retrieve(icl>do).@entry.@future, after) </unl:arc>
        <unl:arc> obj(after, Forum) </unl:arc>
        <unl:arc> obj(retrieve(icl>do).@entry.@future, zone(icl>place).@indef) </unl:arc>
        <unl:arc> mod(zone(icl>place).@indef, coastal) </unl:arc>
      </unl:unl>
      <unl:cn> </unl:cn>
      <unl:el> After a Forum, a city will retrieve a coastal zone.</unl:el>
      <unl:es> Ciudad recobrará una zona de costal después Foro. </unl:es>
      <unl:fr> Une cité retrouvera une zone côtière après un forum. </unl:fr>
      <unl:it> Città ricuperarà une zona costiera dopo Forum. </unl:it>
      <unl:jp> </unl:jp>
    </unl:s>

Figure 7: an example deconverted but needing revision

The idea of "coedition" is to correct the UNL graph associated with a segment one wants to improve, and then to send the improved graph to all deconverters and get better translations into all languages. Here, the modifications on the graph might be:

- add ".@def" on the nodes containing "city" and "Forum";
- replace "retrieve" by "recover" and add ".@complete" on the node containing it.

It is not possible in principle to deduce the modification on the graph from a modification on the text. For example, replacing "un" ("a") by "le" ("the") does not entail that the following noun is determined (.@def), because it can also be generic ("il aime la montagne" = "he likes mountains").
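The graph modifications proposed for the example of Figure 7 can be sketched as operations on its list of arcs. This is only an illustration of the idea, written as plain string manipulation in Python: the helper names (`parse_arc`, `add_attribute`, `rename`, `revise`) are hypothetical, and the sketch assumes that identical node strings denote the same node.

```python
import re

# Arcs copied from Figure 7.
FIGURE_7_ARCS = [
    "agt(retrieve(icl>do).@entry.@future, city)",
    "tim(retrieve(icl>do).@entry.@future, after)",
    "obj(after, Forum)",
    "obj(retrieve(icl>do).@entry.@future, zone(icl>place).@indef)",
    "mod(zone(icl>place).@indef, coastal)",
]


def parse_arc(text):
    """Split 'rel(source, target)' at the top-level comma."""
    relation, _, rest = text.strip().partition("(")
    body = rest[:-1]  # drop the closing parenthesis of the arc
    depth = 0
    for index, char in enumerate(body):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            return relation, body[:index].strip(), body[index + 1:].strip()
    raise ValueError(f"not a binary arc: {text!r}")


def headword(node):
    """Headword of a node string, without restrictions or attributes."""
    return re.split(r"[(.]", node, maxsplit=1)[0]


def add_attribute(arcs, word, attribute):
    """Append '.@attribute' to every node whose headword is `word`."""
    def edit(node):
        if headword(node) == word and attribute not in node.split(".@")[1:]:
            return node + ".@" + attribute
        return node
    return [(rel, edit(src), edit(tgt)) for rel, src, tgt in arcs]


def rename(arcs, old, new):
    """Replace the headword `old` by `new`, keeping restrictions and attributes."""
    def edit(node):
        return new + node[len(old):] if headword(node) == old else node
    return [(rel, edit(src), edit(tgt)) for rel, src, tgt in arcs]


def revise(arcs):
    """Apply the modifications listed in the text for Figure 7."""
    arcs = add_attribute(arcs, "city", "def")
    arcs = add_attribute(arcs, "Forum", "def")
    arcs = rename(arcs, "retrieve", "recover")
    return add_attribute(arcs, "recover", "complete")


def format_arc(arc):
    relation, source, target = arc
    return f"{relation}({source}, {target})"


revised = revise([parse_arc(arc) for arc in FIGURE_7_ARCS])

if __name__ == "__main__":
    for arc in revised:
        print(format_arc(arc))
```

The revised arc list is what would be sent again to the deconverters. No text in any language is touched by these operations.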
The technique envisaged is as follows:

- revision is not done by modifying the text directly, but by using a menu system;
- the menu items have a "language side" and a hidden "UNL side";
- when a menu item is chosen, only the graph is transformed, and the action to be done on the text is stored and shown next to its focus;
- at any time, the new graph may be sent to the source language deconverter and the result shown. If it is satisfactory, that shows that the errors were due to the graph and not to the deconverter, and the graph may be sent to deconverters in other languages.

Versions in some other languages known by the user should be displayable, so that improvement sharing is visible and encouraging.

In a scenario for more expert users, the UNL graph or the UNL tree is made visible and directly manipulable, as well as the results of segmentation and lemmatization used to establish the fine-grained correspondence between the text and the graph necessary for coedition (modifications indicated on words have to be «transported» and «translated» as modifications on «corresponding parts» of the UNL graph).

Figure 8: example of coedition (expert mode). The interface shows the original text «Une cité retrouvera une zone côtière après un forum.» with its lemmas (un cité retrouver un zone côtier après un Forum), their morphosyntactic tags, candidate English words for each lemma, and the corresponding graph nodes (retrieve(icl>do), city, zone(icl>place), coastal, after, Forum). Commands: Show Graph, Deconversion, Find Lemma, Find Correspondence, Save Graph, Simple text view, Multiple text view, Save, Quit. First and second deconversions shown:

    English: After a Forum, a city will retrieve a coastal zone. / After the Forum, the city will have recovered a coastal zone.
    Spanish: Ciudad recobrará una zona de costal después Foro. / La ciudad habrá recobrado una zona de costal después el Foro.
    Italian: Città ricuperarà une zona costiera dopo Forum. / La città ha ricuperato une zona costiera dopo il Forum.
    French (manual insertion of «la», «le», capitalization): Une cité retrouvera une zone côtière après un forum. / La cité retrouvera une zone côtière après le Forum.

Conclusion and perspectives

In any multilingual lexical data base from which really useful bilingual usage dictionaries have to be produced, all language elements such as examples, citations, definitions, glosses and labels expressed in each language have to be translated into all other languages and stored in the data base. Storing can be achieved in a simple and «seamless» way by introducing auxiliary lexies and axies. But translating all these elements into all languages is a major subproject of the Papillon project. We propose to use the mutualization feature of the Papillon server and help voluntary contributors perform or postedit translations using a shared, web-oriented translation workbench based on the «Montaigne» architecture. We also propose to use freely available UNL web sites to get first drafts of translations, thereby attaching full UNL graphs to auxiliary axies. In the future, a «coedition» technique, still at the research stage, could be used to improve the UNL graphs a posteriori and transparently from any language, and get improved translations in all target languages.

Although we have concentrated on translation in this presentation, we should also address the problem of creating free language elements, if possible in such a way that they are immediately available in all languages. Possible solutions are (1) to produce examples and perhaps definitions in several languages by extracting them

from existing translation memories and (2) to use UNL again to generate parallel language elements from UNL graphs. But solution (2) cannot work for citations, because a tentative (created) citation cannot be rejected on the grounds that it is not found in the available corpora, as it may exist elsewhere.

Handling the free language elements in a rich multilingual lexical data base such as Papillon is challenging not only from the scientific and technical points of view, but also from the organizational, sociological and intercultural points of view, because of the variety of contributors and techniques.

References

[1] Ampornaramveth V. (1998) Saikam: an online dictionary development project. Proc. 4th Intl. Workshop on Academic Information Networks and Systems (WAINS'4), NACSIS Seminar House, Karuizawa, Japan, February.
[2] Ampornaramveth V. (1998) Trilingual WWW interface to Saikam dictionary project. Proc. 5th Intl. Workshop on Academic Information Networks and Systems (WAINS'5), Bangkok, December 1998, AIT.
[3] Ampornaramveth V., Aizawa A. & Oyama K. (2000) An Internet-based Collaborative Dictionary Development Project: SAIKAM. Proc. 7th Intl. Workshop on Academic Information Networks and Systems (WAINS'7), Bangkok, 7-8 December 2000, Kasetsart University, H. Sakaki ed.
[4] Blanc É. (1993) Visite guidée de PARAX, une base lexicale pentalingue par acceptions sous HyperCard. GETA, IMAG, 30 p.
[5] Blanc É., Sérasset G. & Tchéou F. (1994) Designing an Acception-Based Multilingual Lexical Data Base under HyperCard: PARAX. Research Report, GETA, IMAG (UJF & CNRS), Aug. 1994, 10 p.
[6] Blanc É. (2000) From the UNL hypergraph to GETA's multilevel tree. Proc. MT'2000, Oxford, Oct. 2000, British Computer Society, 10 p.
[7] Boitet C. & Blanchon H. (1994) Multilingual Dialogue-Based MT for Monolingual Authors: the LIDIA Project and a First Mockup. Machine Translation 9/2 (1994), pp. [...]
[8] Boitet C. (1996) (Human-Aided) Machine Translation: a better future? In "Survey of the State of the Art of Human Language Technology", R. Cole (Editor-in-Chief), J. Mariani, H. Uszkoreit & al., ed., A. Z. G. Varile, Giardini, Pisa, pp. [...] (also available online since 1996).
[9] Boitet C. (1996) Machine-Aided Human Translation. In "Survey of the State of the Art of Human Language Technology", R. Cole (Editor-in-Chief), J. Mariani, H. Uszkoreit & al., ed., A. Z. G. Varile, Giardini, Pisa, pp. [...] (also available online since 1996).
[10] Boitet C. (1997) GETA's MT methodology and its current development towards personal networking communication and speech translation in the context of the UNL and C-STAR projects. Proc. PACLING-97, Ohme, 2-5 September 1997, Meisei University, H. Sakaki ed., pp. [...] (invited communication).
[11] Boitet C. & Tsai W.-J. (2002) Coedition to share text revision across languages. Proc. COLING-02 WS on MT, Taipeh, 1/9/2002, 8 p. (accepted).
[12] Fafiotte G. & Boitet C. (2000) Rapport final de la phase 1 du projet "FeV" (Réalisation d'un dictionnaire d'usage et d'une base terminologique par acceptions informatisés français-vietnamien via l'anglais). GETA, CLIPS, IMAG, 16 p.
[13] Gut Y., Yusoff Z., Samat S. A., Boitet C., Nedobejkine N., Lafourcade M. & al. (1996) Kamus Perancis Melayu dewan - dictionnaire français-malais. Dewan Bahasa dan Pustaka, Kuala Lumpur, Malaisie, 1 vol., pp. [...]
[14] Lafourcade M. (1996) Structured lexical data: how to make them widely available, useful and reasonably protected? - a practical example with a trilingual dictionary. Proc. COLING-96, Copenhagen, 4-9 Aug. 1996, ICCL, B. Maegaard ed., 4 p.
[15] Lafourcade M. & Sérasset G. (1996) Apple Technology Integration - A WEB dictionary server as a practical example. MacTech magazine, 12/7 (1996), pp. [...]
[16] Lafourcade M. (1996) Serveurs de dictionnaires - Etude de cas avec l'outil Alex et le projet de dictionnaire Français-Anglais-Malais. Proc. Séminaire Lexique - Représentation et Outils pour les Bases Lexicales - Morphologie Robuste, Grenoble, France, 13 novembre 1996, GDR-PRC - Communication Homme-Machine, vol. 1/1, pp. [...]
[17] Lafourcade M. (1997) Construction et services dictionnaires n-lingues, exemple des projets Fe*. Proc. Quatrième conférence annuelle sur Le Traitement Automatique du Langage Naturel (TALN), Grenoble, France, juin 1997, CLIPS, IMAG, D. Genthial ed., pp. [...]
[18] Lafourcade M. (1997) Multilingual Dictionary Construction and Services - Case Study with the Fe* Projects. Proc. PACLING'97, Meisei University, Ohme, Tokyo, Japan, 2-5 September 1997, PACL, H. Sakaki ed., vol. 1/1, pp. [...]
[19] Lafourcade M. & Rivepiboon W. (1997) Issues in the French-English-Thai Dictionary Project. Proc. International Workshop on Human and Computer Processing of Language and Speech, Chulalongkorn University, Bangkok, Thailand, 8-12 December 1997, S. Luksaneeyanawin ed., vol. 1/1.
[20] Mangeot M. (1999) Accès Internet au dictionnaire FEM (français-anglais-malais). GETA, CLIPS, IMAG, Grenoble, Dictionnaire trilingue d'usage.

10 The translation of examples, citations, definitions and glosses in the Papillon project [21] Mangeot M. (2001) Environnements centralisés et distribués pour lexicographes et lexicologues en contexte multilingue. Thèse, UJF (thèse préparée au GETA, CLIPS), 293 p. [22] Mangeot-Lerebours M. (1999) Visualisation et Navigation dans des bases de données hétérogènes. Proc. Journée de l'audiovisuel ANRT/INA, Paris, 23 septembre 1999, INA. [23] Mangeot-Lerebours M. (1999) Accès unique à des dictionnaires hétérogènes. Proc. Lexicologie, Terminologie, Traduction (LTT'99), Beyrouth, Liban, novembre 1999, AUPELF-UREF, A. Clas ed., 3 p. [24] Mangeot-Lerebours M. (2000) Papillon Lexical Database Project: Monolingual Dictionaries & Interlingual Links. Proc. 7th Workshop on Advanced Information Network and SystemPacific Association for Computational Linguistics 1997 Conference (WAINS'7), Bangkok, Thailande, 7-8 décembre 2000, Kasetsart University, H. Sakaki ed., 6 p. [25] Mel tchuk I. (1981) Meaning-Text models: a recent trend in Soviet linguistics. Annual Review of Anthropology 10 (1981), pp [26] Mel tchuk I. & Polguère A. (1987) A Formal Lexicon in the Meaning-Text Theory: or How to do Lexica with Words. Computational Linguistics 13/3-4 (1987), pp [27] Mel tchuk I., Clas A. & Polguère A. (1995) Introduction à la lexicologie explicative et combinatoire. AUPELF- UREF/Duculot, Louvain-la-Neuve, 256 p. [28] Planas E. (1998) TELA: Structures et algorithmes pour la Traduction Fondée sur la Mémoire. Thèse, UJF (Grenoble 1), 7 July 1998, 375 p. [29] Planas E. & Furuse O. (1999) Considering Translation Memories as a Cross Language Information Retrieval system. Proc. MT Summit VII, Singapore, September 1999, Asia Pacific Ass. for MT, J.-I. Tsujii ed., 4 p. [30] Planas E. & Furuse O. (1999) A Close Multilevel String Matching Algorithm for Shallow Translation. Proc. TMI-99, 4 p. [31] Sérasset G. (1994) Recent Trends of Electronic Dictionary Research and Development in Europe. 
TM 038, EDR, Japon, mars 1994, 88 p. [32] Sérasset G. (1994) An Interlingual Lexical Organization Based on Acceptions. Proc. ICLA-94, July 1994, USM, 12 p. [33] Sérasset G. (1994) Interlingual Lexical Organisation for Multilingual Lexical Databases. Proc. 15th International Conference on Computational Linguistics, COLING-94, 5-9 Aug. 1994, 6 p. [34] Sérasset G. (1994) SUBLIM, un système universel de bases lexicales multilingues; et NADIA, sa spécialisation aux bases lexicales interlingues par acceptions. Nouvelle thèse, UJF (Grenoble 1). [35] Sérasset G. (1996) Un éditeur pour le dictionnaire explicatif et combinatoire du français contemporain. Proc. Journées lexique du PRC-CHM, Grenoble, D. Genthial ed. [36] Sérasset G. (1996) Informatisation du Dictionnaire Explicatif et Combinatoire : le projet NADIA-DEC. Proc. Lexicomatique et Dictionnairique, Lyon, septembre 1996, AUPELF UREF, A. Clas ed. [37] Sérasset G. (1997) Informatisation du Dictionnaire Explicatif et Combinatoire. Proc. TALN-97, Grenoble, juin 1997, CLIPS, IMAG, D. Genthial ed., pp [38] Sérasset G. & Polguère A. (1997) Outils pour lexicographes : application à la lexicographie explicative et combinatoire. Proc. RIAO'97, Montréal, juin 1997, vol. 2/2, pp [39] Sérasset G. (1997) Le projet NADIA-DEC : vers un dictionnaire explicatif et combinatoire informatisé? Proc. La mémoire des mots, 5ème journées scientifiques du réseau LTT, Tunis, septembre 1997, AUPELF UREF, A. Clas ed., 7 p. [40] Sérasset G. & Mangeot M. (1998) L'édition lexicographique dans un système générique de gestion de bases lexicales multilingues. Proc. Natural Language Processing and Industrial Applications, Moncton, vol. 1/2, pp [41] Sérasset G. & Boitet C. (2000) On UNL as the future "html of the linguistic content" & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter. Proc. COLING-2000, Saarbrücken, 31/7 3/8/2000, ACL, H. Uszkoreit ed., 7 p. (submitted) [42] Sérasset G. 
& Mangeot-Lerebours M. (2001) Papillon Lexical Database Project: Monolingual Dictionaries & Interlingual Links. Proc. NLPRS-2001, NII, Tokyo, November 2001, pp [43] Tomokiyo M., Mangeot-Lerebours M. & Planas E. (2000) Papillon : a Project of Lexical Database for English, French and Japanese, using Interlingual Links. Proc. Journées des Sciences et Techniques de l'ambassade de France au Japon, Tokyo, Japon, novembre 2000, Ambassade de France au Japon, 3 p. 10/10