[From: Advances in Computers, vol. 1 (1960), p ]

The Present Status of Automatic Translation of Languages

YEHOSHUA BAR-HILLEL
Hebrew University, Jerusalem, Israel

1. Aims and Methods, Survey and Critique
   1.1 Introduction
   1.2 Unreasonableness of Aiming at Fully Automatic, High Quality Translation
   1.3 Commercial Partly Mechanized, High Quality Translation Attainable in the Near Future
   1.4 Compromising in the Wrong Direction
   1.5 A Critique of the Overestimation of Statistics and the "Empirical Approach"
2. Critical Survey of the Achievements of the Particular MT Research Groups
   2.1 The USA Groups
   2.2 The British Groups
   2.3 The USSR Groups
   2.4 Other Groups
3. Conclusion
4. Remark on Bibliography
References
Appendices
   I. MT Statistics as of April 1, 1959
   II. Some Linguistic Obstacles to Machine Translation
       References
   III. A Demonstration of the Nonfeasibility of Fully Automatic, High Quality Translation
       References

1. Aims and Methods, Survey and Critique

1.1 Introduction

Machine translation (MT) has become a multimillion dollar affair. It has been estimated¹ that in the United States alone something like one and one-half million dollars were spent in 1958 upon research more or less closely connected with MT, with approximately one hundred and fifty people, among them eighty with M.A., M.Sc. or higher degrees, working in the field, full or part time. No comparable figures are available for Russia,² but it is generally assumed that the number of people engaged there in research on MT is higher than in the States. At a conference on MT that took place in Moscow in May 1958, 347 people from 79 institutions were reported to have participated. Not all participants need necessarily be actively involved in MT research. There exist two centers of research in MT in England, with a third in the process of formation, and one center in Italy.
Outside these four countries, MT has been taken up only occasionally, and no additional permanent research groups seem to have been created. Altogether, I would estimate that the equivalent of between 200 and 250 people were working full-time on MT at the end of 1958, and that the equivalent of three million dollars was spent during this year on MT research. In comparison, let us notice that in June 1952, when the First Conference on Machine Translation convened at MIT, there was probably only one person in the world engaged more than half-time in work on MT, namely myself. Reduced to full-time workers, the number of people doing research on MT could not at that time have been much more than three, and the amount of money spent that year not much more than ten thousand dollars.

This article was prepared with the sponsorship of the Information Systems Branch, Office of Naval Research, under Contract NR . Reproduction as a whole or in part for the purposes of the U. S. Government is permitted.

¹ This estimate is not official. In addition, it is still rather difficult to evaluate available machine time. Some basis for the estimate is provided in Appendix I.

² Reitwiesner and Weik, in their report cited in reference , say on p. 34 that "Dr. Panov's group consists of approximately 500 mathematicians, linguists and clerical personnel, all working on machine translation of foreign languages into Russian and translations between foreign languages with Russian as an inter-language." No source for this figure is given, and it is likely that some mistake was made here.

For the 1952 MT Conference I had prepared in mimeograph a survey of the state of the art . That report was based upon a personal visit to the two or three places where research on MT was being conducted at the time, and seems to have been quite successful, so I was told, in presenting a clear picture of the state of MT research as well as an outline of the major problems and possibilities. The time has come to evaluate critically the progress made during the seven years that have since passed, in order to arrive at a better view of these problems and possibilities. To my knowledge, no evaluation of this kind exists, at least not in English. True enough, there did appear during the last year two reviews of the state of MT, one prepared by the group working at the RAND Corporation , the other by Weik and Reitwiesner at the Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland . The first of these reviews was indeed well prepared and is excellent as far as it goes. However, it is too short to go into a detailed discussion of all existing problems and, in addition, is not always critical to a sufficient degree.
The second review seems to have been prepared in a hurry, relies far too heavily on information given by the research workers themselves, who by the nature of things will often be favorably biased towards their own approaches and tend to overestimate their own actual achievements, and does not even attempt to be critical. As a result, the picture presented in this review is somewhat unbalanced though it is still quite useful as a synopsis of certain factual bits of information. Some such factual information, based exclusively upon written communication from the research groups involved, is also contained in a recent booklet published by the National Science Foundation . Brief histories of MT research are presented in the Introductory Comments by Professor Dostert to the Report of the Eighth Annual Round Table Conference on Linguistics and Language Study  as well as in the Historical Introduction to the recent book by Dr. Booth and associates . The present survey is based upon personal visits during October and November 1958 to almost all major research centers on MT in the United States, the only serious exception being the center at the University of Washington, Seattle, upon talks with members of the two research groups in England, and upon replies to a circular letter sent to all research groups in the United States asking for as detailed information as possible concerning the number and names of people engaged in research within these groups, their background and qualifications, the budget, and a short statement of the plans for the near future, as well as, of course, upon a study of all available major publications including also, as much as possible, progress reports and memoranda; with regard to the USSR I had, unfortunately, to rely exclusively on available English translations of their publications and on reports which Professor Anthony G. 
Oettinger, of the Harvard Computation Laboratory, who had visited the major Russian research centers in MT in August 1958, was so kind as to put at my disposal. Some of the purely technical information with regard to the composition of the various MT research groups, their addresses and budgets is presented in Appendix I in tabular form.

1.2 Unreasonableness of Aiming at Fully Automatic High Quality Translation

During the first years of the research in MT, a considerable amount of progress was made which sufficed to convince many people, who originally were highly skeptical, that MT was not just a wild idea. It did more than that. It created among many of the workers actively engaged in this field the strong feeling that a working system is just around the corner. Though it is understandable that such an illusion should have been formed at the time, it was an illusion. It was created, among other causes, by the fact that a large number of problems were rather readily solved, and that the output of machine-simulated "translations" of various texts from Russian, German or French into English was often of a form which an intelligent and expert reader could make good sense and use of. It was not sufficiently realized that the gap between such an output, for which the term "translation" could be used at all only with difficulty, and high quality translation proper, i.e., a translation of the quality produced by an experienced human translator, was still enormous, and that the problems solved until then were indeed many but just the simplest ones, whereas the "few" remaining problems were the harder ones, very hard indeed.

Many groups engaged in MT research still regard fully automatic, high quality translation (FAHQT) as an aim towards which it is reasonable to work. Claims to the effect that FAHQT from Russian to English is attainable in the near future were recently made, for instance, by one of the four subgroups working on MT at Georgetown University (Section 2.1.3). I shall discuss these claims below. But let me state already at this point that I could not be persuaded of their validity. On the contrary, I am quite ready to commit myself to concoct Russian sentences or, should this for some reason be regarded as unfair, to exhibit actually printed Russian sentences for which a perusal of the proposed translation program of this group, or of any other group that would offer in the near future a method of fully automatic translation, would result either in gibberish or, what is even worse, in meaningful but wrong translations. I am so convinced of this because I believe myself to be in possession of an argument which amounts to an almost full-fledged demonstration of the unattainability of FAHQT, not only in the near future but altogether. This demonstration is given in Appendix III.

Most groups, however, seem to have realized, sometimes very reluctantly, that FAHQT will not be attained in the near future. Two consequences can be drawn from this realization. One can go on working with FAHQT in mind, in the hope that the pursuit of this aim will yield interesting theoretical insights which will justify this endeavor, whether or not these insights will ever be exploited for some practical purpose.
Or one gives up the ideal of FAHQT in favor of some less ambitious aim with a better chance of attainability in the near future. Both consequences are equally reasonable but should lead to rather different approaches. Lack of clarity in this respect, and vague hopes that somehow or other both aims can be attained simultaneously and by the use of the same methods, must lead to confusion and result in waste of effort, time and money.

Those who are interested in MT as a primarily practical device must realize that full automation of the translation process is incompatible with high quality. There are two possible directions in which a compromise could be struck: one could sacrifice quality, or one could reduce the self-sufficiency of the machine output. There are very many situations where less than high quality machine output is satisfactory. There is no need to present examples. If, however, high quality is mandatory (and I do not think, for instance, that scientists are prepared to be satisfied with less than the present average standard of human translation, while many regard this standard as too low for their purposes), then the machine output will have to be post-edited, thereby turning, strictly speaking, machine translation into machine aids to translation.

1.3 Commercial Partly Mechanized, High Quality Translation Attainable in the Near Future

In the remainder of this survey, I shall deal exclusively with those situations where the translation involved has to be of high quality. It should be easy to see how the conclusions at which I arrive have to be modified in order to deal with situations in which lesser quality is satisfactory. As soon as the aim of MT is lowered to that of high quality translation by a machine-post-editor partnership, the decisive problem becomes to determine the region of optimality in the continuum of possible divisions of labor.
It is clear that the exact position of this region will be a function of, among other things, the state of linguistic analysis to which the languages involved have been submitted. It may be safely assumed that, with machine-time/efficiency becoming cheaper and human time becoming more expensive, continuous efforts will be made to push this region in the direction of reducing the human element. However, there is no good reason to assume that this region can be pushed to the end of the line, certainly not in the near future. It seems that with the state of linguistic analysis achieved today, and with the kind of electronic computers already in existence or under construction, especially with the kind of large capacity, low cost and low-access-time internal memory devices that will be available within a few years, a point has been reached where commercial partly mechanized translation centers stand a serious chance of becoming a practical reality. However, various developments are still pending and certain decisions will have to be made.
First, a reliable and versatile mechanical print reader will have to become available. It has been estimated that the cost of retyping printed Russian material into a form and on a medium that could be processed by a machine would amount, under present conditions, to about one fourth of a cent per word . This estimate is probably too low, as the quality of the retyping has to be exceptionally high, in order to avoid printing mistakes which would perhaps be quite harmless for a human reader, but could be rather disastrous for machines which so far are totally unable to deal with misprints. The original text might therefore have to be keypunched by two operators, verified, etc., or else to be keypunched once, but at highly reduced speed. Indeed, whereas the above estimate is based on a rate of 20 Russian words per minute, another report  gives the maximum rate of trained and experienced keypunch operators as half this number. In one place , it is estimated that an automatic print reader might be ten times cheaper than human retyping. The difference between one half of a cent per keypunched word and one twentieth of a cent per print-read word could make all the difference, as the present cost per word of human Russian-to-English translation in the United States is generally given as lying between one and three cents , apparently depending on the quality and urgency of the job, and perhaps also on the exact form of the output. The costs may be different, of course, for other language pairs and in other countries. An informative synopsis on the variation of rates of payment for scientific and technical translation is given in a recent UNESCO survey .

Secondly, a concerted effort will have to be made by a pretty large group in order to prepare the necessary dictionary or dictionaries in the most suitable form. That this is not such a straightforward affair as laymen are apt to think becomes clear in the work of the Harvard MT group [12, 13].
This group developed an interesting semiautomatic method for preparing dictionaries (Section 2.1.6).

Thirdly, a good amount of thinking accompanied by an equally large amount of experimenting will have to go into the determination of the location of the interval in the above-mentioned continuum within which the optimal point of the division of labor between machine and post-editor will have a good chance of being situated, as a function of the specific translation program and the specific qualities of the envisaged post-editor. Among other things, these studies would have to determine whether some minimal pre-editing, while requiring but very little knowledge of the source language by the pre-editor, could not be utilized in order to reduce the load of the machine by a considerable amount. At present, many of the experimental MT programs make use of such limited pre-editing (Section 2.1.4). As one illustration of an operation that is in almost all cases so ridiculously simple for a human pre-editor that it could be almost instantaneously performed by a keypunch operator with only the barest knowledge of the source language, let me mention the distinction between the functioning of a point as a period, hence as one of the all-important markers of end-of-sentence, and its various other functions. Having the machine make this decision (a vital one, indeed so vital that it is one of the first operations, if not the first, in many translation programs that shun the use of pre-editing altogether) might be a complex and costly affair, throwing some doubt on the soundness of the case presented above in favor of a mechanical print reader. For the time being, at least, so long as keypunching is being used for the input, it is doubtless profitable to introduce as much elementary pre-editing as the keypunch operator can take in stride without considerably slowing down.
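The input-cost figures quoted under the first point above can be checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it supposes that retyping cost is inversely proportional to keypunching speed (which is how one gets from one fourth to one half of a cent per word), and the names `keypunch_cost` and `share_of_translation_cost` are hypothetical helpers introduced here, not taken from any of the cited reports.

```python
from fractions import Fraction

# Figures quoted in the text, in US cents per Russian word.
RETYPING_COST_AT_ASSUMED_RATE = Fraction(1, 4)  # estimate based on 20 words/minute
ASSUMED_RATE_WPM = 20
REPORTED_MAX_RATE_WPM = 10                      # "half this number"
PRINT_READER_SAVING_FACTOR = 10                 # "ten times cheaper"
HUMAN_TRANSLATION_COST = (Fraction(1), Fraction(3))  # one to three cents per word


def keypunch_cost(rate_wpm):
    """Scale the quoted estimate to another typing rate.

    Assumption (not stated in the source): cost per word is
    inversely proportional to words typed per minute.
    """
    return RETYPING_COST_AT_ASSUMED_RATE * ASSUMED_RATE_WPM / rate_wpm


def share_of_translation_cost(input_cost):
    """Return (lowest, highest) fraction of the human translation price
    that the given input cost per word would represent."""
    cheapest, dearest = HUMAN_TRANSLATION_COST
    return (input_cost / dearest, input_cost / cheapest)


keypunch = keypunch_cost(REPORTED_MAX_RATE_WPM)
print_reader = keypunch / PRINT_READER_SAVING_FACTOR

for label, cost in (("keypunching", keypunch), ("print reading", print_reader)):
    low, high = share_of_translation_cost(cost)
    print(f"{label}: {cost} cent/word, {low} to {high} of the human translation price")
```

Under these assumptions, keypunching at the slower rate comes to one half of a cent per word, that is, between one sixth and one half of the quoted price of human translation, whereas print reading at one twentieth of a cent would amount to between one sixtieth and one twentieth of it.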
Fourthly, an old question which has not been treated so far with sufficient incisiveness, mostly because the ideal of FAHQT diverted the interests of the research workers into other, less practical directions, namely the question whether MT dictionaries should contain as their source-language entries all letter sequences that may occur between spaces, sometimes called inflected forms, or rather so-called canonical forms, or perhaps something in between like canonical stems , has to be decided one way or other before mass production of translations is taken up. This question is clearly highly dependent, among other things, upon the exact type of internal and external memory devices available, and it is therefore mandatory to have a reliable estimate of this dependence. It is obvious that the speed of the machine part of the translation, and thereby the cost of the total translation process, will depend to a high degree on the organization of the dictionaries used. Most workers in the field of MT seem to have rather definite, though divergent, opinions in this respect. However, I am not aware of any serious comparative studies, though the outcome of such studies most surely will have a considerable impact upon the economics of MT. In general, the intention of reducing the post-editor's part has absorbed so much of the time
and energy of most workers in MT, that there has not been sufficient discussion of the problem whether partially automatic translation, even with such a large amount of participation by the post-editor as would be required under present conditions, is not nevertheless a desirable and feasible achievement. I fully understand the feeling that such an achievement is not of very high intellectual caliber, that the real challenge has thereby not yet been taken up, but I do not think that those agencies for whom any reduction of the load imposed at the moment on the time of highly qualified expert translators is an important achievement, should necessarily wait with the installation of commercial man-machine translation outfits until the post-editor's part has become very small, whatever amount of satisfaction the MT research worker will get from such an achievement. It is gratifying to learn that this attitude coincides with that of the Harvard group (Section 2.1.6) and is probably now shared by many other groups in the USA, USSR, and England, though it would further the issue if clear-cut statements of policy could be obtained in this respect.

1.4 Compromising in the Wrong Direction

At this stage, it is probably proper to warn against a certain tendency which has been quite conspicuous in the approach of many MT groups: These groups, realizing that FAHQT is not really attainable in the near future so that a less ambitious aim is definitely indicated, had a tendency to compromise in the wrong direction for reasons which, though understandable, must nevertheless be combated and rejected. Their reasoning was something like the following: since we cannot have 100% automatic high quality translation, let us be satisfied with a machine output which is complete and unique, i.e., a smooth text of the kind you will get from a human translator (though perhaps not quite as polished and idiomatic), but which has a less than 100% chance of being correct.
I shall use the expression "95%" for this purpose since it has become a kind of slogan in the trade, with the understanding that it should by no means be taken literally. Such an approach would be implemented by one of the two following procedures: the one procedure would require printing the most frequent target-language counterpart of a given source-language word whose ambiguity has not been resolved by the application of the syntactical and semantical routines, necessitating, among other things, large-scale statistical studies of the frequency of usage of the various target renderings of many, if not most, source-language words; the other would be ready to work with syntactical and semantical rules of analysis with a degree of validity of no more than 95%, so long as this degree is sufficient to ensure uniqueness and smoothness of the translation. This approach seems wrong to me and even dangerous since the machine output of the corresponding program will be of low quality in a misleading and soothing disguise. Since so many sentences, "5%" of a given text, will have a good chance of being mistranslated by the machine, it is by no means clear whether the reader will always be able to detect these mistranslations, just because the machine output is so smooth and grammatical (so let us assume for the sake of the argument, though I doubt whether even this much can really be achieved at this stage of the game) that he might be able to find only few cues to warn him that something is wrong with it. It is not inconceivable that the machine translation would be so wrong at times as to lead its user to actions which he would not have taken had he been presented with a correct translation. (When I talk about "100%," I obviously have in mind not some heavenly ideal of perfection, but the product of an average qualified translator.
I am aware that such a translator will on occasion make mistakes and that even machines of a general low quality output will avoid some of these mistakes. I am naturally comparing averages only.) But there is really no need at all to compromise in the direction of reducing the reliability of the machine output. True enough, a smooth machine translation looks impressive, especially if the reader is unable to realize at first sight that this translation is faulty ever so often, but this esthetically appealing feature should not blind us to the dangers inherent in this approach. It is much safer to compromise in the other direction. Let us be satisfied with a machine output which will ever so often be neither unique nor smooth, which ever so often will present the post-editor with a multiplicity of renderings among which he will have to take his choice, or with a text which, if it is unique, will not be grammatical. On the other hand, whenever the machine output is grammatical and unique it should be, to adopt a slogan current in the Harvard group, "fail-safe" (to about the same degree, to make this qualification for the last time, as the average qualified human translator's output is fail-safe). Let the
machine by all means provide the post-editor with all possible help, present him with as many possible renderings as he can digest without becoming confused by the embarras de richesse (and here again we have quite a problem of finding an interval of optimality), but never let the machine make decisions by itself for purely frequential reasons, even if these frequencies can be relied upon. If these frequency counts could be done cheaply (and I doubt very much whether this is feasible to such a high degree of reliability as would probably be required for our purposes), let this information too be given to the post-editor, but by no means should practical MT wait until this information is obtained.

The only reasonable aim, then, for short-range research into MT seems to be that of finding some machine-post-editor partnership that would be commercially competitive with existing human translation, and then to try to improve the commercial effectiveness of this partnership by improving the programming in order to delegate to the machine more and more operations in the total translation process which it can perform more effectively than the human post-editor. These improvements will, of course, utilize not only developments in hardware, programming (especially automatic programming), and linguistic analysis, but also the experience gained by analyzing the machine output itself. Should it turn out that for the sake of competitiveness some use of a pre-editor, and perhaps even of a bilingual post-editor, would be at least temporarily required, then this fact should be accepted as such, in spite of the trivialization of the theoretical challenge of the MT problem which would be entailed by such a procedure.

1.5 A Critique of the Overestimation of Statistics and the "Empirical Approach"

Let me finish this part of the survey by warning in general against overestimating the impact of statistical information on the problem of MT and related questions.
I believe that this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication. Though it is often possible by a proper organization of the research effort to get a certain amount of statistical information at no great extra cost, it is my impression that much valuable time of MT workers has been spent on trying to obtain statistical information whose impact on MT is by no means evident. It is not true that every statistic on linguistic matter is automatically of importance for MT so that the gathering of any such statistics could be regarded as an integral part of MT research without any need for additional justification.

Gathering of statistics is regarded by many MT groups as being part of a more general methodological approach, the so-called "empirical approach" . This term has already caused a lot of confusion. I am using it here in the sense in which it is employed by the RAND group . This sense should become obvious from the following discussion. Adherents of this approach are distrustful of existing grammar books and dictionaries, and regard it as necessary to establish from scratch the grammatical rules by which the source-language text will be machine analyzed, through a human analysis of a large enough corpus of source-language material, constantly improving upon the formulation of these rules by constantly enlarging this corpus. With regard to dictionaries, a similar approach is often implemented and a dictionary compiled from translations performed by bilingual members of the group or by other human translators considered to be qualified by this group. This approach seems to me somewhat wasteful in practice and not sufficiently justified in theory.
The underlying distrust seems to have been caused by the well-known fact that most existing grammars are of the normative type, hence often of no great help in the analysis of actual writing (and to an even higher degree, of actual speech), and that existing dictionaries are of such a nature that quite often none of the presented target-language counterparts of a source-language word are satisfactory within certain contexts, especially with regard to terms used in recently developed scientific fields. However, even in view of these facts, I believe that the baby has far too often been thrown away with the bathwater. No justification has been given for the implicit belief of the "empiricists" that a grammar satisfactory for MT purposes will be compiled any quicker or more reliably by starting from scratch and "deriving" the rules of grammar from an analysis of a large corpus than by starting from some authoritative grammar and changing it, if necessary, in accordance with analysis of actual texts. The same holds mutatis mutandis with regard to the compilation of dictionaries. But grammars have in
general not wholly been dreamt up, nor have dictionaries been compiled by some random process. Existing grammars and dictionaries are already based, though admittedly not wholly, upon actual texts of incomparably larger extension than those that serve as a basis for the new compilers. Russian is not Kwakiutl, and with all due regard to the methods and techniques of structural linguistics and to the insights which this science has given us in respect to some deficiencies of traditional grammars, I do not think that it follows from its teachings that all existing codifications of languages with a highly developed literature should be totally disregarded. Let me add, without going here into details for lack of space, that the empiricalness of the derivations of grammar rules from actual texts is rather doubtful as such. For certain general methodological considerations one might as well be led to the conclusion that these rules incorporate a lot of subjective and highly biased and untested assumptions such that their degree of validity might very well, on the average, be lower than that of the well-established, often-tested and critically examined grammars, in spite of their normativity.

2. Critical Survey of the Achievements of the Particular MT Research Groups

After these far too short (and therefore occasionally rather dogmatic) general comments, it is now time for a more detailed survey of the approaches and achievements of the twenty or so groups which are at present actively engaged in research on MT or on linguistic topics believed to be of immediate relevance for MT. In one case a defunct group (Section 2.1.5) is being mentioned, first because it made significant contributions during its existence, and secondly because there is still some chance that it may be revived. This survey will deal exclusively with the more general aspects of the MT problem and especially with research methodology.
Therefore, the innumerable specific advances of the various groups with regard to coding, transliterating, keypunching, displaying of output, etc., will be mentioned only rarely. But the list of references should contain sufficient indications for the direction of the reader interested in these aspects. The order in which these groups will be discussed is: USA, Great Britain, USSR, others, following, with one exception, the order of degree of my personal acquaintance. Within each subdivision, the order will in general be that of seniority.

2.1 The USA Groups

THE SEATTLE GROUP

Professor Erwin Reifler of the University of Washington, Seattle, started his investigations into MT in 1949, under the impact of the famous memorandum by Weaver , and has since been working almost continuously on MT problems. The group he created has been constantly increasing in size and is at present one of the largest in the States. In February 1959, it published a 600-page report describing in detail its total research effort. This report has not reached me at the time of writing this survey (April 1959), which is all the more unfortunate as the latest publication stemming from this group is a talk presented by Reifler in August 1957 , and I was, due to a personal mishap, unable to visit Seattle during my stay in the States. It is not impossible that my present discussion is considerably behind the actual developments. The efforts of this group seem to have concentrated during the last years on the preparation of a very large Russian-English automatic dictionary containing approximately 200,000 so-called "operational entries" whose Russian part is probably composed of what was termed above (Section 1.3) "inflected forms" (as against the million or so inflected forms corresponding to the total Russian vocabulary of one hundred thousand canonical forms). This dictionary was to be put on a photoscopic memory device, developed by Telemeter-Magnetics Inc.
for the USA Air Force, which combines a very large storage capacity with very low access time and apparently is to be used in combination with one of the large electronic computers of the IBM 709 or UNIVAC 1105 types. The output of this system would then be one version of what is known as word-by-word translation, whose exact form would depend on the specific content of the operational entries and the translation program. Both are unknown to me though probably given in the above mentioned report. Word-by-word Russian-to-