Text Structure Aiming at Machine Translation

Horacio Saggion and Ariadne Carvalho

Technical Report DCC-95-22
December 1995

The contents of this report are the sole responsibility of the author(s).

Departamento de Ciência da Computação
Universidade Estadual de Campinas
Caixa Postal 6065
13081-970 Campinas-SP, Brazil
{saggion,ariadne}@dcc.unicamp.br

Abstract

Machine translation relies on the existence of a meaning representation which must be able to capture the semantic content of the original text. The work presented here is concerned with the automatic generation of such a representation, to be used by a machine translation system. We have used as source text scientific abstracts in Portuguese. The emphasis of the work is on the determination of the text structure of such abstracts, making use of the notions of cohesion and coherence. The main cohesion phenomenon considered here is definite anaphora.

Keywords: Machine Translation, Textual Structure, Coherence Relations, Definite Anaphora.

1 Introduction

A Machine Translation (MT) system can be seen as a special kind of Natural Language Understanding (NLU) system. The text in the Source Language (SL) is analysed by a program which produces a meaning representation of the text, to be used by a generation program which will produce the conveyed message in the Target Language (TL), see Figure 1.

[Figure 1: NLU System. Source language, understanding, meaning, generation, target language.]

This kind of approach to MT relies on the existence of an interlingua, a kind of language-free representation, which includes semantic and syntactic features necessary for representing the exact meaning of words and sentences [Lewis, 1992].

This work is being sponsored by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico).

Meaning representations in MT systems are usually extensions of syntactic representations [Tucker, 1984, Scolum, 1985] and, as a result, text is represented as a disconnected sequence of sentences. The choice on the representation will influence aspects such as fidelity to the original text [Danlos, 1987]. We are concerned here with the development of a meaning representation of abstracts for technical papers in Portuguese, to be used by an MT system. In our approach, text is treated as a coherent and cohesive whole. We aim at an explicit representation of some discourse phenomena as a means of obtaining high quality translation. The main idea is that the representation must include not only the propositional content of the text, but also the coherence relations that tie propositions together in the text. We believe that the representation of coherence relations in an abstract form improves the right choice of connectives and superficial forms during the generation process [Moore and Paris, 1994]. The choice of a connective also imposes restrictions on the realization of the propositions involved in the relation.

The rest of the paper is organized as follows: in Section 2 the general structure of an abstract is presented; in Section 3 we introduce the meaning representation which will be generated by the system; in Section 4 the process of automatic construction of the representation is shown; in Section 5 we present a detailed example and, finally, in Section 6 we present our conclusions.

2 Abstracts and their Structure

An abstract is the first section of a report, coming after the title and before the introduction. It provides the reader with a brief preview of the paper. Many readers depend on the abstract to give them enough information about the work in order to decide if they will read the entire report or not. Abstracts from almost all fields are written in a very similar way. In spite of the fact that abstracts are written to be as brief and concise as possible, they present characteristics observed in larger texts, in particular coherence and cohesive relations [Hobbs, 1978a, Hobbs, 1978b, Mann and Thompson, 1983, Mann and Thompson, 1987, Halliday and Hasan, 1976].

According to [Hutchins, 1985, Jordan, 1991], two basic types of abstracts are identified: informative, which include actual results, figures and conclusions from source documents, and indicative, which simply state the fact that certain topics are covered. But in general an abstract will be a mixture of both.

The type and the order of the information included in an abstract are very conventional. According to the Brazilian Technical Norms Association (ABNT) [ABNT, 1987], and according to [Weissberg and Buker, 1990, Hutchins, 1985, Jordan, 1991], an abstract may include the following information:

  background: where, as the name says, background information must be given;
  purpose or objective: where the objectives of the work must be described;
  method: where new techniques and methodological principles are described;

  results: where new facts are presented;
  conclusion.

As well as norms on which information should be presented in an abstract, there are also language conventions. The most common is the convention on verb uses, which is: when talking about the purpose of the work, verbs should be in the present tense; when talking about the method which was used and the results which were obtained, verbs should be in the past tense; in the conclusion verbs could either be in the present tense or modal auxiliaries might be used.

Our research is based on a corpus of abstracts taken from the "Revista de Ensino de Engenharia" (in English: Engineering Teaching Magazine). We have chosen to study abstracts from this magazine because submissions to it must agree with the norms from the ABNT.

Some empirical facts can be verified in the corpus. For example, present tense in the active or passive voice is usually used when the author is talking about the paper. As the work is generally previous to the paper, past tense is used to introduce details about the scientific work (experiments, experiences, methodological and theoretical framework). These "cues" are useful for the identification of the type of information present in the abstract.

Figure 2 shows an example of an abstract [da Silveira Neto and Hernandez Mendoza, 1988] taken from this magazine:

  (1) Trocadores de calor compactos são elementos básicos e de alta eficiência. (2) Este trabalho apresenta um método simples (3) usado para o desenvolvimento, (4) teste e (5) análise térmica deste tipo de trocador. (6) Usando o método recomendado, (7) alunos de graduação podem estimar a queda da pressão e coeficientes de troca de calor (8) normalmente utilizados em engenharia térmica.

  (1) Compact heat exchangers are basic and high efficiency elements. (2) This work shows a simple method, (3) used for the development, (4) testing and (5) thermal analysis of this type of heat exchanger. (6) Using the recommended method (7) undergraduate level students can estimate pressure drops and heat exchange coefficients (8) commonly used in thermal engineering.

Figure 2: A first sample abstract

The numbers attached to the text are not present in the original; they are provided for easy reference in the rest of the paper.

3 Representing the Structure of the Text

In our approach text is represented as a structure which captures the informational content, the coherence relations between parts of the text and the propositional content. The representation can be seen as a forest. Each tree in the forest represents a different kind of information actually present in the abstract. This is based on the fact that different parts of the text will convey different kinds of information [Grimes, 1975]. The root of the tree represents a different informational status (background information, method, conclusions, etc.) and copes with a group of sentences which constitutes one of the main segments of the text. Internal nodes of the tree represent coherence relations between subsegments of the text, which can be either single propositions or groups of propositions tied together by coherence relations. Finally, the leaves represent the propositional content.

3.1 Informational Content

The informational content represents the main segments of the abstract. It is obtained through a partition of the text into text spans, which are linear sequences of sentences grouped together according to syntactic and semantic information. We are working with a set of informational categories (IC) to represent each text span. These categories were defined according to the type of information present in the text (as described in Section 2) and also according to the analysis of the corpus of abstracts taken from the magazine. The identification of each category which appears in an abstract relies on syntactic and semantic information, such as tense, aspect, and semantic features attached to the lexical items. These categories are:

  IC = {Background, Objective, Method, Experiment, Recommendation, Suggestions, Conclusion, Results}

The overall text is represented through a subset of IC. From the example shown in Figure 2, we obtain the following subset:

  Abstract = {Background, Objective, Recommendation}

Background is given in proposition (1), Objective is given in propositions from (2) to (5), and Recommendation in propositions from (6) to (8). In Section 5 we describe in detail how this partition was made. The identification of these categories in the source text will assist a generator in the choices to be made when the time for translation comes.

3.2 Coherence Relations

The analysis of the corpus revealed the existence of various coherence relations, accounting for local coherence. Several researchers have already investigated the relations that hold in a piece of coherent text: coherence relations [Hobbs, 1978a], rhetorical relations [Mann and Thompson, 1983], semantic relations [Hutchins, 1987]. In our work we gave formal definitions to these relations aiming at their computational use by an automatic system.

We are working with the following set of relations:

  CR = {Elaboration, Parallel, Sequence, Temporal Sequence, Cause Consequence, Contrast}
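The forest described in this section can be pictured as a small data structure. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the class names `Leaf`, `Relation` and `Tree` and the helper `leaves` are hypothetical, and the relation labels placed inside the Objective tree are illustrative choices from the CR set. Only the partition of propositions (1) to (8) into Background, Objective and Recommendation comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Informational categories (IC) and coherence relations (CR) as listed in the text.
IC = {"Background", "Objective", "Method", "Experiment",
      "Recommendation", "Suggestions", "Conclusion", "Results"}
CR = {"Elaboration", "Parallel", "Sequence", "Temporal Sequence",
      "Cause Consequence", "Contrast"}


@dataclass
class Leaf:
    """Leaf of a tree: one numbered proposition (propositional content)."""
    number: int


@dataclass
class Relation:
    """Internal node: a coherence relation over propositions or sub-segments."""
    name: str
    children: List["Node"]

    def __post_init__(self):
        if self.name not in CR:
            raise ValueError(f"unknown coherence relation: {self.name}")


Node = Union[Leaf, Relation]


@dataclass
class Tree:
    """One tree of the forest, rooted at an informational category."""
    category: str
    children: List[Node] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in IC:
            raise ValueError(f"unknown informational category: {self.category}")


def leaves(node) -> List[int]:
    """Collect proposition numbers below a Tree, Relation or Leaf, left to right."""
    if isinstance(node, Leaf):
        return [node.number]
    numbers: List[int] = []
    for child in node.children:
        numbers.extend(leaves(child))
    return numbers


# Forest for the abstract of Figure 2. The nesting inside "Objective" is an
# illustrative assumption; the span boundaries follow the text.
forest = [
    Tree("Background", [Leaf(1)]),
    Tree("Objective", [
        Relation("Elaboration", [
            Leaf(2),
            Relation("Parallel", [Leaf(3), Leaf(4), Leaf(5)]),
        ]),
    ]),
    Tree("Recommendation", [Leaf(6), Leaf(7), Leaf(8)]),
]

# Recover the partition of the text into spans from the forest.
partition = {tree.category: leaves(tree) for tree in forest}
```

Keeping categories at the roots and relations at internal nodes means the span partition can always be read back from the leaves, which is what a generator would need in order to keep the original ordering of the text.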

Relations may relate propositions or groups of propositions. The Elaboration relation holds between two propositions (a) and (b) if (a) introduces an entity that is further elaborated in proposition (b). One of the rules for the Elaboration relation follows:

$$\{\, p(\mathit{event}_1, x, y) \in (a),\ q(\mathit{event}_2, y, z) \in (b) \,\} \models (a)\ \mathit{Elaboration}\ (b)$$

This rule states that there exists an Elaboration relation between two propositions (a) and (b) if proposition (a) contains a verbal predicate p which introduces an entity y, and proposition (b) contains a verbal predicate q which "talks about" the same entity y. There are other rules for elaboration which will not be presented here.

In Figure 2 proposition (2) introduces the entity "método" that is further elaborated in propositions from (3) to (5). The elaboration lies on the utilization of the "método". The same relation holds between propositions (7) and (8); (7) introduces the entities "queda da pressão" and "coeficientes de troca de calor"; the two are further elaborated in proposition (8). In this case we have the following structure:

  proposition (7): estimate(estimate_event, x, y)
  proposition (8): use(use_event, y, z)

The Contrast relation links two propositions (a) and (b) if both propositions share the same main predicator, but the arguments are in contrast. More formally:

$$\{\, p(\mathit{event}_1, x, y) \in (a),\ p(\mathit{event}_2, x, z) \in (b),\ \mathit{contrast}(y, z) \,\} \models (a)\ \mathit{Contrast}\ (b)$$

where the contrast predicate must be defined. This relation is not present in the abstract from Figure 2. Consider now the text segment presented in Figure 3 [Bazzo and Pereira, 1989]:

  (1) A engenharia depende cada vez mais das ciências e de técnicas nelas baseadas, (2) mas jamais vai prescindir do empirismo e da criatividade de quem a usa.

  (1) Engineering depends more and more on science and specific techniques, (2) however empiricism and creativity will never be disregarded.

Figure 3: Text segment from the corpus

The explicit mark "mas" is an indication of contrast. The contrast can actually be verified between propositions (1) and (2): in proposition (1) "engenharia" is presented as dependent on science, and in proposition (2) it is presented as dependent on "criatividade", two opposed concepts.

Other relations are similarly specified, and several rules are defined for each of them.
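The two rules above can be read as simple pattern checks over the predicates of a pair of propositions. The following Python sketch is a minimal illustration under stated assumptions: the `Pred` tuple, the function names, the entity names and the `opposed` table are hypothetical, and the contrast predicate is passed in by the caller because the text leaves its definition open. It is not the authors' rule engine.

```python
from typing import Callable, FrozenSet, NamedTuple


class Pred(NamedTuple):
    """A predicate p(event, arg1, arg2) inside a proposition (hypothetical encoding)."""
    name: str   # predicate symbol, e.g. "estimate"
    event: str  # event referent
    arg1: str   # first entity argument
    arg2: str   # second entity argument


Proposition = FrozenSet[Pred]


def elaboration(a: Proposition, b: Proposition) -> bool:
    """p(event1, x, y) in (a) and q(event2, y, z) in (b): (b) talks about entity y."""
    return any(p.arg2 == q.arg1 for p in a for q in b)


def contrast(a: Proposition, b: Proposition,
             in_contrast: Callable[[str, str], bool]) -> bool:
    """Same predicator p and same x in (a) and (b), with contrast(y, z) holding."""
    return any(
        p.name == q.name and p.arg1 == q.arg1 and in_contrast(p.arg2, q.arg2)
        for p in a for q in b
    )


# Propositions (7) and (8) of Figure 2; entity names are illustrative.
p7 = frozenset({Pred("estimate", "estimate_event", "students", "coefficients")})
p8 = frozenset({Pred("use", "use_event", "coefficients", "thermal_engineering")})

# Propositions (1) and (2) of Figure 3, following the paper's reading of them.
c1 = frozenset({Pred("depend", "event1", "engineering", "science")})
c2 = frozenset({Pred("depend", "event2", "engineering", "creativity")})

# Assumed table of opposed concepts standing in for the contrast predicate.
opposed = {frozenset({"science", "creativity"})}


def in_contrast(y: str, z: str) -> bool:
    return frozenset({y, z}) in opposed


elaborates_7_8 = elaboration(p7, p8)
contrasts_1_2 = contrast(c1, c2, in_contrast)
```

Note that the Elaboration check is directional: it holds from (7) to (8) but not from (8) to (7), because only the second argument of the first predicate is matched against the first argument of the second.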

3.3 Propositional Content

Each sentence in the source text originates one or more propositions as a result of syntactic and semantic analysis. The propositional content is a set of propositions that carry the message of the abstract. The representation is based on first order representations. Propositions are represented through:

  A set of variables representing the entity referents and event referents in the proposition.
  A set of predicates standing for relations of fixed arity, which relate the entities in the proposition. Predicates are associated with items from the Portuguese language.

Entities are derived from noun phrases in the sentence. Events and states are derived from verb phrases. Predicates have mandatory arguments and a list is used to represent the multiple cases of complementation. Each sentence in the text is analysed according to grammatical rules and it is broken down into propositions according to the syntactic tree (parse tree) produced by the analysis [Winograd, 1983]. The tree is also used to identify pairs of non co-referring expressions (intrasentential anaphora) using syntactic restrictions [Raposo, 1992]. Semantic analysis produces predicates and arguments based on the structure of the syntactic tree and on the semantic information attached to lexical items.

In order to correctly represent the entities in the propositional content, definite anaphora [Hirst, 1981a, Hirst, 1981b] must be resolved. The resolution is based on syntactic restrictions (gender, number) as well as on world knowledge [Nirenburg and Carbonell, 1987]. A set of known entities, which restricts the possible antecedents for a definite noun phrase in the following text, is maintained. The set includes the entities earlier introduced (explicitly) as well as the entities associated with them. Pronoun resolution is essential in every MT system [Wilks, 1973] and the correct choice of an antecedent for a definite full noun phrase is essential in the verification of the coherence relations.

As another example, consider the abstract presented in Figure 4 [de Araujo and Szeremeta, 1985]. The resolution of the definite noun phrase "esta disciplina" is required, because the semantic interpretation of the noun phrase resulted in an incomplete entity represented by the expression (a):

  (a) disciplina(x1)

The resolution of the expression is based on the search for an antecedent in the list of known entities. The search is based on the predicate "disciplina". As can be seen, there is no previous explicit occurrence of this predicate in the text. Nevertheless, in the first sentence of the abstract, the noun phrase "cálculo numérico" was introduced and the semantic interpretation of this expression produced the following term, which indeed belongs to the list of known entities:

  (b) disciplina(x2, name: calculo numerico)
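The search over the list of known entities described above can be sketched as a lookup keyed on the predicate. The Python code below is a hedged illustration, not the system's actual procedure: the `Term` class and the `resolve` function are hypothetical names, scanning the most recent entity first is an assumption, and the syntactic restrictions (gender, number) and world knowledge mentioned in the text are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A term such as disciplina(x2, name: calculo numerico)."""
    predicate: str
    variable: str
    attributes: Dict[str, str] = field(default_factory=dict)


def resolve(incomplete: Term, known: List[Term]) -> Optional[Term]:
    """Return an antecedent with the same predicate, or None if there is none.

    Assumption: `known` is ordered by introduction and the most recently
    introduced matching entity is preferred.
    """
    for candidate in reversed(known):
        if candidate.predicate == incomplete.predicate:
            return candidate
    return None


# Known entities after the first sentence of the abstract: term (b).
known = [Term("disciplina", "x2", {"name": "calculo numerico"})]

# Term (a), produced by the definite noun phrase "esta disciplina".
a = Term("disciplina", "x1")

antecedent = resolve(a, known)
```

In this sketch `antecedent` is term (b), so the incomplete entity x1 can be identified with x2; a term whose predicate does not occur in the list, such as `Term("aula", "x4")`, resolves to `None`.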

  Com a utilização de calculadoras programáveis e microcomputadores, torna-se necessária a adequação do plano de ensino de cálculo numérico para a formação dos futuros engenheiros. Algumas alterações na metodologia de ensino desta disciplina são sugeridas e feitas algumas recomendações quanto à utilização de calculadoras programáveis.

  With the increasing utilization of programmable calculators and microcomputers, it becomes necessary to update the teaching approach of Numerical Analysis instruction for future engineers. Some changes on the methodology for teaching these subjects are suggested and some recommendations are made for the utilization of programmable calculators.

Figure 4: A second sample abstract

So, the resolution of expression (a) became possible through expression (b).

As an additional example, consider the abstract presented in Figure 5 [Gomide and Fernandez, 1985]:

  O presente trabalho tem como meta divulgar a disciplina Similitude em Engenharia ministrada no curso de engenharia mecânica da UFU a qual visa proporcionar aos alunos fundamentos básicos sobre a teoria de modelos. O ciclo de aulas práticas tem como objetivos (...)

  The objective of this paper is to divulge the course Similitude in Engineering, taught at the mechanical engineering department of UFU with the objective of providing the students with basic foundations of modeling theory. Practical activities were planned (...)

Figure 5: A third sample abstract

In the second sentence of the abstract, the definite noun phrase "O ciclo de aulas práticas" is introduced and this phrase must be resolved; the semantic interpretation of this expression produced the following terms:

  (c) ciclo(x3, specifier: x4)
  (d) aula(x4, qualifier: praticas)

Note that term (d) is incomplete because the system does not know the identity of the entity "aulas". In order to resolve the definite expression "O ciclo de aulas práticas", it is necessary to resolve expression (d). In the first sentence of the abstract the noun phrase "a disciplina Similitude em Engenharia" is introduced and the semantic interpretation of this noun phrase resulted in a complete expression represented by the following term:

  (e) disciplina(x5, name: similitude em engenharia)

In addition to this expression, and taking into account the fact that a "disciplina" has "aulas", the system introduced the following term associated with expression (e):

  (f) aula(a(x5))

With these terms incorporated into the list of known entities, the resolution of expression (d) finally became possible.

4 Constructing the Meaning Representation

The diagram in Figure 6 shows the main processes and sources involved in the construction of the text structure. The structure is constructed by two main processes. The Coherence Assembler is responsible for selecting the coherence relations that link propositions in the text. The Abstract Structure Assembler links a coherent span to the global organization of the text.

[Figure 6: Meaning Representation Construction. Components shown: Propositions, Semantic and Syntactic Signals, Coherence Rules, Partial Structure, Coherence Assembler, Coherent Span, Abstract Structure Assembler, Text Structure.]

These processes operate on the following components of the meaning representation:

  Propositions: are produced as a result of syntactic and semantic analysis.

  Semantic and Syntactic Signals: guide the coherence assembler in the selection of the coherence relations and also in deciding where a coherent span ends. Syntactic signals include discourse markers that directly signal the structure of the discourse [Hirschberg and Litman, 1993]. These markers are the primary indication of the presence of a coherence relation in the text. Tense, aspect and semantic information attached to lexical items provide a means to decide about the limits of a text span [Grosz and Sidner, 1986].

  Coherence Rules: define conditions that propositions must satisfy in order to be linked together by a coherence relation.

  Partial Structure: is used to store propositions and segments already linked and waiting for additional processing. When processing a proposition Pk, two problems must be resolved:
    (a) decide to which text segment the proposition Pk will be attached;
    (b) decide on how the attachment to a segment will be done.
  Propositions must be temporarily saved until a decision is made.

  Coherent Span: is a group of propositions related by coherence relations. It carries informational content associated with one of the Informational Categories presented earlier.

5 Detailed Example

Figure 7 shows the structure produced as a result of the analysis of the example from Figure 2.

[Figure 7: Text Structure. Three trees rooted at BACKGROUND, OBJECTIVES and RECOMMENDATIONS; internal nodes labelled ELABORATION, PARALLEL, ENABLE and ELABORATION; the leaves are propositions (1) to (8).]

The main processes that led to this structure are:

  Breaking each sentence into propositions: using syntactic and semantic analysis.

  Determining references for definite anaphora: the noun phrase "este trabalho" in proposition (2) is resolved using specific knowledge about abstracts. The corresponding definite noun phrase is "this paper".

  Various entities only make sense in the context of abstracts. These entities include "the authors", "the paper", "the work", "the objective" and the like. This information is included in the knowledge base system and is very useful when looking for an antecedent for a definite noun phrase.

  The noun phrase "este tipo de trocador" in proposition (5) is resolved using the preceding discourse. The antecedent is "trocadores de calor compactos" introduced in proposition (1). The noun phrase "o método recomendado" is also resolved using the previous discourse.

  Determining the limits of each text span: in proposition (1) the use of the verbal form "são" carries semantic information about general facts (one entity is "defined"). "Is-a" sentences are usually analysed in this form [Sidner, 1978]. So proposition (1) is classified as Background. In proposition (2) the verb "apresentar" is used. This verb carries, in general, purpose or objective information [Jordan, 1991]. Additionally, the noun phrase "este trabalho", which was found to mean "this paper", is acting as subject in the sentence. Taking into account the fact that a "paper" has an objective, we can deduce that the proposition really marks the objective of the paper. So the Objective category is selected and it spans up to proposition (5). In proposition (6) the item "recomendado" marks the beginning of a new text span which is classified as Recommendation. Figure 7 shows the limits of each text span.

  Determining Coherence Relations: syntactic marks guide the selection of coherence relations. For example, propositions from (3) to (5) are linked by coordination, syntactically indicated by commas and by the conjunction "e"; this could mark a Parallel or a Sequence relation. But note that the same argument, "este tipo de trocador", is used in the three propositions, which signals a preference for a Parallel relation. The other coherence relations from the abstract shown in Figure 2 are shown in Figure 7.

6 Conclusions

Traditional approaches to machine translation have usually neglected the problem of text structure and the source input was treated as a disconnected sequence of sentences. As a result, the representations used by these approaches were not able to capture and make use of the coherence phenomena present in the input.

We concentrate on the specification and construction of a meaning representation of abstracts from scientific papers in Portuguese. This representation must capture the informational content, the coherence relations and the propositional content of the input text. We believe that this representation is appropriate for machine translation because it copes not only with the message which is being conveyed, but also with the structure of the text. Representing the linguistic structure of the text enables a generator program to choose the superficial forms in order to correctly express the message in the target language, preserving the original structure of the text. Several steps are involved in the construction of such a representation: syntactic analysis, semantic interpretation, anaphora resolution, determination of text spans and determination of coherence relations. We are working with a set of these relations, which were defined according to the phenomena observed in the corpus. Additional research is needed in order to expand this set to cope with more relations. Also, we have only treated the problem of definite anaphora through the incorporation of knowledge about the domain of the discourse into the system; more research is also needed to cope with other kinds of anaphora.

Acknowledgements

We would like to thank Jorge Stolfi for his valuable comments on a previous version of this paper.

References

[ABNT, 1987] ABNT - Associação Brasileira de Normas Técnicas. Resumos. 1987.

[Bazzo and Pereira, 1989] Bazzo, W.A. and Teixeira do Vale Pereira, L. Criatividade na Engenharia. Revista de Ensino de Engenharia, São Paulo, 8(1):8-11, 1º semestre 1989.

[Danlos, 1987] Danlos, L. The Linguistic Basis of Text Generation. Studies in Natural Language Processing, Cambridge University Press, 1987.

[da Silveira Neto and Hernandez Mendoza, 1988] da Silveira Neto, A. and Hernandez Mendoza, O.S. Trocadores de calor compactos - bancada de testes. Revista de Ensino de Engenharia, São Paulo, 7(1):43-48, 1º semestre 1988.

[de Araujo and Szeremeta, 1985] de Araujo, N.D. and Szeremeta, J.F. Uma experiência no ensino de cálculo numérico na UFSC. Revista de Ensino de Engenharia, 4(2):138-139, São Paulo, 2º Sem. 1985.

[Gomide and Fernandez, 1985] Gomide, H.A. and Fernandez, E.F. Curso de Similitude em Engenharia. Revista de Ensino de Engenharia, 4(2):125-132, São Paulo, 2º Sem. 1985.

[Grimes, 1975] Grimes, J. The Thread of Discourse. Mouton and Company, The Hague, Netherlands, 1975.

[Grosz and Sidner, 1986] Grosz, B.J. and Sidner, C.L. Attention, Intentions and the Structure of Discourse. Computational Linguistics, Vol. 12, Num. 3, July-September 1986.

[Halliday and Hasan, 1976] Halliday, M.A. and Hasan, R. Cohesion in English. London, Longman Press, 1976.

[Hirst, 1981a] Hirst, G. Anaphora in Natural Language Understanding: A Survey. Lecture Notes in Computer Science 119. Springer-Verlag, 1981.

[Hirst, 1981b] Hirst, G. Discourse-Oriented Anaphora Resolution in Natural Language Understanding: A Review. American Journal of Computational Linguistics, Vol. 7, Num. 2, April-June 1981.

[Hirschberg and Litman, 1993] Hirschberg, J. and Litman, D. Empirical Studies on the Disambiguation of Cue Phrases. Computational Linguistics, Vol. 19, Num. 3, 1993.

[Hobbs, 1978a] Hobbs, J.R. Coherence and Coreference. SRI International. Technical Note 168, August 1978.

[Hobbs, 1978b] Hobbs, J.R. Why Is Discourse Coherent? SRI International. Technical Note 176, November 1978.

[Hutchins, 1985] Hutchins, W.J. Information Retrieval and Text Analysis. In New Approaches to the Analysis of Mass, Media, Discourse and Communication. T.A. van Dijk (ed.), Gruyter, Berlin, 1985.

[Hutchins, 1987] Hutchins, W.J. Summarization: Some Problems and Methods. Meaning: The Frontier of Informatics. K. Jones (Ed.), Cambridge, London, 1987.

[Jordan, 1991] Jordan, M.P. The Linguistic Genre of Abstracts. In A. Della Volpe (ed.), The Seventeenth LACUS Forum. Linguistics Association of Canada and the United States, 1991.

[Lewis, 1992] Lewis, D. Computers and Translation. In Computers and Written Texts. Christopher S. Butler. Blackwell, 1992.

[Mann and Thompson, 1983] Mann, W.C. and Thompson, S.A. Relational Propositions in Discourse. Information Sciences Institute, Technical Report RR-83-115, November 1983.

[Mann and Thompson, 1987] Mann, W.C. and Thompson, S.A. Rhetorical Structure Theory: A Theory of Text Organization. ISI Reprint Series, ISI/RS-87-190, June 1987.

[Moore and Paris, 1994] Moore, J.D. and Paris, C.L. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, Vol. 19, Num. 4, 1994.

[Nirenburg and Carbonell, 1987] Nirenburg, S. and Carbonell, J. Integrating Discourse Pragmatics and Propositional Knowledge for Multilingual Natural Language Processing. Computers and Translation (2). Paradigm Press, Inc., 1987.

[Raposo, 1992] Raposo, E.P. Teoria da Gramática. A Faculdade da Linguagem. Ed. Caminho, Lisboa, 1992.

[Scolum, 1985] Scolum, J. A Survey of Machine Translation. Computational Linguistics, Vol. 11, Num. 1, 1985.

[Sidner, 1978] Sidner, L.S. The Use of Focus as a Tool for Disambiguation of Definite Noun Phrases. TINLAP-2, 1978.

[Tucker, 1984] Tucker, A.B. A Perspective on Machine Translation: Theory and Practice. Communications of the ACM, Vol. 27, Num. 4, April 1984.

[Weissberg and Buker, 1990] Weissberg, R. and Buker, S. Writing Up Research. Prentice-Hall, Inc., 1990.

[Wilks, 1973] Wilks, Y. An Artificial Intelligence Approach to Machine Translation. In Computer Models of Thought and Language, Schank, R. and Colby, K. (Eds.), Freeman, San Francisco, 1973.

[Winograd, 1983] Winograd, T. Language as a Cognitive Process. Addison-Wesley Publishing Company, Inc., 1983.

92-01ApplicationsofFiniteAutomataRepresentingLargeVocabularies, 92-03OntheIrrelevanceofEdgeOrientationsontheAcyclicDirectedTwoDisjointPathsProblem,C.L.Lucchesi,M.C.M.T.Giglio 92-02PointSetPatternMatchingind-Dimensions,P.J.deRezende,D.T.Lee C.L.Lucchesi,T.Kowaltowski RelatoriosTecnicos{1992 92-05An(l;u)-TransversalTheoremforBipartiteGraphs,C.L.Lucchesi, 92-06ImplementingIntegrityControlinActiveDatabases,C.B.Medeiros, 92-04ANoteonPrimitivesfortheManipulationofGeneralSubdivisionsand thecomputationofvoronoidiagrams,w.jacometti D.H.Younger 92-08MaintainingIntegrityConstraintsacrossVersionsinaDatabase, 92-07NewExperimentalResultsForBipartiteMatching,J.C.Setubal 92-09OnClique-CompleteGraphs,C.L.Lucchesi,C.P.Mello,J.L.Szwarcter M.J.Andrade 92-10ExamplesofInformalbutRigorousCorrectnessProofsforTreeTraversing C.B.Medeiros,G.Jomier,W.Cellary 92-12BrowsingandQueryinginObject-OrientedDatabases,J.L.deOliveira, 92-11DebuggingAidsforStatechart-BasedSystems,V.G.S.Elias,H.Liesenberg R.deO.Anido Algorithms,T.Kowaltowski 14

93-02TheHierarchicalRingProtocol:AnEcientSchemeforReadingReplicatedData,NabordasC.Mendonca,RicardodeO.Anido HansK.E.LiesenbergRelatoriosTecnicos{1993 93-03MatchingAlgorithmsforBipartiteGraphs,HerbertA.BaierSaip,ClaudioL. 93-04AlexBFSAlgorithmforProperIntervalGraphRecognition,CelinaM.H. 93-05SistemaGerenciadordeProcessamentoCooperativo,Ivonne.M.Carrazana, Lucchesi defigueiredo,jo~aomeidanis,celiap.demello 93-01TransformingStatechartsintoReactiveSystems,AntonioG.FigueiredoFilho, 93-08IntrospectionandProjectioninReasoningaboutOtherAgents,Jacques 93-06Implementac~aodeumBancodeDadosRelacionalDotadodeumaInterface 93-07EstadogramasnoDesenvolvimentodeInterfaces,FabioN.deLucena,Hans Nelson.C.Machado,Celio.C.Guimar~aes 93-09Codicac~aodeSequ^enciasdeImagenscomQuantizac~aoVetorial,Carlos Wainer K.E.Liesenberg Cooperativa,NascifA.AbousalhNeto,AriadneM.B.R.Carvalho 93-11AnImplementationStructureforRM-OSI/ISOTransactionProcessing 93-10Minimizac~aodoConsumodeEnergiaemumSistemaparaAquisic~aode CastroMachado AntonioReinaldoCosta,PauloLciodeGeus 93-12Boole'sconditionsofpossibleexperienceandreasoningunderuncertainty, ApplicationContexts,FlavioMoraisdeAssisSilva,EdmundoRobertoMauro DadosControladoporMicrocomputador,PauloCesarCentoducatte,Nelson 93-13ModellingGeographicInformationSystemsusinganObjectOriented PierreHansen,BrigitteJaumard,MarcusPoggideArag~ao Madeira 93-15UsingExtendedHierarchicalQuorumConsensustoControlReplicated 93-14ManagingTimeinObject-OrientedDatabases,LincolnM.Oliveira,Claudia donca,ricardodeoliveiraanido15 Data:fromTraditionalVotingtoLogicalStructures,NabordasChagasMen- Framework,FatimaPires,ClaudiaBauzerMedeiros,ArdemirisBarrosSilva

93-16 LL - An Object Oriented Library Language Reference Manual, Tomasz Kowaltowski, Evandro Bacarin
93-17 Metodologias para Conversão de Esquemas em Sistemas de Bancos de Dados Heterogêneos, Ronaldo Lopes de Oliveira, Geovane Cayres Magalhães
93-18 Rule Application in GIS - a Case Study, Claudia Bauzer Medeiros, Geovane Cayres Magalhães
93-19 Modelamento, Simulação e Síntese com VHDL, Carlos Geraldo Kruger and Mario Lucio Côrtes
93-20 Reflections on Using Statecharts to Capture Human-Computer Interface Behaviour, Fabio Nogueira de Lucena and Hans Liesenberg
93-21 Applications of Finite Automata in Debugging Natural Language Vocabularies, Tomasz Kowaltowski, Claudio Leonardo Lucchesi and Jorge Stolfi
93-22 Minimization of Binary Automata, Tomasz Kowaltowski, Claudio Leonardo Lucchesi and Jorge Stolfi
93-23 Rethinking the DNA Fragment Assembly Problem, João Meidanis
93-24 EGOLib - Uma Biblioteca Orientada a Objetos Gráficos, Eduardo Aguiar Patrocínio, Pedro Jussieu de Rezende
93-25 Compreensão de Algoritmos através de Ambientes Dedicados a Animação, Rackel Valadares Amorim, Pedro Jussieu de Rezende
93-26 GeoLab: An Environment for Development of Algorithms in Computational Geometry, Pedro Jussieu de Rezende, Welson R. Jacometti
93-27 A Unified Characterization of Chordal, Interval, Indifference and Other Classes of Graphs, João Meidanis
93-28 Programming Dialogue Control of User Interfaces Using Statecharts, Fabio Nogueira de Lucena and Hans Liesenberg
93-29 EGOLib - Manual de Referência, Eduardo Aguiar Patrocínio and Pedro Jussieu de Rezende

Technical Reports, 1994
94-01 A Statechart Engine to Support Implementations of Complex Behaviour, Fabio Nogueira de Lucena, Hans K. E. Liesenberg
94-02 Incorporação do Tempo em um SGBD Orientado a Objetos, Ângelo Roncalli Alencar Brayner, Claudia Bauzer Medeiros
94-03 O Algoritmo KMP através de Autômatos, Marcus Vinícius A. Andrade and Claudio L. Lucchesi
94-04 On Edge-Colouring Indifference Graphs, Celina M. H. de Figueiredo, João Meidanis, Celia Picinin de Mello
94-05 Using Versions in GIS, Claudia Bauzer Medeiros and Genevieve Jomier
94-06 Times Assíncronos: Uma Nova Técnica para o Flow Shop Problem, Helvio Pereira Peixoto and Pedro Sergio de Souza
94-07 Interfaces Homem-Computador: Uma Primeira Introdução, Fabio Nogueira de Lucena and Hans K. E. Liesenberg
94-08 Reasoning about Another Agent through Empathy, Jacques Wainer
94-09 A Prolog Morphological Analyser for Portuguese, Jacques Wainer, Alexandre Farcic
94-10 Introdução aos Estadogramas, Fabio N. de Lucena, Hans K. E. Liesenberg
94-11 Matching Covered Graphs and Subdivisions of K4 and C6, Marcelo H. de Carvalho and Claudio L. Lucchesi
94-12 Uma Metodologia de Especificação de Times Assíncronos, Helvio Pereira Peixoto, Pedro Sergio de Souza

Technical Reports, 1995
95-01 Paradigmas de Algoritmos na Solução de Problemas de Busca Multidimensional, Pedro J. de Rezende, Renato Fileto
95-02 Adaptive Enumeration of Implicit Surfaces with Affine Arithmetic, Luiz Henrique de Figueiredo, Jorge Stolfi
95-03 W3 no Ensino de Graduação?, Hans Liesenberg
95-04 A Greedy Method for Edge-Colouring Odd Maximum Degree Doubly Chordal Graphs, Celina M. H. de Figueiredo, João Meidanis, Celia Picinin de Mello
95-05 Protocols for Maintaining Consistency of Replicated Data, Ricardo Anido, N. C. Mendonça
95-06 Guaranteeing Full Fault Coverage for UIO-Based Methods, Ricardo Anido and Ana Cavalli
95-07 Xchart-Based Complex Dialogue Development, Fabio Nogueira de Lucena, Hans K. E. Liesenberg
95-08 A Direct Manipulation User Interface for Querying Geographic Databases, Juliano Lopes de Oliveira, Claudia Bauzer Medeiros
95-09 Bases for the Matching Lattice of Matching Covered Graphs, Claudio L. Lucchesi, Marcelo H. Carvalho
95-10 A Highly Reconfigurable Neighborhood Image Processor based on Functional Programming, Neucimar J. Leite, Marcelo A. de Barros
95-11 Processador de Vizinhança para Filtragem Morfológica, Ilka Marinho Barros, Roberto de Alencar Lotufo, Neucimar Jerônimo Leite
95-12 Modelos Computacionais para Processamento Digital de Imagens em Arquiteturas Paralelas, Neucimar Jerônimo Leite
95-13 Modelos de Computação Paralela e Projeto de Algoritmos, Ronaldo Parente de Menezes and João Carlos Setubal
95-14 Vertex Splitting and Tension-Free Layout, P. Eades, C. F. X. de Mendonça N.
95-15 NP-Hardness Results for Tension-Free Layout, C. F. X. de Mendonça N., P. Eades, C. L. Lucchesi, J. Meidanis
95-16 Agentes Replicantes e Algoritmos de Eco, Marcos J. C. Euzebio
95-17 Anais da II Oficina Nacional em Problemas Combinatórios: Teoria, Algoritmos e Aplicações, Editors: Marcus Vinicius S. Poggi de Aragão, Cid Carvalho de Souza

95-18 Asynchronous Teams: A Multi-Algorithm Approach for Solving Combinatorial Multiobjective Optimization Problems, Rosiane de Freitas Rodrigues, Pedro Sergio de Souza
95-19 wxWindows: Uma Introdução, Carlos Neves Junior, Tallys Hoover Yunes, Fabio Nogueira de Lucena, Hans Kurt E. Liesenberg
95-20 John von Neumann: Suas Contribuições à Computação, Tomasz Kowaltowski
95-21 A Linear Time Algorithm for Binary Phylogeny using PQ-Trees, J. Meidanis and E. G. Munuera

Departamento de Ciência da Computação, IMECC
Universidade Estadual de Campinas
Caixa Postal 6065
13081-970 Campinas, SP
BRASIL
reltec@dcc.unicamp.br