SRA SOLOMON: MUC-4 TEST RESULTS AND ANALYSIS

Chinatsu Aone, Doug McKee, Sandy Shinn, Hatte Blejer
Systems Research and Applications (SRA)
2000 15th Street North
Arlington, VA 22201
aonec@sra.com

INTRODUCTION

In this paper, we report SRA's results on the MUC-4 task and describe how we trained our natural language processing system for MUC-4. We also report on what worked, what didn't work, and lessons learned. Our MUC-4 system embeds the SOLOMON knowledge-based NLP shell, which is designed for both domain-independence and language-independence. We are currently using SOLOMON for a Spanish and Japanese text understanding project in a different domain. Although this was our first year participating in MUC, we have built and are currently building other data extraction systems.

RESULTS

Our TST3 and TST4 results are shown in Figures 1 and 2. The similarity of these scores, as well as their similarity to SRA-internal testing results, reflects the portability of SRA's MUC-4 system. In fact, our score on the TST4 texts was better than that on TST3, even though those texts covered a different time period than the training texts or TST3. Our matched-only precision and recall for both test sets were very high (TST3: 68/47, TST4: 73/49). When SOLOMON recognized a MUC event, it did a very accurate and complete job of filling the requisite template.

SOLOMON's performance was tuned so that the all-templates recall and precision were as close as possible, to maximize the F-Measure. As shown in Figure 3, our F-Measure steadily increased over time. The fact that this slope has not yet leveled off shows SOLOMON's potential for improvement.

EFFORT SPENT

We spent a total of 9 staff months on MUC-4, from January 1, 1992 through May 31, 1992. A task-specific breakdown of effort is shown in Figure 4. The bulk of the work was spent porting SOLOMON to a new domain with new vocabulary, concepts, template-output format, and fill rules. Approximately 72% of the effort was domain-dependent. However, about 63% of the total effort was language-independent, i.e.
it would be directly applicable to understanding texts about terrorism in any language. We expect that our English MUC-4 system could be ported to a new language in about 3 months, given a basic grammar, lexicon, and preprocessing data similar to those which existed for English. We partially demonstrated this
                   REC  PRE  OVG  FAL
MATCHED/MISSING     27   68    8
MATCHED/SPURIOUS    47   32   57
MATCHED ONLY        47   68    8
ALL TEMPLATES       27   32   57
TEXT FILTERING      71   85   15   23

F-MEASURES   P&R 29.29   2P&R 30.86   P&2R 27.87

Figure 1: TST3 Results

                   REC  PRE  OVG  FAL
MATCHED/MISSING     38   73    4
MATCHED/SPURIOUS    49   31   59
MATCHED ONLY        49   73    4
ALL TEMPLATES       38   31   59
TEXT FILTERING      91   75   25   35

F-MEASURES   P&R 34.14   2P&R 32.19   P&2R 36.36

Figure 2: TST4 Results

claim by showing our MUC-4 system processing English, Japanese, and Spanish newspaper articles about the murder of Jesuit priests at the demonstration session of MUC-4. We spent less than 2 weeks after the final test adding MUC-specific words to the Spanish and Japanese lexicons, and extending the grammars of the two languages.

Data

40% of the total effort building MUC data was spent on lexicon and KB entry acquisition. Much of this data was acquired automatically. We used the supplied geographical data to automatically build location lexicons and KBs. Using the development templates, we acquired lexical and KB entries for classes of domain terms such as human and physical targets and terrorist organizations. We automatically derived subcategorization information for the domain verbs from the development texts (cf. [1]). These automatically acquired lexicons and KBs did require some manual cleanup and correction.

Certain multi-word phenomena which occur frequently in text but are unsuitable for general parsing were handled by pattern matching during Preprocessing. For example, we created patterns for Spanish phrases, complex location phrases, relative times, and names of political, military, and terrorist organizations.

Modifications to SOLOMON's broad-coverage English grammar included adding more semantic restrictions, extending some phrase-structure rules, and improving general robustness. Based on our knowledge engineering effort, we built a set of commonsense reasoning rules that are described in detail in our system description. Our EXTRACT module recognizes MUC-relevant events in the output of SOLOMON and translates them into MUC-4 filled templates.
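The three F-Measure variants reported in Figures 1 and 2 are consistent with the standard van Rijsbergen F-measure applied to the all-templates precision and recall, with beta = 1 (P&R), beta = 0.5 (2P&R), and beta = 2 (P&2R). A minimal sketch (the function name is ours):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Van Rijsbergen F-measure: beta > 1 weights recall more heavily,
    beta < 1 weights precision more heavily."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

# All-templates scores from Figure 1 (TST3): recall 27, precision 32.
print(round(f_measure(32, 27), 2))            # P&R  -> 29.29
print(round(f_measure(32, 27, beta=0.5), 2))  # 2P&R -> 30.86
print(round(f_measure(32, 27, beta=2.0), 2))  # P&2R -> 27.87
```

Running the same formula on the TST4 all-templates scores (recall 38, precision 31) reproduces the Figure 2 F-Measures as well.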
We implemented all the domain-specific information as mapping rules or simple conversion functions (e.g. numeric values like "at least 5" mean "5-"). This data is stored in the knowledge base, and is completely language-independent.
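A conversion function of this kind can be sketched as a small pattern table. Only the "at least 5" → "5-" mapping below comes from the text; the companion rules and the function name are illustrative assumptions:

```python
import re

# Hypothetical sketch of numeric-fill conversion rules. The "at least" rule
# is from the paper; the others are assumed companions for illustration.
_NUMERIC_RULES = [
    (re.compile(r"at least (\d+)"), r"\1-"),  # "at least 5" -> "5-"
    (re.compile(r"at most (\d+)"), r"-\1"),   # assumed companion rule
    (re.compile(r"about (\d+)"), r"\1"),      # assumed companion rule
]

def convert_numeric_fill(text: str) -> str:
    """Map a natural-language numeric expression to a template fill value."""
    for pattern, replacement in _NUMERIC_RULES:
        if pattern.fullmatch(text):
            return pattern.sub(replacement, text)
    return text  # pass through anything we have no rule for

print(convert_numeric_fill("at least 5"))  # -> "5-"
```

Because the rules operate on normalized values rather than surface strings of any one language, a table like this stays language-independent, as the paper notes.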
[Figure 3: Tracking SOLOMON Performance. F-Measure (P&R) plotted against hours of effort from January 1 through May 31 (roughly 0 to 1500 hours); TST2 scores rise steadily from 0 to the high 20s, with the final TST3 and TST4 F-Measures (29.29 and 34.14) also shown.]

Task Category                  % of Total Effort
DATA                                  71
  Knowledge Engineering               13
  Data Acquisition                    30
  Grammar                              7
  Pragmatic Inference Rules           11
  Extract Data                        10
PROCESSING                            29
  Message Zoning                       3
  Extract Extensions                   7
  Testing                             10
  Misc. Bug Fixing                    10

Figure 4: Breakdown of Effort Spent for MUC-4
Processing

We spent 1 week porting our existing Message Zoner to deal with message headers in MUC messages. The Message Zoner could already recognize more general message structure such as paragraphs and sentences. We extended EXTRACT while maintaining the domain and language independence of the module. Features added included event merging and handling of flat MUC templates instead of the more object-oriented database records that SOLOMON is accustomed to. Our time spent on fixing bugs was distributed throughout the system, but problems in Debris Parsing and Debris Semantics received the most attention.

SYSTEM TRAINING

We used the TST2 texts for blind testing and the entire 1300 development texts as both testing and training material. The development set was crucial to both our automated data acquisition and our knowledge engineering tasks. We performed frequent testing to track and direct our progress. To raise recall, we focused on data acquisition; to raise precision, we focused on stricter definitions of "legal" MUC events. To improve overall performance, we focused on more robust syntactic and semantic analysis and more reliable event merging.

LIMITING FACTORS

The two main limiting factors were the number of development texts and templates and the amount of time allotted for the MUC-4 effort. With more texts, we could have applied other, more data-intensive automated acquisition techniques and had more examples of phenomena to draw upon. With more time, we would add more domain-dependent lexical knowledge and additional pragmatic inference rules. We also need to tune our EXTRACT mapping rules more finely and improve our discourse module for both NP reference and event reference resolution. Integration of existing on-line resources such as machine-readable dictionaries, the World Factbook, or WordNet would also improve system performance. A more extensive testing and evaluation strategy at both the black-box and glass-box levels would help direct progress, but was not feasible in the amount of time we had.
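The event merging added to EXTRACT can be illustrated roughly as follows. This is a sketch of our reading of the description, not SRA's actual algorithm: two flat templates are merged when none of their shared slots conflict, and the slot names are only examples.

```python
# Illustrative sketch of event merging over flat templates (slot -> fill).
# Assumption: records describe the same incident if no shared slot conflicts.

def compatible(a: dict, b: dict) -> bool:
    """True if the two records agree on every slot they both fill."""
    return all(a[slot] == b[slot] for slot in a.keys() & b.keys())

def merge_events(events: list) -> list:
    """Greedily fold each record into the first compatible merged record."""
    merged = []
    for event in events:
        for existing in merged:
            if compatible(existing, event):
                existing.update(event)  # union of slot fills
                break
        else:
            merged.append(dict(event))
    return merged

records = [
    {"INCIDENT-TYPE": "BOMBING", "LOCATION": "EL SALVADOR"},
    {"INCIDENT-TYPE": "BOMBING", "PHYS-TGT": "EMBASSY"},
    {"INCIDENT-TYPE": "KIDNAPPING"},
]
# The two BOMBING records merge into one; the KIDNAPPING record stays separate.
print(merge_events(records))
```

A real system would of course use richer compatibility tests (type hierarchies, time and location normalization) rather than string equality, but the control flow is the same.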
WHAT WAS OR WAS NOT SUCCESSFUL

There were several areas where hybrid solutions worked very well. Totally automated knowledge acquisition was quite successful when supplemented by manual checking and editing of domain-crucial information. Similarly, augmenting a pure bottom-up parser with "simulated top-down parsing" (see SRA's MUC-4 System Description) worked well. Improved Debris Semantics and significantly extended Pragmatic Inferencing were also important contributors to the system's performance.

REUSABILITY

SRA's SOLOMON NLP system has been designed for portability and has proven to be highly reusable. This includes portability to other domains, other languages, and other applications. As shown in Figure 5, a large
[Figure 5: MUC NLP System Reusability]

part of SOLOMON's data and almost all of the processing modules are completely reusable for NLP in other domains or languages. Currently, our Spanish and Japanese data extraction project MURASAKI is using, without modification, the same processing modules and the core knowledge bases as those used for MUC-4. The MURASAKI system processes Spanish- and Japanese-language newspaper and journal articles as well as TV transcripts. This project's domain is the AIDS disease. Thus, the only difference between our MUC-4 system and the MURASAKI system is that the latter uses Spanish and Japanese lexicons, patterns, and grammars, and MURASAKI domain-dependent knowledge bases. SOLOMON has also been embedded in several English message understanding systems: ALEXIS (operational) and WARBUCKS.

LESSONS LEARNED AND REAFFIRMED BY MUC-4

We have learned and reaffirmed the following points as the most crucial aspects of successful text understanding for data extraction.

Overcoming the Knowledge Acquisition Bottleneck: We must develop techniques and tools for acquiring timely, complete, and proven system data.

Solving the Parsing Problem: We need more robust, semantically constrained syntactic analysis. Grammars must be broad-coverage and highly accurate on complex input.

Developing Sophisticated Discourse Analysis: We must handle real-world discourse phenomena found in actual texts. The discourse architecture must be flexible enough to accommodate particular discourse phenomena which are crucial in particular domains or languages.

MUC-4 has reaffirmed our knowledge of what is involved in porting an NLP system to a new domain. 9 staff months is a bare minimum for such an effort. Improved knowledge acquisition tools as well as
on-line resources are desirable. To ensure good results, it is necessary to have sufficient time for knowledge engineering, testing, and evaluation. Our experience underscores the fact that natural language understanding is a highly data-driven problem. The system's performance is often proportional to the level of understanding of the input and output. The MUC-4 development texts and templates were extremely helpful in this regard.

References

[1] Doug McKee and John Maloney. Using Statistics Gained from Corpora in a Knowledge-Based NLP System. In Proceedings of the AAAI Workshop on Statistically-Based NLP Techniques, 1992.