SRA SOLOMON: MUC-4 TEST RESULTS AND ANALYSIS

Chinatu Aone, Doug McKee, Sandy Shinn, Hatte Blejer
Systems Research and Applications (SRA)
2000 15th Street North
Arlington, VA 22201
aonec@sra.com

INTRODUCTION

In this paper, we report SRA's results on the MUC-4 task and describe how we trained our natural language processing system for MUC-4. We also report on what worked, what didn't work, and lessons learned.

Our MUC-4 system embeds the SOLOMON knowledge-based NLP shell, which is designed for both domain-independence and language-independence. We are currently using SOLOMON for a Spanish and Japanese text understanding project in a different domain. Although this was our first year participating in MUC, we have built and are currently building other data extraction systems.

RESULTS

Our TST3 and TST4 results are shown in Figures 1 and 2. The similarity of these scores, as well as their similarity to SRA-internal testing results, reflects the portability of SRA's MUC-4 system. In fact, our score on the TST4 texts was better than that of TST3, even though those texts covered a different time period than that of the training texts or TST3.

Our matched-only precision and recall for both test sets were very high (TST3: 68/47, TST4: 73/49). When SOLOMON recognized a MUC event, it did a very accurate and complete job at filling the requisite templates.

SOLOMON's performance was tuned so that the all-templates recall and precision were as close as possible to maximize the F-Measure. As shown in Figure 3, our F-Measure steadily increased over time. The fact that this slope has not yet leveled off shows SOLOMON's potential for improvement.

                   REC   PRE   OVG   FAL
MATCHED/MISSING     27    68     8
MATCHED/SPURIOUS    47    32    57
MATCHED ONLY        47    68     8
ALL TEMPLATES       27    32    57
TEXT FILTERING      71    85    15    23

F-MEASURES   P&R 29.29   2P&R 30.86   P&2R 27.87

Figure 1: TST3 Results

                   REC   PRE   OVG   FAL
MATCHED/MISSING     38    73     4
MATCHED/SPURIOUS    49    31    59
MATCHED ONLY        49    73     4
ALL TEMPLATES       38    31    59
TEXT FILTERING      91    75    25    35

F-MEASURES   P&R 34.14   2P&R 32.19   P&2R 36.36

Figure 2: TST4 Results

EFFORT SPENT

We spent a total of 9 staff months, from January 1, 1992 through May 31, 1992, on MUC-4. A task-specific breakdown of effort is shown in Figure 4. The bulk of the work was spent porting SOLOMON to a new domain with new vocabulary, concepts, template-output format, and fill rules. Approximately 72% of the effort was domain-dependent. However, about 63% of the total effort was language-independent, i.e. it would be directly applicable to understanding texts about terrorism in any language. We expect that our English MUC-4 system could be ported to a new language in about 3 months, given a basic grammar, lexicons and preprocessing data similar to the ones which existed for English. We partially demonstrated this claim by showing our MUC-4 system processing English, Japanese and Spanish newspaper articles about the murder of Jesuit priests at the demonstration session of MUC-4. We spent less than 2 weeks after the final test adding MUC-specific words to the Spanish and Japanese lexicons, and extending the grammars of the two languages.

Data

40% of the total effort building MUC data was spent on lexicon and KB entry acquisition. Much of this data was acquired automatically. We used the supplied geographical data to automatically build location lexicons and KBs. Using the development templates, we acquired lexical and KB entries for classes of domain terms such as human and physical targets and terrorist organizations. We automatically derived subcategorization information for the domain verbs from the development texts (cf. [1]). These automatically acquired lexicons and KBs did require some manual cleanup and correction.

Certain multi-word phenomena which occur frequently in texts but are unsuitable for general parsing were handled by pattern matching during Preprocessing. For example, we created patterns for Spanish phrases, complex location phrases, relative times, and names of political, military and terrorist organizations.

Modifications to SOLOMON's broad-coverage English grammar included adding more semantic restrictions, extending some phrase-structure rules, and improving general robustness.

Based on our knowledge engineering effort, we built a set of commonsense reasoning rules that are described in detail in our system description.

Our EXTRACT module recognizes MUC-relevant events in the output of SOLOMON and translates them into MUC-4 filled templates.
We implemented all the domain-specific information as mapping rules or simple conversion functions (e.g. numeric values like "at least 5" mean "5-"). This data is stored in the knowledge base, and is completely language independent.
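A conversion function of the kind mentioned above might look like the following minimal sketch. Only the "at least 5" to "5-" case comes from the text; the function name, the regular expressions, and the pass-through handling of a bare number are assumptions made for illustration, and SOLOMON itself stored such rules in its knowledge base rather than in code like this.

```python
import re
from typing import Optional

# Hypothetical patterns, not taken from SOLOMON.
_AT_LEAST = re.compile(r"^\s*at\s+least\s+(\d+)\s*$", re.IGNORECASE)
_EXACT = re.compile(r"^\s*(\d+)\s*$")


def convert_numeric_fill(phrase: str) -> Optional[str]:
    """Map a numeric phrase to a template fill string.

    "at least N" becomes "N-" (the one case given in the text).
    A bare number is passed through unchanged (an assumption).
    Anything else returns None so the caller can fall back to other rules.
    """
    match = _AT_LEAST.match(phrase)
    if match:
        return match.group(1) + "-"
    match = _EXACT.match(phrase)
    if match:
        return match.group(1)
    return None


if __name__ == "__main__":
    print(convert_numeric_fill("at least 5"))
```

Returning None for unrecognized phrases keeps the function a pure lookup, which matches the description of these rules as simple, language-independent data.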

[Figure 3: Tracking SOLOMON Performance. F-measure on TST2, TST3 and TST4 plotted against hours of effort (0 to 1500), January 1 through May 31.]

Task Category                  % of Total Effort
DATA                           71
  Knowledge Engineering        13
  Data Acquisition             30
  Grammar                       7
  Pragmatic Inference Rules    11
  Extract Data                 10
PROCESSING                     29
  Message Zoning                3
  Extract Extensions            7
  Testing                      10
  Misc. Bug Fixing             10

Figure 4: Breakdown of Effort Spent for MUC-4
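The F-measures reported in Figures 1 and 2, and tracked in Figure 3, can be recomputed from the ALL TEMPLATES precision and recall. The paper does not spell out the formula, so the sketch below assumes the standard weighted F-measure, F = (beta^2 + 1) * P * R / (beta^2 * P + R), with beta = 1 for P&R, beta = 0.5 for 2P&R (precision weighted more heavily) and beta = 2 for P&2R (recall weighted more heavily). Under that assumption the reported figures are reproduced to two decimal places. The function name is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted F-measure: (beta^2 + 1) * P * R / (beta^2 * P + R).

    Returns 0.0 when the denominator is zero (both inputs are zero).
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


# ALL TEMPLATES rows from Figures 1 and 2, as (precision, recall).
SCORES = {"TST3": (32, 27), "TST4": (31, 38)}
# Label-to-beta mapping is an assumption that matches the reported values.
BETAS = {"P&R": 1.0, "2P&R": 0.5, "P&2R": 2.0}

if __name__ == "__main__":
    for test_set, (p, r) in SCORES.items():
        row = ", ".join(
            f"{label} {f_measure(p, r, beta):.2f}" for label, beta in BETAS.items()
        )
        print(f"{test_set}: {row}")
```

The TST4 row shows why the three weightings differ: with recall (38) above precision (31), the recall-weighted P&2R score is the highest of the three, whereas on TST3 the ordering is reversed.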

Processing

We spent 1 week porting our existing Message Zoner to deal with message headers in MUC messages. The Message Zoner could already recognize more general message structures such as paragraphs and sentences.

We extended EXTRACT while maintaining the domain and language independence of the module. Features added included event merging and handling of flat MUC templates instead of the more object-oriented database records that SOLOMON is accustomed to.

Our time spent on fixing bugs was distributed throughout the system, but problems in Debris Parsing and Debris Semantics received the most attention.

SYSTEM TRAINING

We used the TST2 texts for blind testing and the entire set of 1300 development texts for both testing and training material. The development set was crucial to both our automated data acquisition and our knowledge engineering tasks. We performed frequent testing to track and direct our progress. To raise recall, we focused on data acquisition; to raise precision, we focused on stricter definitions of "legal" MUC events. To improve overall performance, we focused on more robust syntactic and semantic analysis and more reliable event merging.

LIMITING FACTORS

The two main limiting factors were the number of development texts and templates and the amount of time allotted for the MUC-4 effort. With more texts, we could have applied other more data-intensive automated acquisition techniques and had more examples of phenomena to draw upon. With more time, we would add more domain-dependent lexical knowledge and additional pragmatic inference rules. We also need to tune our EXTRACT mapping rules more finely and improve our discourse module for both NP reference and event reference resolution. Integration of existing on-line resources such as machine-readable dictionaries, the World Factbook, or WordNet would also improve system performance. A more extensive testing and evaluation strategy at both the blackbox and glassbox levels would help direct progress, but was not feasible in the amount of time we had.
WHAT WAS OR WAS NOT SUCCESSFUL

There were several areas where hybrid solutions worked very well. Totally automated knowledge acquisition was quite successful when supplemented by manual checking and editing of domain-crucial information. Similarly, augmenting a pure bottom-up parser with "simulated top-down parsing" (see SRA's MUC-4 System Description) worked well. Improved Debris Semantics and significantly extended Pragmatic Inferencing were also important contributors to the system's performance.

REUSABILITY

SRA's SOLOMON NLP system has been designed for portability and proven to be highly reusable. This includes portability to other domains, other languages, and other applications. As shown in Figure 5, a large part of SOLOMON's data and almost all of the processing modules are completely reusable for NLP in other domains or languages.

[Figure 5: MUC NLP System Reusability]

Currently, our Spanish and Japanese data extraction project MURASAKI is using, without modification, the same processing modules and the core knowledge base as those used for MUC-4. The MURASAKI system processes Spanish and Japanese language newspaper and journal articles as well as TV transcripts. This project's domain is the AIDS disease. Thus, the only difference between our MUC-4 system and the MURASAKI system is that the latter uses Spanish and Japanese lexicons, patterns and grammars, and MURASAKI domain-dependent knowledge bases. SOLOMON has also been embedded in several English message understanding systems: ALEXIS (operational) and WARBUCKS.

LESSONS LEARNED AND REAFFIRMED BY MUC-4

We have learned and reaffirmed the following points as the most crucial aspects of successful text understanding for data extraction.

Overcoming the Knowledge Acquisition Bottleneck: We must develop techniques and tools for acquiring timely, complete, and proven system data.

Solving the Parsing Problem: We need more robust, semantically constrained syntactic analysis. Grammars must be broad-coverage and highly accurate on complex input.

Developing Sophisticated Discourse Analysis: We must handle real world discourse phenomena found in actual texts. The discourse architecture must be flexible enough to accommodate particular discourse phenomena which are crucial in particular domains or languages.

MUC-4 has reaffirmed our knowledge of what is involved in porting an NLP system to a new domain. 9 staff months is a bare minimum for such an effort. Improved knowledge acquisition tools as well as on-line resources are desirable. To ensure good results, it is necessary to have sufficient time for knowledge engineering, testing and evaluation. Our experience underscores the fact that natural language understanding is a highly data-driven problem. The system's performance is often proportional to the level of understanding of the input and output. The MUC-4 development texts and templates were extremely helpful in this regard.

References

[1] Doug McKee and John Maloney. Using Statistics Gained from Corpora in a Knowledge-Based NLP System. In Proceedings of The AAAI Workshop on Statistically-Based NLP Techniques, 1992.