Integration of an open source rule engine to enhance the IHTSDO Workbench testing
Dr. Guillermo Reynoso, Dr. Alejandro Lopez Osornio
termmed IT, Buenos Aires, Argentina
2009 termmed SA

Terminology maintenance logic
Definition: rules that allow inference based on terminological facts. Examples:
- Any FSN description must include an appropriate semantic tag
- A retired concept must be primitive
- The Finding site relationship type can only be used on the Clinical finding hierarchy

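The first two example rules can be written as plain predicates. The following is a minimal sketch, not the Workbench API: the class name, the method names, and the assumption that a semantic tag is a trailing parenthesized word such as "(disorder)" are all illustrative. It only checks that a tag is present, not that it is the appropriate one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical, simplified checks; none of these names come from the Workbench. */
class TerminologyRulesSketch {
    // Assumption: a semantic tag is a trailing parenthesized group, e.g. "(disorder)".
    private static final Pattern SEMANTIC_TAG = Pattern.compile(".*\\S \\([^()]+\\)$");

    /** True when the FSN ends with something that looks like a semantic tag. */
    static boolean fsnHasSemanticTag(String fsn) {
        return fsn != null && SEMANTIC_TAG.matcher(fsn).matches();
    }

    /** "A retired concept must be primitive": only retired concepts are constrained. */
    static boolean retiredConceptIsPrimitive(boolean retired, boolean primitive) {
        return !retired || primitive;
    }

    /** Runs both checks and returns one alert message per violated rule. */
    static List<String> check(String fsn, boolean retired, boolean primitive) {
        List<String> alerts = new ArrayList<>();
        if (!fsnHasSemanticTag(fsn)) {
            alerts.add("FSN lacks a semantic tag: " + fsn);
        }
        if (!retiredConceptIsPrimitive(retired, primitive)) {
            alerts.add("Retired concept must be primitive: " + fsn);
        }
        return alerts;
    }

    public static void main(String[] args) {
        System.out.println(check("Asthma (disorder)", false, false));
        System.out.println(check("Asthma", true, false));
    }
}
```

Each rule here is a separate hand-written method, which is workable for a few rules but is the style that becomes hard to manage as the rule set grows.
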
Terminology IDE
- Domain Model: Concepts, Descriptions, Relationships, Refsets...
- Logic: if... then... else... case... while...
- Actions: commit, retire description, draw screen, display alert

Terminology IDE: Special requirements
- IDE Logic: Draw interface; Respond to button clicks; Respond to Drag & Drop; Object persistence; Events management; Data synchronization; Profiles, preferences; Fire validation process; Business process engine; Queues engine; etc.
- Business Logic: Privileges & authorization; Workflow; Conflict resolution; Publishing; Auditing trails; Business process
- Terminology Logic: QA rules; Datachecks; Batch; Mapping; Context awareness
- Supporting information: Documents; Glossary; Translation memory; etc.; Application help

Computer-assisted testing of terminology maintenance
- A large set of rules that needs to be managed, localized, and used uniformly.
- Policies should be explicit, rules should be shared, and expected outcomes and remedial actions need to be documented.
- Terminology experts should understand the existing rules and ideally be able to modify them or add new ones.

Standard Java logic
If... then... else structures for complex rule flows are difficult to read and to maintain:

    for (i = 0; i < oprops.length; i++) {
        pr = oprops[i];
        if (pr.getName().equals("2.- DESCRIPTION") || pr.getName().equals("1.- FSN")) {
            qualifiers = pr.getQualifiers();
            did = "";
            uid = "";
            for (j = 0; j < qualifiers.size(); j++) {
                qual = (Qualifier) qualifiers.get(j);
                if (qual.getName().equals("DescriptionID")) {
                    if (!did.equals("")) {
                        reason = ruleProps.getProperty("115", "115 - Rule not found");
                        // reason = "error code: 115. Terms cannot have more than one DescriptionID";
                        return true;
                    }
                    did = qual.getValue();
                }
                if (qual.getName().equals("UID")) {
                    if (!uid.equals("")) {
                        reason = ruleProps.getProperty("116", "116 - Rule not found");
                        // reason = "error code: 116. Terms cannot have more than one UID";
                        return true;
                    }
                    uid = qual.getValue();
                }
            }
            if (!did.equals("")) {
                npr = null;
                bexists = false;
                for (h = 0; h < props.length; h++) {
                    bdidok = false;
                    buidok = false;
                    npr = props[h];
                    nqualifiers = npr.getQualifiers();
                    for (k = 0; k < nqualifiers.size(); k++) {
                        nqual = (Qualifier) nqualifiers.get(k);
                        if (nqual.getName().equals("DescriptionID") && nqual.getValue().equals(did)) {
                            bdidok = true;
                        }
                        if (nqual.getName().equals("UID") && nqual.getValue().equals(uid)) {
                            buidok = true;
                        }
                    }
                    if (bdidok && buidok && pr.getValue().equals(npr.getValue())
                            && pr.getName().equals(npr.getName())) {
                        bexists = true;
                        break;
                    }
                }
                if (!bexists) {
                    reason = ruleProps.getProperty("117", "117 - Rule not found");
                    reason = reason.replaceFirst("xxxx", pr.getValue());
                    reason = reason.replaceFirst("yyyy", did);
                    reason = reason.replaceFirst("zzzz", uid);
                    // reason = "error code: 117. Cannot find published term: " + pr.getValue()
                    //     + "\nwith DescriptionID=" + did + " And UID=" + uid;
                    return true;
                }
            }
        } else {
            if (pr.getName().equals("3.- DEFINITION")) {
                qualifiers = pr.getQualifiers();
                uid = "";
                for (j = 0; j < qualifiers.size(); j++) {
                    qual = (Qualifier) qualifiers.get(j);
                    if (qual.getName().equals("UID")) {
                        if (!uid.equals("")) {
                            reason = ruleProps.getProperty("116", "116 - Rule not found");
                            ...

But the IHTSDO Workbench includes a testing framework
- Real-time testing is handled by the framework, including support for interactive fixes and automation.
- Data checks are small test units that can easily be plugged into the WB.
- The API enables flexible batch testing in the back end (mojos) or the front end.

WB Data checks and Batch Testing
- Data checks: flexible support for validating rules before or during the IDE commit (real time, including standardized support for user interface and interactive fix-up).
- Batch testing can be launched in the back end or interactively, through the IDE business processes.
[Diagram: Pre Commit -> Datacheck (Evaluate -> Alert To Data Constraint Failure -> Fix Options -> Fix Up) -> Commit / Continue]
[Diagram: Batch testing -> Custom test case using the API -> Flexible logging or reporting]

All data checks in specified client folders are executed sequentially
[Diagram: four data checks run one after another, each one as Datacheck (Evaluate -> Alert To Data Constraint Failure -> Fix Options -> Fix Up); then All data checks OK -> Continue / enable commit completion]

Sequential execution
Sequential execution is inefficient when applied to a large set of rules (>200):
- Similar expressions are checked several times.
- A rule is executed even when there are no artifacts that could satisfy its conditions.
- It is easy to add new rules, but not easy to review the logic of each one or to make changes.

Data check: Standing issues
- The data check handles both testing and user interaction
    - Reusing the same test logic for both interactive and batch testing is challenging
- Data checks are intended to be granular, flexible and highly configurable
    - One data check ~ one rule; difficult to model dependencies without flow control
- Data checks are executed sequentially
    - Difficult to predict or optimize sequencing for complex rules; potential for duplicate testing of the same condition in different data checks
    - Full evaluation is performed even if some conditions have no chance of being matched
    - Difficult to modify the test plan according to previous results
    - Difficult to optimize the test plan when new rules are added to a big rule base
- All data checks in specific folders are executed
    - Difficult to accommodate different contexts without creating more folders and redistributing checks

Possible enhancement: Reuse logic by moving rule evaluation out of the data check and into a test library
- Enabling reuse is a step forward
- Sequencing and optimization are still a significant challenge
[Diagram: Pre Commit -> Datacheck (Call rule evaluation logic; Handle UI: Alert To Data Constraint Failure -> Fix Options -> Fix Up) -> Commit / Continue]
[Diagram: Rule library (Evaluate, Evaluate, Evaluate)]
[Diagram: Batch testing -> Batch test plan (Call rule evaluation logic) -> Flexible logging or reporting]

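The separation proposed on this slide can be sketched as follows. This is an illustrative outline under assumptions, not Workbench code: the TermRule interface, the RuleLibrarySketch class, and the two sample rules are hypothetical. The point is only that the evaluation method has no UI code, so the interactive path and the batch path can call the same logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RuleLibrarySketch {
    /** Hypothetical rule contract: pure evaluation, no user interface code. */
    interface TermRule {
        /** Returns an alert message, or null when the description text passes. */
        String evaluate(String descriptionText);
    }

    /** Two illustrative rules; real rules would come from the terminology policies. */
    static final List<TermRule> RULES = Arrays.<TermRule>asList(
        text -> text.trim().isEmpty() ? "Description is empty" : null,
        text -> text.contains("  ") ? "Description contains a double space" : null
    );

    /** Shared evaluation logic, called by both the interactive and the batch path. */
    static List<String> evaluateAll(String text) {
        List<String> alerts = new ArrayList<>();
        for (TermRule rule : RULES) {
            String alert = rule.evaluate(text);
            if (alert != null) {
                alerts.add(alert);
            }
        }
        return alerts;
    }

    /** Interactive path: the data check only decides whether the commit may continue. */
    static boolean preCommitCheck(String text) {
        return evaluateAll(text).isEmpty();
    }

    /** Batch path: the same rules, with results collected into report lines. */
    static List<String> batchReport(List<String> texts) {
        List<String> report = new ArrayList<>();
        for (String text : texts) {
            for (String alert : evaluateAll(text)) {
                report.add("'" + text + "': " + alert);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(preCommitCheck("Asthma (disorder)"));
        System.out.println(batchReport(Arrays.asList("Asthma (disorder)", "Heart  attack")));
    }
}
```

Note that the rules are still evaluated one after another in list order, so this arrangement addresses reuse but not sequencing or optimization.
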
Possible enhancement: Integration of a rule engine to manage test sequencing and optimization
- Reuse is enabled, optimization is possible, logic is moved to rule files
- Rule development is more complex
[Diagram: Pre Commit -> Datacheck (Call rule evaluation logic; Handle UI: Alert To Data Constraint Failure -> Fix Options -> Fix Up) -> Commit / Continue]
[Diagram: Rule library (Evaluate) backed by Drools; rules text files kept in sync through Subversion]
[Diagram: Batch testing -> Batch test plan (Call rule evaluation logic) -> Flexible logging or reporting]

Production rules system
- Production Memory (rules)
- Working Memory (facts): Concept, Description, Relationship, Description, Relationship
- Inference Engine: matches rules against facts and builds the Agenda

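The relationship between the three parts can be shown with a toy match-then-fire loop. This is a minimal sketch of the general production-system idea, not of Drools internals; the Fact, Rule and Activation classes and the sample rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ProductionSystemSketch {
    /** A fact in working memory (hypothetical shape, not a Workbench type). */
    static class Fact {
        final String kind;
        final String text;
        Fact(String kind, String text) { this.kind = kind; this.text = text; }
    }

    /** A production: a named condition (LHS); firing it (RHS) records an alert. */
    static class Rule {
        final String name;
        final Predicate<Fact> condition;
        Rule(String name, Predicate<Fact> condition) { this.name = name; this.condition = condition; }
    }

    /** One pending rule firing: a rule paired with the fact that satisfied it. */
    static class Activation {
        final Rule rule;
        final Fact fact;
        Activation(Rule rule, Fact fact) { this.rule = rule; this.fact = fact; }
    }

    static List<String> run(List<Rule> productionMemory, List<Fact> workingMemory) {
        // Match: pair every rule with the facts that satisfy its condition.
        List<Activation> agenda = new ArrayList<>();
        for (Rule rule : productionMemory) {
            for (Fact fact : workingMemory) {
                if (rule.condition.test(fact)) {
                    agenda.add(new Activation(rule, fact));
                }
            }
        }
        // Fire: execute each activation on the agenda; here firing just records an alert.
        List<String> alerts = new ArrayList<>();
        for (Activation activation : agenda) {
            alerts.add(activation.rule.name + ": " + activation.fact.text);
        }
        return alerts;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule("No '@' in descriptions",
                     f -> f.kind.equals("description") && f.text.contains("@")));
        List<Fact> facts = Arrays.asList(
            new Fact("description", "Asthma @home"),
            new Fact("description", "Asthma"));
        System.out.println(run(rules, facts));
    }
}
```

The nested loops in the match phase are the naive approach; a real engine replaces them with an indexed network so that not every rule is tested against every fact.
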
Rules execution efficiency
- Rules are compiled into a Rete tree
- Battle-proven algorithm
- Tests only rules that have a chance to succeed
- Tests each condition only once, even if it is shared by many rules

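The benefit of testing a shared condition only once can be illustrated with a simple cache of condition results. This is not the Rete algorithm, only a sketch of the condition-sharing idea behind it; the condition names and the three rules are hypothetical. A counter records how many conditions are actually evaluated with and without sharing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class SharedConditionSketch {
    /** Counts real condition evaluations so the two strategies can be compared. */
    static int evaluations = 0;

    /** Named conditions over a description text (illustrative only). */
    static final Map<String, Predicate<String>> CONDITIONS = new HashMap<>();
    static {
        CONDITIONS.put("isFsn", t -> t.endsWith(")"));
        CONDITIONS.put("hasDoubleSpace", t -> t.contains("  "));
        CONDITIONS.put("tooLong", t -> t.length() > 255);
    }

    /** Each rule is a list of condition names that must all hold; conditions overlap. */
    static final List<List<String>> RULES = Arrays.asList(
        Arrays.asList("isFsn", "hasDoubleSpace"),
        Arrays.asList("isFsn", "tooLong"),
        Arrays.asList("hasDoubleSpace", "tooLong"));

    /** Evaluates one condition, reusing a cached result when a cache is supplied. */
    static boolean test(String name, String text, Map<String, Boolean> cache) {
        if (cache != null && cache.containsKey(name)) {
            return cache.get(name);
        }
        evaluations++;
        boolean result = CONDITIONS.get(name).test(text);
        if (cache != null) {
            cache.put(name, result);
        }
        return result;
    }

    /** Returns how many rules match; pass null as cache for plain sequential checking. */
    static int countFired(String text, Map<String, Boolean> cache) {
        int fired = 0;
        for (List<String> rule : RULES) {
            boolean all = true;
            for (String condition : rule) {
                all &= test(condition, text, cache); // no short circuit, to keep counts comparable
            }
            if (all) {
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        String text = "Asthma  (disorder)";
        evaluations = 0;
        countFired(text, null);
        System.out.println("sequential evaluations: " + evaluations);
        evaluations = 0;
        countFired(text, new HashMap<String, Boolean>());
        System.out.println("shared evaluations: " + evaluations);
    }
}
```

With three rules over three distinct conditions the saving is small, but it grows with the number of rules that share conditions, which is the situation described for rule sets of more than 200 rules.
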
Readable rules
- The standard Drools rule language (DRL) is readable with minimal programming experience, after a short training.
- Rules are independent of the rest of the application source code.

    rule "No invalid chars"
        when
            $description : I_DescriptionTuple(text matches ".*[@&\\+\"\\ \\{\\}].*")
            $results : SimpleResultsCollector()
            eval($description.getVersion() == Integer.MAX_VALUE)
        then
            System.out.println("Error: Description '" + $description.getText() + "' matches a forbidden character.");
            $results.getAlerts().add("error: Description '" + $description.getText() + "' matches a forbidden character.");
    end

Readable rules
LHS (left-hand side): conditions. In the "No invalid chars" rule, this is the when block:

        when
            $description : I_DescriptionTuple(text matches ".*[@&\\+\"\\ \\{\\}].*")
            $results : SimpleResultsCollector()
            eval($description.getVersion() == Integer.MAX_VALUE)

Readable rules
RHS (right-hand side): actions. In the "No invalid chars" rule, this is the then block:

        then
            System.out.println("Error: Description '" + $description.getText() + "' matches a forbidden character.");
            $results.getAlerts().add("error: Description '" + $description.getText() + "' matches a forbidden character.");

Possible enhancement: Human-readable rules by defining a Domain Specific Language to model them
The rule language can be simplified even further using:
- Domain Specific Languages (DSL)
- Decision tables

Domain Specific Language
Rules are written using a simple structured language:

    expander mrcm-rule.dsl;

    rule "DUE TO"
        when
            a defining relationship has type "=42752001"
            and domain is not in "<<404684003, <<272379006"
            or target is not in "<<404684003, <<272379006"
        then
            notify "Infraction to MRCM: Wrong 'Due to' domain or target"
    end

    rule "FINDING SITE"
        when
            a defining relationship has type "=363698007"
            and domain is not in "<<404684003"
            or target is not in "<<91723000, <<280115004"
        then
            notify "Infraction to MRCM: Wrong 'Finding site' domain or target"
    end

    rule "ASSOCIATED MORPHOLOGY"
        when
            a defining relationship has type "=116676008"
            and domain is not in "<<404684003"
            or target is not in "<<49755003"
        then
            notify "Infraction to MRCM: Wrong 'Associated morphology' domain or target"
    end

Domain Specific Language A DSL definition converts to actual Drools logic
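
The idea of a DSL definition is a mapping from a readable sentence to rule-language text. The sketch below shows that idea for the notify sentence only. It is not the Drools expander and not the contents of mrcm-rule.dsl: the class name is hypothetical, and the generated text is an assumption modelled on the right-hand side of the "No invalid chars" rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DslExpansionSketch {
    // Hypothetical mapping: the DSL sentence  notify "{message}"
    // expands to a statement that adds the message to a results collector.
    private static final Pattern NOTIFY = Pattern.compile("notify \"([^\"]*)\"");

    /** Expands one DSL line; lines without a known mapping pass through unchanged. */
    static String expand(String dslLine) {
        Matcher matcher = NOTIFY.matcher(dslLine.trim());
        if (matcher.matches()) {
            return "$results.getAlerts().add(\"" + matcher.group(1) + "\");";
        }
        return dslLine;
    }

    public static void main(String[] args) {
        System.out.println(expand("notify \"Infraction to MRCM: Wrong 'Due to' domain or target\""));
    }
}
```

The rule author only sees the sentence form; the mapping is written once by a developer and reused by every rule that uses the sentence.
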
Decision Tables
Allow rules to be expressed using a simple table of data

                    Condition                                                          Action
                    Relationship                                                       AlertList
    Rule            Conc. 1          Type            Conc. 2                           Alert
    Due to          "<<404684003"    "=42752001"     "<<404684003, <<272379006"        Due to MRCM error
    Finding site    "<<404684003"    "=363698007"    "<<91723000, <<280115004"         Finding Site MRCM error
    Morphology...   "<<404684003"    "=116676008"    "<<49755003"                      Morphology MRCM error

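A decision table can be read as data that a single generic routine evaluates. The sketch below holds the three rows from the slide as objects and checks a relationship against them. It is an illustration under assumptions, not the Drools decision table loader: the class names, the toy parent map, and the placeholder concepts "finding-x" and "structure-y" are hypothetical, and "<<" is interpreted as descendant-or-self.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DecisionTableSketch {
    /** One table row: relationship type, allowed domain roots, allowed target roots, alert. */
    static class Row {
        final String type;
        final List<String> domainRoots;
        final List<String> targetRoots;
        final String alert;
        Row(String type, List<String> domainRoots, List<String> targetRoots, String alert) {
            this.type = type;
            this.domainRoots = domainRoots;
            this.targetRoots = targetRoots;
            this.alert = alert;
        }
    }

    /** The three rows shown on the slide. */
    static final List<Row> TABLE = Arrays.asList(
        new Row("42752001", Arrays.asList("404684003"),
                Arrays.asList("404684003", "272379006"), "Due to MRCM error"),
        new Row("363698007", Arrays.asList("404684003"),
                Arrays.asList("91723000", "280115004"), "Finding Site MRCM error"),
        new Row("116676008", Arrays.asList("404684003"),
                Arrays.asList("49755003"), "Morphology MRCM error"));

    /** Toy hierarchy (child -> parent) with made-up child concepts, for illustration only. */
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("finding-x", "404684003");
        PARENT.put("structure-y", "91723000");
    }

    /** True when the concept is one of the roots or has one of them as an ancestor. */
    static boolean isDescendantOrSelf(String concept, List<String> roots) {
        for (String current = concept; current != null; current = PARENT.get(current)) {
            if (roots.contains(current)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the alerts of every row whose type matches but whose domain or target does not. */
    static List<String> check(String source, String type, String target) {
        List<String> alerts = new ArrayList<>();
        for (Row row : TABLE) {
            if (row.type.equals(type)
                    && (!isDescendantOrSelf(source, row.domainRoots)
                        || !isDescendantOrSelf(target, row.targetRoots))) {
                alerts.add(row.alert);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        System.out.println(check("finding-x", "363698007", "structure-y"));
        System.out.println(check("finding-x", "363698007", "finding-x"));
    }
}
```

Adding a rule in this style means adding a row, not writing new code, which is what makes the table format approachable for terminology experts.
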
Workbench integration
[Diagram components]
- Callers: Datacheck, Batch QA, Business process
- Rules: Drools rules text files, kept in sync through Subversion
- Engine: Drools Knowledge Base (RulesLibrary class), Drools Expert engine
- Outputs: I_FixUp(), AlertToDataConstraintFailure(), Report / Issue ticket

Sharing
- Rules are stored in simple .txt files
- Rules are read and interpreted from the bundle/rules/ folder at runtime
- Rules can be modified by copy and paste
- Subversion (SVN) synchronization is possible
- Rules repositories can eventually be implemented

Rules repositories
Proof of concept: Current Status
- Basic data checks refactored to leverage rule engine integration
- Implementation is transparent to the user; the WB UI is not affected
- Logic residing in the rule library can be reused from batch processes, enabling the maintenance of a single set of rules
- MRCM testing implementation using a Domain Specific Language runs both interactively and in batch

Other WB enhancement opportunities
- Ongoing research at termmed IT
    - MRCM testing
    - Optimizing batch execution
- Extending rule engine usage
    - Design of complex rule flows and optimization
    - To present context-sensitive help
    - To drive complex, rule-based UI features