Archetypes and ontologies to facilitate the breast cancer identification and treatment process

Ainhoa Serna 1, Jon Kepa Gerrikagoitia 1, Iker Huerga, Jose Antonio Zumalakarregi 2 and Jose Ignacio Pijoan 3

1 Computer Science Department, Mondragon Unibertsitatea, Loramendi nº4, Arrasate, Gipuzkoa, Spain
{aserna,ikgerrikagoitia}@eps.mondragon.edu
2 Chief of Health Management Service, 3 Chief of Research Unit, Hospital of Cruces, Bizkaia, Spain
{jzumalakarregi,joseignacio.pijoanzubizarreta}@osakidetza.net

Abstract. The breast cancer medical process is carried out almost entirely by hand, so there is an evident risk of human error caused by occasional lack of staff experience (substitutions, sick leave and other reasons). Moreover, whether the process is completed correctly may depend on the personal attitude of the administrative staff. To guarantee patient safety, the whole process will be automatically orchestrated and monitored. The solution is supported by a service-oriented architecture combined with semantic web techniques (archetypes and ontologies) that infer knowledge from predefined rules to make the process safe.

Keywords: Electronic healthcare records, clinical archetypes, ontologies, OWL, Web Services, breast cancer, prognostic factor.

1 Introduction

Breast cancer is an increasingly common disease that affects many women in our country. Every year, 40 cancer cases are diagnosed per 100,000 queries, which makes breast cancer the most frequent malignant tumor in the female population. The diagnosis of breast cancer involves a large number of professionals from different care areas (family doctors, gynecologists, radiologists, pathologists, oncologists and administrative staff) as well as diverse diagnostic and treatment resources, all of which make the process complex to handle.
It poses an organizational challenge because many departments of the health care service are involved and many different people interact to complete the diagnostic and therapeutic process. The weak link in the chain is that the process can currently only be carried out manually, and the whole process, including administrative work, appointments and so on, is managed by the doctors.
The different doctors who see the patient during the care process can make individual decisions that may change the treatment, even though there is a written protocol describing the general procedure. For these reasons, there is a basic need for a decision-making tool that communicates with the different IT systems and guarantees the reliability, control and monitoring of the medical process. This tool will manage the resources efficiently, regardless of who the operator is and of his or her attitude to work, in order to minimize human errors. Decision making backed by artificial intelligence mechanisms plays an important role in this project, because it can reduce human errors due to lack of experience, and the automatically inferred knowledge adds relevant value to the research. The use of ontologies is a key factor in this project, because a doctor's diagnostic knowledge is difficult to transfer: it is based on personal experience, and the way it is represented when explaining it to others is not homogeneous. This lack of a homogeneous knowledge representation makes it difficult to share and compare experiences and knowledge among professionals.

2 Development

Based on the research done by Matthew Hardy Williams 1 (Integrating Ontologies and Argumentation for decision-making in breast cancer), we built our ontology for the data obtained from Cruces Hospital.

2.1 Identifying and classifying the process

The first step for the early detection of breast cancer is self-examination, which should be part of every woman's monthly health check. If a woman notices any change in her breast, she should go to her oncologist as soon as possible. If she is older than 40 or has a high risk of suffering breast cancer, she should have a yearly mammography and a physical examination by the doctor.
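The early-detection steps above amount to a small set of rules. The following Python sketch encodes them, assuming a hypothetical Woman record with age, high_risk and noticed_breast_change fields; none of these names come from the paper, and the function only restates the steps of this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Woman:
    # Hypothetical patient record; field names are illustrative, not from the paper.
    age: int
    high_risk: bool = False
    noticed_breast_change: bool = False


def screening_actions(woman: Woman) -> List[str]:
    """Encode the early-detection steps of Section 2.1 as simple rules."""
    actions = ["monthly self-examination"]  # advised for every woman
    if woman.noticed_breast_change:
        actions.append("see the oncologist as soon as possible")
    # "Older than 40" is read strictly, so a 40-year-old without high risk is excluded.
    if woman.age > 40 or woman.high_risk:
        actions.append("yearly mammography")
        actions.append("yearly physical examination by the doctor")
    return actions
```

For instance, screening_actions(Woman(age=53)) returns the self-examination, the yearly mammography and the yearly physical examination, whereas a 30-year-old without risk factors or symptoms only gets the monthly self-examination.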
If any of those tests indicates even a minimal chance of cancer, the doctor should order the pertinent tests to determine the stage of the cancer. There are different types 2 of tests. In medical terminology, the stage of breast cancer is defined by three parameters: T (size of the tumor), N (lymphatic nodes) and M (metastasis).

1 http://ieeexplore.ieee.org/xplore/login.jsp?url=/iel5/4410240/4410339/04410388.pdf?arnumber=4410388
2 http://www.breastcancer.org/symptoms/testing/types/

1. T (primary tumor):
- T0: there is no evidence of tumor.
- T1: the tumor is 2 cm or smaller.
- T2: the tumor is larger than 2 cm but no larger than 5 cm.
- T3: the tumor is larger than 5 cm.
2. N (lymphatic nodes):
- N0: no lymphatic node is affected.
- N1: the cancer has spread to 1 to 3 lymphatic nodes.
- N2: the cancer has spread to 4 to 9 lymphatic nodes.
- N3: the cancer has spread to more than 9 lymphatic nodes.
3. M (presence of metastasis):
- M0: there is no metastasis.
- M1: there is metastasis, i.e. the cancer has spread to distant organs.

These features define the different stages 3 of breast cancer.

2.2 Clinical Archetypes

An archetype is a reusable, formal model of a domain concept. The formal concept was originally described in detail in a paper by Thomas Beale 4. For example, an archetype for "Breast cancer identification" is a model of the information that should be captured for this kind of identification: usually the primary tumor, the lymphatic nodes, the presence of metastasis, and instrument or other protocol information. In general, archetypes are defined for wide reuse; however, they can be specialized to include local particularities. For this approach, only the archetypes related to the breast cancer context should be used. The key benefits of archetypes include:
- Knowledge-enabled systems: the separation of information and knowledge concerns in software systems, allowing cheap, future-proof software to be built.
- Knowledge-level interoperability: the ability of systems to communicate reliably with each other at the level of knowledge concepts.
- Domain empowerment: domain specialists are empowered to define the informational concepts they work with and have direct control over their information systems.
- Intelligent querying: archetypes can be used at runtime to query data efficiently, based on the structure of the archetypes from which the data was created.

To achieve the aim of this paper, OpenEHR 5 archetypes are used as the main basis to build OWL classes, subclasses and properties.
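To make the TNM criteria and the archetype content concrete, the sketch below mirrors the breast cancer archetype as a hypothetical Python dataclass (BreastCancerArchetype, with fields for primary tumor size, number of affected lymphatic nodes and presence of metastasis) and maps each field to its TNM category using the thresholds listed above. It is an in-memory illustration under these assumptions, not the OWL translation used in this work, and it covers only the T0-T3, N0-N3 and M0-M1 categories listed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BreastCancerArchetype:
    """Hypothetical in-memory mirror of the breast cancer archetype."""
    primary_tumor_size_cm: float  # 0 means no evidence of tumor
    affected_lymph_nodes: int     # number of affected lymphatic nodes
    metastasis: bool              # presence of metastasis


def t_category(size_cm: float) -> str:
    if size_cm <= 0:
        return "T0"
    if size_cm <= 2:
        return "T1"
    if size_cm <= 5:
        return "T2"
    return "T3"


def n_category(nodes: int) -> str:
    if nodes == 0:
        return "N0"
    if nodes <= 3:
        return "N1"
    if nodes <= 9:
        return "N2"
    return "N3"


def m_category(metastasis: bool) -> str:
    return "M1" if metastasis else "M0"


def tnm(record: BreastCancerArchetype) -> Tuple[str, str, str]:
    """Classify one archetype instance into its (T, N, M) categories."""
    return (
        t_category(record.primary_tumor_size_cm),
        n_category(record.affected_lymph_nodes),
        m_category(record.metastasis),
    )
```

For example, a record with a 5.5 cm tumor, 10 affected nodes and no metastasis yields ("T3", "N3", "M0"). Keeping the thresholds in one function per parameter makes each criterion easy to compare with the list above.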
Once these clinical archetypes are translated into OWL objects, we combine them to set up our Breast Cancer Ontology, which becomes the starting point of the inference process. For example, Figure 1 shows the Breast Cancer archetype:

3 http://www.breastcancer.org/treatment/planning/cancer_stage/
4 http://www.openehr.org/publications/archetypes/archetypes_beale_web_2000.pdf
5 http://www.openehr.org
[Figure: Breast cancer archetype with the elements Primary tumor (size in cm), Lymphatic nodes (number of nodes) and Metastasis]
Fig. 1. Breast cancer Archetype

2.2. Ontology

The following example shows a practical application of part of the ontology. Let us suppose the following significant data for identifying breast cancer: Ms Jones is a postmenopausal woman in the aged-50-plus group, 53 years old, who, after the breast cancer tests, has a tumor larger than 5 cm in her breast. More than 9 of her lymphatic nodes are affected and there is no metastasis. The definition of MsJones is shown in Figure 2:

<Women rdf:ID="msjones">
  <hasmetastasis>
    <Metastasis rdf:ID="met_negative">
      <hasresult rdf:datatype="http://www.w3.org/2001/XMLSchema#string">negative</hasresult>
    </Metastasis>
  </hasmetastasis>
  <hasage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">53</hasage>
  <rdf:type rdf:resource="#aged50plus"/>
  <rdf:type rdf:resource="#postmenopausal"/>
  <hastumor>
    <Bigger5cm rdf:ID="bigger5cm_1"/>
  </hastumor>
  <haslymphnodes rdf:resource="#more9node_1"/>
</Women>
Fig. 2. Ms. Jones Definition

2.3 Inference rules

Based on this representation, inference rules are created to identify the cancer and the possible treatment. In natural language, the rules are: if the tumor is bigger than 5 cm, it is classified as T3 (rule 1); if more than 9 nodes are affected, it is classified as N3 (rule 2); and if the results of the metastasis tests are negative, there is no metastasis (rule 3). With these results, the diagnosis is an operable cancer in stage IIIC (rule 4). Rules 5, 6 and 7 give the treatment details (drugs, duration) for the breast and the lymphatic nodes.
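The syntax of the rules is not shown here. As a hedged illustration only, rules 1 to 4 can be sketched as condition/conclusion pairs evaluated by a naive forward-chaining loop that keeps firing rules until no new fact is derived. The fact keys (hastumor, haslymphnodes, metastasisresult) and the values Bigger5cm and more9node are assumptions loosely based on Figures 2 and 3; the rule engine actually used may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Facts], bool]
    prop: str   # property asserted when the condition holds
    value: str  # value asserted for that property


# Rules 1-4 from the text; keys and values are hypothetical.
RULES: List[Rule] = [
    Rule("rule1", lambda f: f.get("hastumor") == "Bigger5cm", "haskindtumor", "t3"),
    Rule("rule2", lambda f: f.get("haslymphnodes") == "more9node", "haskindlymph", "n3"),
    Rule("rule3", lambda f: f.get("metastasisresult") == "negative", "metastasis", "no"),
    Rule(
        "rule4",
        lambda f: (f.get("haskindtumor"), f.get("haskindlymph"), f.get("metastasis"))
        == ("t3", "n3", "no"),
        "hascancerstage",
        "operableiiic",
    ),
]


def infer(facts: Facts, rules: List[Rule]) -> Facts:
    """Naive forward chaining: fire rules until no new fact is added."""
    inferred = dict(facts)  # copy, so the asserted facts are left untouched
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.prop not in inferred and rule.condition(inferred):
                inferred[rule.prop] = rule.value
                changed = True
    return inferred
```

Because the loop repeats until nothing changes, rule 4 still fires when it is listed before rules 1 to 3, which mirrors how a rule engine chains intermediate conclusions (T3, N3, no metastasis) into the stage.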
2.4 Inferred knowledge

Based on the data shown in the previous sections, and after applying the inference rules defined above, the inferred knowledge is shown in Figure 3:

<rdf:Description rdf:about="http://acl/bmv#msjones">
  <j.0:hasage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">53</j.0:hasage>
  <j.0:haslymphnodes rdf:resource="http://acl/bmv#more9node_1"/>
  <j.0:hasmetastasis rdf:resource="http://acl/bmv#met_negative"/>
  <j.0:hastumor rdf:resource="http://acl/bmv#bigger5cm_1"/>
  <rdf:type rdf:resource="http://acl/bmv#postmenopausal"/>
  <rdf:type rdf:resource="http://acl/bmv#aged50plus"/>
  <rdf:type rdf:resource="http://acl/bmv#women"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  <rdf:type rdf:resource="http://acl/bmv#adults"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <j.0:recommendeddrugtreatment>tamoxifen 40mg during 2 years</j.0:recommendeddrugtreatment>
  <j.0:lymphnodesrecommendedtreatment>radiation to supraclavicular and/or internal mammary lymph nodes and removed axillary lymph nodes</j.0:lymphnodesrecommendedtreatment>
  <j.0:breastrecommendedtreatment>modified radical mastectomy followed by radiation and lumpectomy plus radiation following chemotherapy to shrink a large single cancer</j.0:breastrecommendedtreatment>
  <j.0:hascancerstage>operableiiic</j.0:hascancerstage>
  <j.0:metastasis>no</j.0:metastasis>
  <j.0:haskindlymph>n3</j.0:haskindlymph>
  <j.0:haskindtumor>t3</j.0:haskindtumor>
</rdf:Description>
Fig. 3. Inferred knowledge

As we can see, the current stage of the patient's cancer has been inferred, together with the recommended treatment for the lymphatic nodes (radiation to supraclavicular and/or internal mammary lymph nodes and removal of axillary lymph nodes) and the drug dose and duration of the treatment (Tamoxifen 40 mg for 2 years).
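The inferred facts can also be read back programmatically. The following sketch uses Python's standard xml.etree.ElementTree module on an abridged copy of Figure 3. Since the figure omits the namespace declarations, the sketch wraps it in an rdf:RDF root and assumes that the j.0 prefix is bound to http://acl/bmv#; the function name literal_properties is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BMV_NS = "http://acl/bmv#"  # assumption: the namespace bound to the j.0 prefix

# Abridged copy of Fig. 3, wrapped in an rdf:RDF root so that it is well-formed XML.
INFERRED_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:j.0="{BMV_NS}">
  <rdf:Description rdf:about="{BMV_NS}msjones">
    <j.0:hastumor rdf:resource="{BMV_NS}bigger5cm_1"/>
    <j.0:hascancerstage>operableiiic</j.0:hascancerstage>
    <j.0:haskindtumor>t3</j.0:haskindtumor>
    <j.0:haskindlymph>n3</j.0:haskindlymph>
    <j.0:metastasis>no</j.0:metastasis>
    <j.0:recommendeddrugtreatment>tamoxifen 40mg during 2 years</j.0:recommendeddrugtreatment>
  </rdf:Description>
</rdf:RDF>"""


def literal_properties(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each rdf:Description subject to its literal-valued ontology properties."""
    root = ET.fromstring(xml_text)
    prefix = "{" + BMV_NS + "}"  # ElementTree writes qualified tags as {namespace}local
    result: Dict[str, Dict[str, str]] = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        props: Dict[str, str] = {}
        for child in desc:
            # Resource-valued properties such as hastumor carry no text and are skipped.
            if child.tag.startswith(prefix) and child.text and child.text.strip():
                props[child.tag[len(prefix):]] = child.text.strip()
        result[desc.get("{" + RDF_NS + "}about")] = props
    return result
```

Calling literal_properties(INFERRED_XML) returns the stage, the T and N categories, the metastasis flag and the drug treatment for http://acl/bmv#msjones. Parsing the XML by hand is only meant to make the structure of Figure 3 explicit; querying the triples through an RDF toolkit would be the more robust choice.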
We have implemented a prototype with this part of the functionality in order to demonstrate it to non-experts in the semantic web, following the W3C accessibility and usability recommendations. The inferred knowledge is presented in a more user-friendly format using web development and transformation techniques. The prototype shows the inferred knowledge in detail, including the stage of the cancer, the possible treatment, etc. It is important to note that the presented solution is intended to help the doctor make decisions on diagnosis or treatment; the final decision will always be a human one.
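One simple way to present such results to non-experts is to map each inferred property to a readable label and emit an HTML table. The sketch below does this with Python's standard html.escape; the LABELS mapping and the render_summary function are hypothetical, and the table uses a caption and row header cells in the spirit of the accessibility recommendations mentioned above. The transformation technique used by the prototype is not detailed, so this is only one possible approach.

```python
from html import escape
from typing import Dict

# Hypothetical human-readable labels for some inferred properties of Fig. 3.
LABELS: Dict[str, str] = {
    "hascancerstage": "Cancer stage",
    "haskindtumor": "Primary tumor (T)",
    "haskindlymph": "Lymphatic nodes (N)",
    "metastasis": "Metastasis",
    "recommendeddrugtreatment": "Recommended drug treatment",
}


def render_summary(patient: str, inferred: Dict[str, str]) -> str:
    """Render the inferred properties that have a label as a simple HTML table."""
    rows = []
    for prop, label in LABELS.items():
        if prop in inferred:
            rows.append(
                f'    <tr><th scope="row">{escape(label)}</th>'
                f"<td>{escape(inferred[prop])}</td></tr>"
            )
    return (
        f"<table>\n  <caption>Inferred knowledge for {escape(patient)}</caption>\n"
        + "\n".join(rows)
        + "\n</table>"
    )
```

Properties without a label are simply left out, so the displayed summary stays short even when the reasoner adds many technical types such as owl:Thing.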
4. Conclusions

We developed a case study based on a breast cancer guideline and, to show that the approach is feasible, we built a simple prototype. We aimed to:
- Model the results of clinical trials, and the background knowledge that provides the terms used to describe those results.
- Model arguments for both beliefs and decisions.
- Take a piece of medical knowledge and represent it at different levels of abstraction.
- Represent the terms related to breast cancer in order to unify concepts.
- Provide rapid access to up-to-date patient information in order to improve diagnosis and treatment.
- Help doctors make decisions on diagnosis and treatment by means of the prototype.

References
1. Douglas K. Barry. The Object Database Handbook: How to Select, Implement, and Use Object-Oriented Databases. John Wiley and Sons, 1st edition, 1996.
2. Anita Burgun, Olivier Bodenreider, Christian Jacquelinet. Issues in the Classification of Disease Instances with Ontologies. MIE 2006.
3. M. Bibbo. Comprehensive Cytopathology. W.B. Saunders Co., Philadelphia, 1997.
4. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF Schema, 2002.
5. Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34-43, May 2001.
6. D. P. Clark. Thyroid Cytopathology. Essentials in Cytopathology series (series editor Dorothy L. Rosenthal; foreword by Edmund S. Cibas, M.D.). Springer, 2005.
7. Amarnath Gupta et al. Towards a formalization of disease-specific ontologies for neuroinformatics, 2003.
8. T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 6(2):199-221, 1993.
9. M. Hardy Williams. Integrating Ontologies and argumentation for decision-making in breast cancer. Doctoral Thesis, University College London, 2008.