Rulex s Logic Learning Machines successfully meet biomedical challenges.



Similar documents
testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello

Differential diagnosis of pleural mesothelioma using Logic Learning Machine

Differential diagnosis of pleural mesothelioma using Logic Learning Machine

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Social Media Mining. Data Mining Essentials

Targeting Specific Cell Signaling Pathways for the Treatment of Malignant Peritoneal Mesothelioma

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Chapter 12 Discovering New Knowledge Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

life science data mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Data Mining - Evaluation of Classifiers

Nuevas tecnologías basadas en biomarcadores para oncología

Introduction to Data Mining

Medical Informatics II

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

PharmaSUG2011 Paper HS03

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

1. Classification problems

CHAPTER 2: UNDERSTANDING CANCER

Maschinelles Lernen mit MATLAB

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES

Data Mining and Machine Learning in Bioinformatics

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

International Journal of Software and Web Sciences (IJSWS)

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Real Time Data Analytics Loom to Make Proactive Tread for Pyrexia

Big Data Analytics for Healthcare

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining On Diabetics

SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH

2010 ACR/EULAR Classification Criteria for Rheumatoid Arthritis

InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis

An Introduction to Advanced Analytics and Data Mining

DATA MINING AND REPORTING IN HEALTHCARE

Disease/Illness GUIDE TO ASBESTOS LUNG CANCER. What Is Asbestos Lung Cancer? Telephone

Pentaho Data Mining Last Modified on January 22, 2007

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Data Mining using Artificial Neural Network Rules

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS

Introduction to Pattern Recognition

Non-invasive diagnosis of pleural malignancies: The role of tumour markers

The Data Mining Process

Chapter 6. The stacking ensemble approach

How To Predict Diabetes In A Cost Bucket

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Effective Analysis and Predictive Model of Stroke Disease using Classification Methods

Data Mining Part 5. Prediction

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR.

A leader in the development and application of information technology to prevent and treat disease.

not possible or was possible at a high cost for collecting the data.

Integration of biospecimen data with clinical data mining

: Introduction to Machine Learning Dr. Rita Osadchy

How can you unlock the value in real-world data? A novel approach to predictive analytics could make the difference.

Early Detection Research Network: Validation Infrastructure. Jo Ann Rinaudo, Ph.D. Division of Cancer Prevention

REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES

Data Isn't Everything

BIG DATA IN HEALTHCARE THE NEXT FRONTIER

8. Machine Learning Applied Artificial Intelligence

Guidelines for using V-CODES (Status Codes)

An Introduction to Data Mining

Evolutionary Tuning of Combined Multiple Models

micrornas Non protein coding, endogenous RNAs of 21-22nt length Evolutionarily conserved

Smarter Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research

Introduction Objective Methods Results Conclusion

COMPARATIVE ANALYSIS OF DATA MINING TECHNIQUES FOR MEDICAL DATA CLASSIFICATION

co-sponsored by the Health & Physical Education Department, the Health Services Office, and the Student Development Center

A Content based Spam Filtering Using Optical Back Propagation Technique

Novel Mining of Cancer via Mutation in Tumor Protein P53 using Quick Propagation Network

Azure Machine Learning, SQL Data Mining and R

Update on Mesothelioma

How To Use Neural Networks In Data Mining

Mining Health Data for Breast Cancer Diagnosis Using Machine Learning

Data Mining. Nonlinear Classification

Machine Learning with MATLAB David Willingham Application Engineer

Comparison of Six Classification Techniques for Post Operative Patient data in the Medicable discipline

Healthcare Measurement Analysis Using Data mining Techniques

Virtual Site Event. Predictive Analytics: What Managers Need to Know. Presented by: Paul Arnest, MS, MBA, PMP February 11, 2015

Multivariate Tools for Modern Pharmaceutical Control FDA Perspective

Report series: General cancer information

APPLICATION OF DATA MINING TECHNIQUES FOR THE DEVELOPMENT OF NEW ROCK MECHANICS CONSTITUTIVE MODELS

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

A survey on Data Mining based Intrusion Detection Systems

Learning is a very general term denoting the way in which agents:

Survey of clinical data mining applications on big data in health informatics

Database Marketing, Business Intelligence and Knowledge Discovery

Rare Thoracic Tumours

Data Mining Fundamentals

.org. Osteochondroma. Solitary Osteochondroma

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Estimates of the impact of extending the scope of the Mesothelioma payment scheme. December 2013

Diagnostic Challenge. Department of Pathology,

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Microarray Data Mining: Puce a ADN

Transcription:

Rulex s Logic Learning Machines successfully meet biomedical challenges. Rulex is a predictive analytics platform able to manage and to analyze big amounts of heterogeneous data. With Rulex, it is possible, by means of machine learning methods, to create models that describe the data provided by the user, without any prior information. The models generated by Rulex can be used to forecast the behavior of the system in future situations. Besides standard statistical and machine learning methods, Rulex incorporates innovative approaches, named Logic Learning Machines (LLMs): the peculiarity of LLMs consists in the possibility of generating intelligible rules about the problem at hand. As a matter of fact, models produced by standard machine learning methods are able to forecast the future behaviors but, being described by complex equations, cannot provide to the user an insight of the studied system. On the contrary, the LLMs approach generates a set of threshold rules that can be easily understood by a human being. Consider, for example, the case of medical diagnosys. While standard machine learning methods provide the state of a patient (e.g. ill or healthy) as a complex function of the inputs (e.g. blood_pressure, markers etc ), LLMs produce rules in the form: IF blood_pressure > x AND lymphoid_cell_concentration > y AND.. then Patient is sick where x and y are thresholds automatically determined by the algorithm. The number of conditions in a rule is variable and depends on the complexity of the considered problem. The rules generated by Rulex-LLMs can be interpreted by a physician and, in case, adjusted on the basis of her experience. Moreover, Rulex-LLMs, with no additional effort, a ranking of the most relevant features in determining, for example, if a patient is sick or healthy. Thus, it is possible to discover if a variable is not important and, in case, to remove it from the list of the quantities to be monitored.

Rulex-LLMSs has been employed to solve problems related to different application fields. In particular, several applications in the biomedical area have been carried out successfully: some examples are reported below. 1. Extraction of rules for pleural mesothelioma diagnosis Malignant pleural mesothelioma (MPM) is a rare highly fatal tumor, whose incidence is rapidly increasing in developed countries due to the widespread past exposure to asbestos in environmental and occupational settings. The correct diagnosis of MPM is often hampered by the presence of atypical clinical symptoms that may cause misdiagnosis with either other malignancies (especially adenocarcinomas) or benign inflammatory or infectious diseases (BD) causing pleurisies. Cytological examination (CE) may allow to identify malignant cells, but sometimes a very high false negative proportion may be encountered due to the high prevalence of non-neoplastic cells. Moreover, in most cases a positive result from CE examination only does not allow to distinguish MPM from other malignancies [3]. Many tumor markers (TM) have been demonstrated to be useful complementary tools for the diagnosis of MPM. In particular, recent investigations analyzed the concentrations of three tumor markers in pleural effusions, namely: the soluble mesothelin-related peptide (SMRP), CYFRA 21-1 and CEA, and their association with a differential diagnosis of MPM, pleural metastasis from other tumors (MTX) and BD. SMRP showed the best performance in separating MPM from both MTX and BD, while high values of CYFRA 21-1 were associated to both MPM and MTX. Conversely, high concentrations of CEA were mainly observed in patients with MTX. Taken together, these results indicate that information from the three considered markers and from CE might be combined together in order to obtain a classifier to separate MPM from both MTX and BD. In this context, Rulex has been applied for the differential diagnosis of MPM by identifying simple and intelligible rules based on CE and TM concentration. The results have been compared to those obtained by other supervised methods showing that Rulex outperforms all the competing approaches (Decision Trees, K-Nearest Neighbors and Artificial Neural Networks). [Bibliography: S. PARODI, R. FILIBERTI, P. MARRONI, R. LIBENER, G.P. IVALDI, M. MUSSAP, E. FERRARI, C. MANNESCHI, E. MONTANI, M. MUSELLI Differential diagnosis of pleural mesothelioma using Logic Learning Machine. Submitted to BMC Bioinformatics (2014).]

2. Extraction of a simplified gene expression signature for neuroblastoma prognosis. Cancer patient s outcome is written, in part, in the gene expression profile of the tumor. In this study, a 62-probe sets signature (NB-hypo) to identify tissue hypoxia in neuroblastoma was previously identified and showed to stratify neuroblastoma patients in good and poor outcome. It was important to develop a prognostic classifier to cluster patients into risk groups benefiting of defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. In this paper, Rulex was applied to gene expression data; in particular the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones was employed as a pre-processing step for rule extraction by means of Logic Learning Machine. The application of Rulex-LLMs produced 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the Rulex-LLMs applied to microarray data and patients classification. Rulex-LLMs performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant in predicting outcome among the 62 of the NB-hypo signature. Rules are easily interpretable as they involve only few conditions. [Bibliography: D. CANGELOSI, M. MUSELLI, S. PARODI, F. BLENGIO, J. KOSTER, A. SCHRAMM, A. GARAVENTA, C. GAMBINI, L. VARESIO Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients. To appear on BMC Bioinformatics (2014).]

3. Extraction of intelligible rules concerning the prognosis of neuroblastoma. Neuroblastoma is the most common pediatric solid tumor. About fifty percent of high risk patients die despite treatment making the exploration of new and more effective strategies for improving stratification mandatory. Hypoxia is a condition of low oxygen tension occurring in poorly vascularized areas of the tumor associated with poor prognosis. The aim of this study was the development of a prognostic classifier of neuroblastoma patients outcome blending existing knowledge on clinical and molecular risk factors with the prognostic NB-hypo signature. Classifiers outputting explicit rules, that could be easily translated into the clinical setting, are particularly interesting in this context. Rulex-LLMs exhibited a good accuracy and promised to fulfill the aims of the work. This algorithm was utilized to classify NB-patients on the bases of the following risk factors: Age at diagnosis, INSS stage, MYCN amplification and NBhypo. The algorithm generated explicit classification rules in good agreement with existing clinical knowledge.through an iterative procedure, the examples causing instability in the rules were identified and removed from the dataset. This workflow generated a stable classifier, very accurate in predicting good and poor outcome patients. The good performance of the classifier was validated in an independent dataset. NB-hypo was an important component of the rules with a relevance similar to that of tumor staging. [Bibliography: D. CANGELOSI, F. BLENGIO, R. VERSTEEG, A. EGGERT, A. GARAVENTA, C. GAMBINI, M. CONTE, A. EVA, M. MUSELLI, L. VARESIO Logic Learning Machine creates explicit and stable rules stratifying neuroblastoma patients. BMC Bioinformatics 14:S12 (2013).]

4. Validation of a new classification for Multiple Osteochondromas patients. Multiple osteochondromas (MO), previously known as hereditary multiple exostoses (HME), is an autosomal dominant disease characterized by the formation of several benign cartilage-capped bone growth defined osteochondromas or exostoses. Various clinical classifications have been proposed but a consensus has not been reached. The aim of this study was to validate (using a machine learning approach) an easy to use tool to characterize MO patients in three classes according to the number of bone segments affected, the presence of skeletal deformities and/or functional limitations. The proposed classification has been validated (with a highly satisfactory mean accuracy) by analyzing 150 different variables on 289 MO patients through Rulex LLMs. This approach allowed us to identify ankle valgism, Madelung deformity and limitation of the hip extra-rotation as tags of the three clinical classes. In conclusion, the proposed classification provides an efficient system to characterize this rare disease and is able to define homogeneous cohorts of patients to investigate MO pathogenesis. [Bibliography: M. MORDENTI, E. FERRARI, E. PEDRINI, N. FABBRI, L. CAMPANACCI, M. MUSELLI, L. SANGIORGI Validation of a New Hereditary Multiple Exostoses Classification Through Switching Neural Networks. American Journal of Medical Genetics 161 (2013) 556 560 DOI: 10.1002/ajmg.a.35819.]

5. Benchmarking Rulex-LLMs performances on standard biomedical datasets. In this study, Rulex-LLMs were applied to three benchmark datasets regarding different biomedical problems. The datasets, are taken from the UCI archive, a collection of data for machine learning benchmarking, and include: Diabetes: it regards the problem of diagnosing diabetes starting from the values of 8 variables: all the 768 considered patients are females at least 21 years old of Pima Indian heritage: 268 of them are cases whereas remaining 500 are controls. Heart: it deals with the detection of heart disease from a set of 13 input variables concerning patient status; the total sample of 270 elements is formed by 120 cases and 150 controls. DNA: it has the aim of recognizing acceptors and donors sites in a primate gene sequences with length 60 (basis); the dataset consists of 3186 sequences, subdivided into three classes: acceptor, donor, none. Rulex-LLMs performances were compared to those of other supervised methods, namely Decision Trees (DT), Artificial Neural Networks (ANN), Logistic Regression (LR) and K- Nearest Neighbor (KNN). These tests showed that Rulex-LLMs results are better than those of ANN, DT (that produce rules) and KNN, and are comparable with those of LR. [Bibliography: M. MUSELLI Extracting knowledge from biomedical data through Logic Learning Machines and Rulex. EMBnet Journal 18B (2012), 56 58.] Bibliography [1] S. PARODI, R. FILIBERTI, P. MARRONI, R. LIBENER, G.P. IVALDI, M. MUSSAP, E. FERRARI, C. MANNESCHI, E. MONTANI, M. MUSELLI Differential diagnosis of pleural mesothelioma using Logic Learning Machine. Submitted to BMC Bioinformatics (2014). [2] D. CANGELOSI, M. MUSELLI, S. PARODI, F. BLENGIO, J. KOSTER, A. SCHRAMM, A. GARAVENTA, C. GAMBINI, L. VARESIO Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients. To appear on BMC Bioinformatics (2014). [3] D. CANGELOSI, F. BLENGIO, R. VERSTEEG, A. EGGERT, A. GARAVENTA, C. GAMBINI, M. CONTE, A. EVA, M. MUSELLI, L. VARESIO Logic Learning Machine creates explicit and stable rules stratifying neuroblastoma patients. BMC Bioinformatics 14:S12 (2013). [4] M. MORDENTI, E. FERRARI, E. PEDRINI, N. FABBRI, L. CAMPANACCI, M. MUSELLI, L. SANGIORGI Validation of a New Hereditary Multiple Exostoses Classification Through Switching Neural Networks. American Journal of Medical Genetics 161 (2013) 556 560 DOI: 10.1002/ajmg.a.35819. [5] M. MUSELLI Extracting knowledge from biomedical data through Logic Learning Machines and Rulex. EMBnet Journal 18B (2012), 56 58.