Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Similar documents
Simple vs. True. Simple vs. True. Calculating Empirical and Molecular Formulas

CHEMISTRY. Real. Amazing. Program Goals and Learning Outcomes. Preparation for Graduate School. Requirements for the Chemistry Major (71-72 credits)

Chemical safety and big data: the industry s demands

Dr Alexander Henzing

REGULATIONS FOR THE DEGREE OF BACHELOR OF PHARMACY IN CHINESE MEDICINE [PART-TIME] (BPharm[ChinMed])

Cheminformatics and its Role in the Modern Drug Discovery Process

A Statistician s View of Big Data

THE CAMBRIDGE CRYSTALLOGRAPHIC DATA CENTRE (CCDC)

ANDREA MAURI. Address: Via Ombria, , Caprino Bergamasco (BG), Italy Tel Sex: Male Nationality: Italian EDUCATION

FACULTY OF VETERINARY MEDICINE

Data Visualization in Cheminformatics. Simon Xi Computational Sciences CoE Pfizer Cambridge

CHEMICAL SCIENCES REQUIREMENTS [61-71 UNITS]

COURSE TITLE COURSE DESCRIPTION

Pharmacology skills for drug discovery. Why is pharmacology important?

Chemistry & The Next Generation Science Standards

M The Nucleus M The Cytoskeleton M Cell Structure and Dynamics

Bachelor s Programme in Agricultural and Environmental Management Study line: Environmental Management. Total Study description

CTC Technology Readiness Levels

Kazan (Volga region) Federal University, Kazan, Russia Institute of Fundamental Medicine and Biology. Master s program.

CHEMISTRY, BACHELOR OF SCIENCE (B.S.) WITH A CONCENTRATION IN CHEMICAL SCIENCE

Information Management course

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Interpretation of Data (IOD) Score Range

PURE and APPLIED CHEMISTRY

CRIME SCENE FORENSICS

1 st day Basic Training Course

STATISTICS and PROBABILITY Definition Statistics is the science and practice of developing human knowledge through the use of empirical data

BIOSCIENCES COURSE TITLE AWARD

Chapter 8 How to Do Chemical Calculations

SCIENCE. Introducing updated Cambridge International AS & A Level syllabuses for. Biology 9700 Chemistry 9701 Physics 9702

Neuve_Communiqué_April_2009.pdf

CNAS ASSESSMENT COMMITTEE CHEMISTRY (CH) DEGREE PROGRAM CURRICULAR MAPPINGS AND COURSE EXPECTED STUDENT LEARNING OUTCOMES (SLOs)

GUIDELINES FOR THE VALIDATION OF ANALYTICAL METHODS FOR ACTIVE CONSTITUENT, AGRICULTURAL AND VETERINARY CHEMICAL PRODUCTS.

EUROPASS DIPLOMA SUPPLEMENT

Multivariate Tools for Modern Pharmaceutical Control FDA Perspective

Chapter Test B. Chapter: Measurements and Calculations

Chemistry Course Descriptions

Bachelor of Science in Biochemistry and Molecular Biology

Chemical Risk Assessment in Absence of Adequate Toxicological Data

Masters Learning mode (Форма обучения)

M.Sc. in Nano Technology with specialisation in Nano Biotechnology

GUIDELINES FOR THE REGISTRATION OF BIOLOGICAL PEST CONTROL AGENTS FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS

Biochemistry. Entrance Requirements. Requirements for Honours Programs. 148 Bishop s University 2015/2016

CPO Science and the NGSS

The Mole Concept. The Mole. Masses of molecules

Data, Measurements, Features

2. SUMMER ADVISEMENT AND ORIENTATION PERIODS FOR NEWLY ADMITTED FRESHMEN AND TRANSFER STUDENTS

UNIVERSITY OF MARIBOR FACULTY OF CHEMISTRY AND CHEMICAL ENGINEERING INFORMATION PACKAGE / INTERNATIONAL EXCHANGE STUDENTS' GUIDE 2016/2017.

Having regard to the Treaty establishing the European Economic Community, and in particular Article 100 thereof;

CHEMISTRY, BACHELOR OF SCIENCE (B.S.) WITH A CONCENTRATION IN BIOCHEMISTRY

Chapter 1: Chemistry: Measurements and Methods

STATUS OF NUCLEAR TECHNOLOGY EDUCATION IN MONGOLIA

Cheminformatics and Pharmacophore Modeling, Together at Last

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

CHEMISTRY What can I do with this major?

APPENDIX A. DEGREE FIELD LIST

SPECIFICATIONS AND CONTROL TESTS ON THE FINISHED PRODUCT

Nuclear Structure. particle relative charge relative mass proton +1 1 atomic mass unit neutron 0 1 atomic mass unit electron -1 negligible mass

is a knowledge based expert decision support tool for predicting the metabolic fate of chemicals in mammals.

COMPLEXITY RISING: FROM HUMAN BEINGS TO HUMAN CIVILIZATION, A COMPLEXITY PROFILE. Y. Bar-Yam New England Complex Systems Institute, Cambridge, MA, USA

Alterações empresariais sustentadas pelo conceito de engenharia do Produto Patrício Soares da Silva, MD, PhD

POGIL Lesson Plan. Author: Sui Sum Olson. Title: Understanding the Mole and Molar Mass Concepts

The following module is compulsory for students who do not have an A-level pass in Mathematics. CH1M Chemistry M 20 4

Edward Odenkirchen, Ph.D. Office of Pesticide Programs US Environmental Protection Agency

Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke

Quality by Design Concept

Eventuale spazio per nome struttura o altro. Pharmacy education. The Italian academic viewpoint

IMPURITIES IN NEW DRUG PRODUCTS

How do we build and refine models that describe and explain the natural and designed world?

COURSE SYLLABUS. Luis Hernandez Chemical & Environmental Building J TBA. luis.hernandez@harlingen.tstc.edu

In Silico Models: Risk Assessment With Non-Testing Methods in OSIRIS Opportunities and Limitations

Information Visualization WS 2013/14 11 Visual Analytics

Pre-Masters. Science and Engineering

ICH guideline Q11 on development and manufacture of drug substances (chemical entities and biotechnological/ biological entities)

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

MSc Applied Child Psychology

Chapter 7: Data Mining

Evaluation of Quantitative Data (errors/statistical analysis/propagation of error)

BOSTON UNIVERSITY SCHOOL OF PUBLIC HEALTH PUBLIC HEALTH COMPETENCIES

Graduate School of Science

MSc Finance & Business Analytics Programme Design. Academic Year

MIDLAND ISD ADVANCED PLACEMENT CURRICULUM STANDARDS AP ENVIRONMENTAL SCIENCE

Interfaculty Graduate Environmental Sciences Program (IGESP)

ANTARES A new project for Alternative Methods and REACH

The INFUSIS Project Data and Text Mining for In Silico Modeling

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

THE ROYAL VETERINARY COLLEGE UNIVERSITY OF LONDON. Applies to the cohort commencing 2013

MSc in Toxicology. Master Degree Programme

Transcription:

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems. Roberto Todeschini Milano Chemometrics and QSAR Research Group - Dept. of Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1 20126 Milano (Italy) The concept of molecular structure is one the most important concepts in the development of the scientific knowledge of the XX century. As a matter of fact the reasoning based on the molecular structure has been the main engine for the great development of physical chemistry, molecular physics, organic chemistry, quantum chemistry, chemical synthesis, polymer chemistry, medicinal chemistry, etc. By definition, a system is complex when its behaviour as a whole is not derivable from the properties of its parts: a molecule, together with its imbedded concept of molecular structure, exactly fulfils these conditions. In fact, molecule properties do not depend only on the properties of the component atoms but also on their mutual connections: it is in principle a holistic system, i.e. its emergent properties cannot be derived as the sum of the properties of its parts, but they are inherent to the whole molecule organisation and stability. As a consequence of its complexity, molecular structure cannot be represented by a unique formal model; several molecular representations can represent the same molecule, depending on the level of the underlying theoretical approach and these representations are often not derivable from each other. Figure 1 pag. 1

Different molecule representations were proposed, such as the 3-dimensional Euclidean representation, 2-dimensional representations based on the graph theory, or vectorial representations (fingerprints) where the frequencies of several molecular fragments are stored. Each representation constitutes a different conceptual model of the molecule and by each model different sources of chemical information become available. The molecule thought as a real object implicitly contains all the chemical information, but only a part of this information can be extracted by experimental measurements. Molecular descriptors are numbers able to extract small pieces of chemical information from the different molecule representations (Figure 1). The role of the molecular descriptors In the last decades, several scientific researches have been focussed on studying how to catch and convert by a theoretical pathway - the information encoded in the molecular structure into one or more numbers called molecular descriptors used to establish quantitative relationships between structures and properties, biological activities and other experimental properties. Figure 2 Therefore, molecular descriptors are now playing a key role in scientific research (Figure 2). In fact they are derived from several different theories, such as quantum chemistry, information theory, organic chemistry, graph theory, etc. and are applied in modelling several different properties in fields such as toxicology, analytical chemistry, physical chemistry, medicinal and pharmaceutical chemistry, environmental and toxicological studies and regulatory tools. Evidence of the interest of the scientific community in the molecular descriptors is provided by the huge number of descriptors proposed until today: more than 2000 of descriptors [1] are actually pag. 2

defined and computable by using dedicated software tools. Each molecular descriptor takes into account a small part of the whole chemical information contained into the real molecule and, as a consequence, the large number of descriptors is continuously increasing with the increasing of the complexity of the investigated chemical systems. By now molecular descriptors are became among the most important variables used in molecular modelling, and as a consequence of that they have a strong relationship with statistics, chemometrics and chemoinformatics. Statistics, chemometrics and chemoinformatics are the fields where methods for data elucidation, data mining and modelling are developed. In particular, chemometrics since about 30 years has developed several classification and regression methods able to provide although not always - reliable models, both in reproducing the known experimental data and in predicting the unknown data. In fact, the modelling process has usually not only explanatory purposes, but in particular in these last years predictive purposes, i.e. it aims at developing models with reliable predictive qualities. Figure 3 It has to be noted that the use of the molecular descriptors provided a big change in the scientific paradigm. In fact, while until 30 years ago molecular modelling mainly consisted in searching for mathematical relationships between experimentally measured quantities, now it is mainly performed modelling a measured property by the use of molecular descriptors able to catch structural chemical information (Figure 3). pag. 3

To explain the complex relationships between molecules and observed quantities, two main streams were developed, the first related to the search for relationships between molecular structures and physico-chemical properties (QSPR, Quantitative Structure-Property Relationships) and the second between molecular structures and biological activities (QSAR, Quantitative Structure-Activity Relationships). The successes reached by using these approaches have encouraged the scientific community to apply them in other fields and relationships between molecular descriptors and environmental, toxicological, and technological properties are now widely investigated (Table 1). Table 1 Can mathematical models replace experiments? Several people and I am among them foresee that in the future several quantities will be obtained by using predictive mathematical models, avoiding heavy and expensive experimental measurements. This extreme idea is supported by the successes of well established strategies for building regression and classification models, and by the capabilities of the developed models to suggest new active molecules (drug design) and new molecules with the required technological properties, to build priority lists of molecules for toxicological risk, to help in understanding molecule modes of action, to evaluate the environmental risks. In more details, chemometric models are empirical mathematical relationships (functions f) obtained between a dependent experimental property (a measure y or a membership to a predefined class c) and some independent variables which are related to and relevant for the studied property: 1, 2, p or 1, 2, p y f x x x c f x x x In this context, the independent variables x are molecular descriptors and the models are built to reproduce to the greatest extent the known experimental responses by searching for relationships with these theoretical variables (Figure 4, step 1). The validation tools developed in these last 20 years allow to evaluate not only the degree of agreement between the calculated and experimental responses, but also to estimate the future model capability to predict unknown responses. pag. 4

If the predictive quality of the model is considered satisfactory, the model can be used for real future predictions of the modelled property (Figure 4 step 2), resulting in a significant cost and animal testing reduction. In fact, the cost of the using the model as a predictive tool (second step) is almost zero (except the costs of the operator time and the current for the PC)! Figure 4 Conclusions We can say that chemical research based on chemometric modelling and molecular descriptors is by now well established and further relevant results are expected. These expectations are in fact well founded on the two basic principles that 1) biological activities, as well as physico-chemical and chemical properties of organic compounds, are related to the molecular structure and that 2) similar compounds behave in similar way. The reality is unique and, certainly, models are an approximation of the reality; however, different models represent different perspectives of the same reality and some of them might be useful. The fact that reality is represented by more than one model is often considered as a weakness of the theory due to our still limited knowledge: the scientific goal should be to reach a unique accepted model. This philosophical position is in my opinion mistaken: being a model only one possible interpretation of the reality, the availability of several models simply reflects the complexity of the studied systems where each model is able to catch a part of the whole information, i.e. it is a point of view: in summary, it is better to watch at a beautiful landscape by different windows. References 1. R. Todeschini and V. Consonni: Handbook of Molecular Descriptors, WILEY-VCH, 2000. pag. 5