CÉLINE LE BAILLY DE TILLEGHEM. Institut de statistique Université catholique de Louvain Louvain-la-Neuve (Belgium)
1 STATISTICAL CONTRIBUTION TO THE VIRTUAL MULTICRITERIA OPTIMISATION OF COMBINATORIAL MOLECULES LIBRARIES AND TO THE VALIDATION AND APPLICATION OF QSAR MODELS
CÉLINE LE BAILLY DE TILLEGHEM
Institut de statistique, Université catholique de Louvain, Louvain-la-Neuve (Belgium)
Journée Jeunes Chercheurs - September 21st, 2007 p. 1/23
2 Context of the research
- Lead optimisation using combinatorial chemistry
- Diabetes library provided by Eli Lilly and Company: a combinatorial library composed of 3 R-groups and [...] = [...] compounds
- Objective: select the most promising compounds
3 Proposed methodology
It gathers in a coherent framework existing and new tools of statistics and chemometrics, mainly:
- the development and validation of QSAR models to predict drugability properties,
- the definition of a desirability index to summarise those properties, and the assessment of how QSAR model prediction error propagates to it,
- an efficient algorithm to screen the combinatorial library and select the most promising compounds.
4 Problem description
Construction of the combinatorial library
- chemists divide the lead and select reagents to add on each part: [...] = [...] compounds
Definition of the objective
- select the best combinatorial sublibrary of size $n_1 \times n_2 \times n_3$ ($5 \times 5 \times 5$), or
- select the sublibrary with the $m$ best compounds ($m = 100$)
Definition of the optimised drugability properties ($Y$)
- min $Y_1$ = quantity of substance to inject around the receptor $R_1$ to have a binding
- min $Y_2$ = quantity of substance to inject around the receptor $R_2$ to have a binding
- max $Y_3$ = quantity of substance to inject around the receptor $R_3$ to have a binding
- max $Y_4$ = quantity of substance to inject around the receptor $R_4$ to have a binding
5 Problem description
Definition of the available chemical descriptors ($x$)
- descriptors are computed with dedicated software from the SMILES representation of each molecule
- groups of descriptors:
  - describing the molecule as a whole (number of atoms, number of rings, molecular weight...),
  - quantifying the overall charge distribution (total absolute charge, total positive charge, total negative charge...),
  - measuring electrotopological properties, molecular surface properties, connectivity properties...
- More than 9000 molecular descriptors can be computed at Eli Lilly!
6 Proposed methodology:
7 QSAR models development
QSARs (Quantitative Structure-Activity Relationships) are mathematical models approximating the link between chemical properties ($x$) and biological activities ($Y$) of compounds.
Model assumptions
For each optimised response, different QSAR models are considered:
- Multiple Linear Regression (forward selection minimising the BIC),
- PLS Regression (minimising the bias-corrected 10-fold CV estimate of the MSEP),
- binary Regression Tree + pruning (minimising a cost-complexity measure based on the RSS and the number of splits) + bagging
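The first model family above (forward selection minimising the BIC) can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with an intercept and the Gaussian form of the criterion, $n \ln(RSS/n) + k \ln n$; the function names `bic` and `forward_bic` are hypothetical and this is not the software used in the study.

```python
import numpy as np

def bic(X, y, cols):
    """BIC of an OLS fit with intercept on the descriptor columns `cols`.

    Gaussian form up to an additive constant: n*ln(RSS/n) + k*ln(n),
    where k counts the intercept and the selected descriptors.
    """
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return float(n * np.log(rss / n) + A.shape[1] * np.log(n))

def forward_bic(X, y):
    """Greedy forward selection: add the descriptor that lowers BIC most."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = bic(X, y, selected)
    while remaining:
        cand, j = min((bic(X, y, selected + [j]), j) for j in remaining)
        if cand >= best:  # no candidate improves the criterion: stop
            break
        selected.append(j)
        remaining.remove(j)
        best = cand
    return selected, best
```

With thousands of candidate descriptors, a real application would screen or clean the descriptors first, as the next slide describes; the sketch only shows the stopping rule.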
8 QSAR models development
Data collection and models fit
- 4 training sets after pretreatment and cleaning of the collected data:
  [Table, values not preserved: one row per response $Y_1$ to $Y_4$; columns: observed molecules, available descriptors, descriptors kept after cleaning]
- MLR, PLSR and RT are fitted on those 4 training sets, selecting the explanatory variables entered in each model as explained before.
9 QSAR models development
Model selection and assessment - goodness-of-fit criteria
[Table MLR, values not preserved: one row per response $Y_1$ to $Y_4$; columns as transcribed: N, K 1, $R^2$, $R^2_{adj}$, S, F-test p-value]
[Table PLSR, values not preserved: one row per response $Y_1$ to $Y_4$; columns: N, K, $R^2_Y$, $R^2_X$, S]
10 QSAR models development
Model selection and assessment - goodness-of-fit criteria
[Table RT, values not preserved: one row per response $Y_1$ to $Y_4$; column N, then without bagging: no pruning (K, $R^2$, S) and pruning (K, $R^2$, S); with bagging: no pruning ($R^2$, S) and pruning ($R^2$, S)]
11 QSAR models development
Model selection and assessment - Fitted vs observed
[Figure: fitted vs observed values for $Y_1$ to $Y_4$; one row of panels for MLR, one for PLSR]
12 QSAR models development
Model selection and assessment - Fitted vs observed
[Figure: fitted vs observed values for $Y_1$ to $Y_4$; one row of panels for RT without pruning, one for RT with pruning]
13 QSAR models development
Model selection and assessment - Fitted vs observed
[Figure: fitted vs observed values for $Y_1$ to $Y_4$; one row of panels for bagged RT without pruning, one for bagged RT with pruning]
14 QSAR models development
Model selection and assessment
- Internal predictive power: $Q^2$ = cross-validated $R^2$
  [Figure or table, values not preserved: $Q^2$ for $Y_1$ to $Y_4$, comparing RT bagging without pruning, RT bagging with pruning, MLR, PLSR, RT with pruning and RT without pruning]
  MLR models are selected.
- External validation if possible!
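As an illustration of the internal predictive power criterion, the following sketch computes a K-fold cross-validated $R^2$ for a linear model. It assumes one common convention, $Q^2 = 1 - PRESS / SS_{tot}$ with the total sum of squares taken around the overall mean of $y$; the slides do not state which variant was used, and the function name `q2_kfold` is hypothetical.

```python
import numpy as np

def q2_kfold(X, y, n_folds=10, seed=0):
    """Cross-validated R^2 of an OLS model with intercept.

    Q^2 = 1 - PRESS / SS_tot, where PRESS sums squared prediction errors
    on held-out folds (assumed convention, see lead-in).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    press = 0.0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit on the training folds only.
        A = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Predict the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        press += float(np.sum((y[test_idx] - A_test @ beta) ** 2))
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))
```

Unlike the fitted $R^2$, this quantity can be negative when the model predicts held-out molecules worse than the mean response does.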
15 QSAR models development
Applicability domain
- Definition: the applicability domain is the set of molecules for which the QSAR model is valid.
- Computation: descriptor ranges, convex hull, leverages, other distance measurements (Euclidean, Mahalanobis or $L_1$ distance), Hotelling's $T^2$, density measurements...
[Figure: leverage against observation number, one panel per response $Y_1$ to $Y_4$]
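Of the options listed above, the leverage is simple to sketch. For a new molecule with descriptor row $x$, the leverage is $h = x^T (X^T X)^{-1} x$, where $X$ is the training design matrix. The cut-off $3p/n$ used below is a common rule of thumb and an assumption here, not a value taken from the slides; the function names are hypothetical.

```python
import numpy as np

def leverages(X_train, X_new):
    """Leverage h = x^T (X^T X)^{-1} x for each row x of X_new.

    X_train is the design matrix of the training set (include a column
    of ones if the model has an intercept).
    """
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)

def in_domain(X_train, X_new, factor=3.0):
    """True where the leverage does not exceed factor * p / n.

    The threshold is an assumed rule of thumb, not the study's value.
    """
    n, p = X_train.shape
    return leverages(X_train, X_new) <= factor * p / n
```

A molecule flagged `False` is far from the training descriptors in the metric of the model, so its prediction is an extrapolation.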
16 Proposed methodology:
17 Molecules optimisation
Definition of the optimised criterion (DF and DI)
- Multicriteria optimisation!
- Desirability Functions:
  [Figure: desirability functions $d_1(Y_1)$, $d_2(Y_2)$, $d_3(Y_3)$, $d_4(Y_4)$ plotted against $Y_1$, $Y_2$, $Y_3$, $Y_4$]
- Desirability Index of one molecule: $E[D(Y \mid x)] = E\left[\prod_{i=1}^{4} \left(d_i(Y_i \mid x)\right)^{1/4}\right]$
- Loss of a sublibrary with $m$ molecules: $\sum_{i=1}^{m} \left(1 - E[D(Y \mid x_i)]\right)^2 / m$
- The best sublibrary is the sublibrary with the smallest loss
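The two formulas above can be made concrete with a small sketch. The shapes of the desirability functions are not given in the transcript, so the piecewise-linear one-sided functions `d_min` and `d_max` below are assumptions, as are their thresholds. The index is also computed as a plug-in geometric mean at fixed property values, which ignores the expectation over $Y \mid x$ that appears in the slide.

```python
import math

def d_min(y, low, high):
    """Smaller-is-better desirability: 1 at or below `low`, 0 at or above `high`."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return (high - y) / (high - low)

def d_max(y, low, high):
    """Larger-is-better desirability: 0 at or below `low`, 1 at or above `high`."""
    return 1.0 - d_min(y, low, high)

def desirability_index(ds):
    """Geometric mean of the individual desirabilities (plug-in version)."""
    return math.prod(ds) ** (1.0 / len(ds))

def sublibrary_loss(indexes):
    """Mean squared distance between each desirability index and the ideal value 1."""
    return sum((1.0 - d) ** 2 for d in indexes) / len(indexes)
```

Because the index is a geometric mean, a molecule with a single unacceptable property (desirability 0) gets an index of 0 whatever its other properties are.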
18 Molecules optimisation
WEALD
- WEALD (Weighted Exchanges Algorithm for Library Design) is an efficient algorithm to screen combinatorial libraries of molecules
- Principle: select a sublibrary at random and perform exchanges between reagents to decrease the loss
- Application of WEALD to select the 100 best compounds in the diabetes library: by exploring 4729 molecules (only 4.28% of the whole library), WEALD selects 100 compounds that are within the 105 best compounds of the library
[Figure: loss against the number of explored molecules]
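The exchange principle stated above can be sketched as follows. This is a generic, unweighted random-exchange heuristic for a combinatorial sublibrary, not WEALD itself: the weighting scheme that gives WEALD its name is not described in the transcript. The names `loss`, `exchange_search`, `pools`, `sizes` and `score` are hypothetical, and `score` stands in for the estimated desirability index of a compound.

```python
import itertools
import random

def loss(selection, score):
    """Mean squared distance to 1 of the scores of all compounds in the sublibrary."""
    compounds = list(itertools.product(*selection))
    return sum((1.0 - score(c)) ** 2 for c in compounds) / len(compounds)

def exchange_search(pools, sizes, score, n_iter=2000, seed=0):
    """Random-start exchange search over reagents (unweighted sketch).

    pools[k] lists the reagents available at position k; sizes[k] is the
    number of reagents to keep at that position.
    """
    rng = random.Random(seed)
    # Random starting sublibrary.
    selection = [rng.sample(pool, n) for pool, n in zip(pools, sizes)]
    best = loss(selection, score)
    for _ in range(n_iter):
        k = rng.randrange(len(pools))             # position to modify
        outside = [r for r in pools[k] if r not in selection[k]]
        if not outside:
            continue
        i = rng.randrange(len(selection[k]))      # reagent to take out
        candidate = [list(s) for s in selection]
        candidate[k][i] = rng.choice(outside)     # reagent to bring in
        cand_loss = loss(candidate, score)
        if cand_loss < best:                      # keep only improving exchanges
            selection, best = candidate, cand_loss
    return selection, best
```

The sketch re-evaluates the whole sublibrary after each exchange; an efficient implementation would only evaluate the compounds affected by the exchanged reagent.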
19 Molecules optimisation
Uncertainty analysis
- For all molecules explored by WEALD, drugability properties are estimated by the fitted QSAR models.
- Check for each explored molecule whether it lies in the applicability domains of the QSARs.
- Among the 4729 explored molecules, 1948 molecules (more than 41%) are outside at least one applicability domain.
=> QSAR models are often extrapolating!
20 Molecules optimisation
Uncertainty analysis
- For a given molecule with descriptors $x_0$, the desirability index is estimated: $\hat{E}[D(Y \mid x_0)]$.
- Construct a confidence interval for $E[D(Y \mid x_0)]$.
- For the 4729 explored molecules, the average CI length is 0.12 but may vary from 0.04 to nearly 1!
=> Desirability indexes cannot be compared as if they were exact!
21 Molecules optimisation
Uncertainty analysis
- As the desirability indexes are estimated, some molecules are not significantly worse than the optimal one (Indistinguishable Optimal Zone).
- For any explored molecule with descriptors $x$, test $H_0: E[D(Y \mid x)] \geq E[D(Y \mid x_{opt})]$ against $H_1: E[D(Y \mid x)] < E[D(Y \mid x_{opt})]$.
- Among the 4729 explored molecules, 230 molecules are not significantly worse than the optimal one.
=> Desirability indexes of two molecules are compared taking the QSAR models' prediction error into account.
22 Molecules optimisation
Uncertainty analysis
[Figure: confidence intervals for $E[D(Y \mid x)]$ of the TOP 100 molecules (axis label as transcribed: D^ M(x)). Two marker types distinguish molecules outside at least one applicability domain from molecules included in all applicability domains. Green CI: molecules equivalent to the optimal one. Black CI: molecules significantly worse than the optimal one.]
23 Conclusion
Integrated methodology to virtually screen combinatorial molecules libraries
- QSAR models development
- Desirability index
- WEALD
QSAR models should be validated
- Goodness-of-fit
- Internal and external predictivity
- Applicability domain
The uncertainty of the desirability indexes should be quantified
- Confidence interval
- Indistinguishable Optimal Zone
More informationInductive Data Mining: Automatic Generation of Decision Trees from Data for QSAR Modelling and Process Historical Data Analysis
18 th European Symposium on Computer Aided Process Engineering ESCAPE 18 Bertrand Braunschweig and Xavier Joulia (Editors) 2008 Elsevier B.V./Ltd. All rights reserved. Inductive Data Mining: Automatic
More informationOMCL Network of the Council of Europe QUALITY MANAGEMENT DOCUMENT
OMCL Network of the Council of Europe QUALITY MANAGEMENT DOCUMENT PA/PH/OMCL (12) 77 7R QUALIFICATION OF EQUIPMENT ANNEX 8: QUALIFICATION OF BALANCES Full document title and reference Document type Qualification
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationLecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More information5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationitesla Project Innovative Tools for Electrical System Security within Large Areas
itesla Project Innovative Tools for Electrical System Security within Large Areas Samir ISSAD RTE France samir.issad@rte-france.com PSCC 2014 Panel Session 22/08/2014 Advanced data-driven modeling techniques
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &
More informationCHEMICAL FORMULA COEFFICIENTS AND SUBSCRIPTS. Chapter 3: Molecular analysis 3O 2 2O 3
Chapter 3: Molecular analysis Read: BLB 3.3 3.5 H W : BLB 3:21a, c, e, f, 25, 29, 37,49, 51, 53 Supplemental 3:1 8 CHEMICAL FORMULA Formula that gives the TOTAL number of elements in a molecule or formula
More informationIntegrating Benders decomposition within Constraint Programming
Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP
More informationAP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
More information