Tamoxifen Resistance in Breast Cancer
|
|
- Daisy Brooks
- 7 years ago
- Views:
Transcription
1 Gene Expression Analysis : Tamoxifen Resistance in Breast Cancer B. Haibe-Kains 1,2 G. Bontempi 2 C. Sotiriou 1 1 Unité Microarray, Institut Jules Bordet 2 Machine Learning Group, Université Libre de Bruxelles September 29, 2006 Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
2 Biological Question and Datasets Biological question : Can we predict which patients will resist to the Tamoxifen treatment in an adjuvant setting? Tamoxifen treated patients coming from 3 different institutions, can we pool the data? gene-expressions : normalization survival data : model fitting. 255 eligible patients (samples) probes (variables) Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
3 Survival Data Tamoxifen treated patients survival curves Tamoxifen treated patients hazard functions Probability of survival OXFT KIT GUYT 99 patients (19 events) 69 patients (20 events) 87 patients (28 events) Hazard Rate OXFT KIT GUYT 99 patients (19 events) 69 patients (20 events) 87 patients (28 events) Time Time Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
4 Tamoxifen Analysis Raw gene expession data cross-validation Normalization Data transformation Feature selection Model fitting In order to keep the design simple, we fix : the number of models to aggregate (see feature selection step) the cutoff selection (see the model fitting step). Performance assessment Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
5 Normalization Goal : reduce the systematic variability between samples. Method : normalization. Procedure 1 Background correction, expression quantification and normalization were performed using Robust Multichip Average [Irizarry et al., 2003, Bolstad et al., 2003]. 2 RMA performed separately per population 3 Gene median centering per population. No artifact highlighted by unsupervised clustering. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
6 Data Transformation Goal : reduce the dimensionality of the problem. Method : cluster highly correlated variables. Procedure 1 Use of an independent dataset of untreated patients (137 patients, probes) 2 Filtering of the less variant probes. 3 Hierarchical clustering (average linkage, uncentered Pearson correlation). 4 Cut the tree at height Keep only clusters with at least 5 known UniGenes. 110 clusters where half are significantly associated with survival (on the Tamoxifen dataset). Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
7 Data Transformation Feature Pairwise Correlation Histogram of pairwize correlations Frequency pairwize correlation mean: median: std: range: [ 0.651,1] Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
8 Feature Selection Goal : fast selection of the relevant features. Method : ranking based on univariate model significance. Procedure 1 For each feature, compute the likelihood ratio test of the univariate Cox model. 2 Perform the ranking based on the p-value. 3 Select the top n features (n is fixed). n features are selected. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
9 Model Fitting Goal : fit a simple model with low variance. Method : aggregation of n univariate classifiers. Procedure 1 The univariate models for the n top features were computed during the previous step. 2 Compute a linear combination of these classifiers with weight of 1. Continuous score representing the risk of a patient. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
10 Performance Assessment We do binary classification from survival data. We can not use traditional statistics (sensitivity, specificity, χ 2 test,... ) Adaptation of such estimators to deal with censoring. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
11 Survival Statistics for 2 Groups There exist several ways to assess difference in survival between 2 groups Kaplan-Meier estimator and Logrank test : KM method estimates survivor function such that # death at time t j Ŝ(t) = {}}{ d j [1 ]. n j j:t j t }{{} # at risk at time t j logrank method tests H 0 : S 1 (t) = S 2 (t) t 0. Hazard ratio (HR) : relative hazard between 2 groups using Cox model with one dummy variable (G = 0/1 for low and high-risk groups). Time-dependent ROC Curves and area under ROC curves : extension of traditional ROC curves dealing with censoring data. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
12 Performance in LOO Logrank and Time-Dependent ROC Curve Survival curves COXSM13 (LOO classification) Tamoxifen treated patients time dependent ROC curves (5 yrs) Probability of survival logrank = 1.041e 07 loglik = e 06 low risk 179(33) high risk 76(34) sensitivity COXSM13 (AUC=0.767) Time 1 specificity Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
13 Performance Hazard Ratio and Logrank Training set Test set Hazard ratio Log-rank p OXFT (99/19) KIT/GUYT (156/48) 2.17 [1.2,3.91] 8.46e-3 KIT (69/20) OXFT/GUYT (186/47) 4.07 [2.23,7.41] 7.98e-7 GUYT (87/28) OXFT/KIT (168/39) 5.93 [3,11.75] 1.24e-9 KIT/GUYT (156/48) OXFT (99/19) [5.38,39.52] 1.74e-11 OXFT/GUYT (186/47) KIT (69/20) 3.44 [1.36,8.67] 5.27e-3 OXFT/KIT (168/39) GUYT (87/28) 2.23 [1.05,4.71] 3.11e-2 Leave-one-out c.v [2.32;6.41] 1.04e-7 Multiple 10-fold c.v [2.66,3.84] 9.04e-7 Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
14 Performance wrt Signature Size Hazard ratio (CI) wrt signature size 10FOLD CV Logrank p value wrt signature size 10FOLD CV log2 hazard ratio log10 logrank p value signature.size signature.size Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
15 Performance vs Random At level of performance estimation : 1000 random permutations of the labels and perform the whole procedure only 1% of such classifications gives better discrimnation. At level of feature selection : random selection of n features most of the feature selection result in good classifiers because of the high proportion of relevant features. use of the anti -ranking very poor performance. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
16 Feature Selection Stability Signature stabilit COXSM13 (10FOLDCV) pclust.79 pclust.148 pclust.120 pclust.112 pclust.375 pclust.201 pclust.859 pclust.521 pclust.784 pclust.360 pclust.231 pclust.110 pclust.337 pclust.971 pclust.13 pclust.1008 pclust.606 pclust.289 pclust.435 pclust.88 pclust.34 pclust.425 pclust.752 pclust.851 pclust.714 pclust.26 pclust.207 pclust.350 pclust.968 pclust.692 pclust.709 pclust.1318 pclust.17 pclust.399 pclust.252 pclust.188 pclust.149 pclust.43 pclust.146 pclust.62 pclust.177 pclust.667 pclust.462 pclust.181 Over the multiple 10-fold cross-validations, several top n features are selected If the same set of features are selected every time, we see high relative frequency for these features relative frequency Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
17 Feature Selection Stability Signature stabilit COXSM13 (10FOLDCV) pclust.79 pclust.148 pclust.120 pclust.112 pclust.375 pclust.201 pclust.859 pclust.521 pclust.784 pclust.360 pclust.231 pclust.110 pclust.337 pclust.971 pclust.13 pclust.1008 pclust.606 pclust.289 pclust.435 pclust.88 pclust.34 pclust.425 pclust.752 pclust.851 pclust.714 pclust.26 pclust.207 pclust.350 pclust.968 pclust.692 pclust.709 pclust.1318 pclust.17 pclust.399 pclust.252 pclust.188 pclust.149 pclust.43 pclust.146 pclust.62 pclust.177 pclust.667 pclust.462 pclust.181 partial ranking area Over the multiple 10-fold cross-validations, several top n features are selected If the same set of features are selected every time, we see high relative frequency for these features. We can compute the area under partial ranking w.r.t. the number of selected features relative frequency Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
18 Stability wrt Signature Size Partial ranking stability wrt signature size 10FOLD CV Area Under Partial Ranking n = 13 seems to be a good trade-off between the number of selected features (signature) and its stability Signature Size Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
19 Final Model Use the same method with all the samples. The result is a model with a set of features. The model and the features are expected from previous results. pclust.120 pclust.360 pclust.201 pclust.148 pclust.521 pclust.784 pclust.375 pclust.231 pclust.110 pclust.337 pclust.112 pclust.79 pclust.859 pclust.120 pclust.360 pclust.201 pclust.148 pclust.521 pclust.784 pclust.375 pclust.231 pclust.110 pclust.337 pclust.112 pclust.79 pclust.859 Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
20 Future Works Study the stability of the initial clustering (data transformation step). Use of Gene Ontology to study the signature in a biological point of view. Objective comparison with the traditional clinical variables. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
21 Current Research Interests Meta-analysis. Ranking statistics. Input space transformation (using biological knowledge, such that gene list enrichment or GO). Optimization framework for binary classification of survival data. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
22 Thank you for your attention. Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
23 Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2): Irizarry, R. A., Boldstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. R. (2003). Summaries of affymetrix genechip probe level data. Nucleic Acids Research, 31(4). Benjamin Haibe-Kains (ULB) IRIDIA Microarray Meeting September 29, / 21
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients by Li Liu A practicum report submitted to the Department of Public Health Sciences in conformity with
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationPersonalized Predictive Medicine and Genomic Clinical Trials
Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations
More informationGene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
More informationTips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationFrozen Robust Multi-Array Analysis and the Gene Expression Barcode
Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Matthew N. McCall October 13, 2015 Contents 1 Frozen Robust Multiarray Analysis (frma) 2 1.1 From CEL files to expression estimates...................
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationComparative genomic hybridization Because arrays are more than just a tool for expression analysis
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from
More informationStep-by-Step Guide to Basic Expression Analysis and Normalization
Step-by-Step Guide to Basic Expression Analysis and Normalization Page 1 Introduction This document shows you how to perform a basic analysis and normalization of your data. A full review of this document
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationBuilding risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg
Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationA Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias
A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias B. M. Bolstad, R. A. Irizarry 2, M. Astrand 3 and T. P. Speed 4, 5 Group in Biostatistics, University
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationQuality Assessment of Exon and Gene Arrays
Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such
More informationPerformance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
More informationSurvival Analysis of Dental Implants. Abstracts
Survival Analysis of Dental Implants Andrew Kai-Ming Kwan 1,4, Dr. Fu Lee Wang 2, and Dr. Tak-Kun Chow 3 1 Census and Statistics Department, Hong Kong, China 2 Caritas Institute of Higher Education, Hong
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationFactors for success in big data science
Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)
More informationMHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
More informationSurvey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
More informationData Analysis, Research Study Design and the IRB
Minding the p-values p and Quartiles: Data Analysis, Research Study Design and the IRB Don Allensworth-Davies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationEvaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
More informationRow Quantile Normalisation of Microarrays
Row Quantile Normalisation of Microarrays W. B. Langdon Departments of Mathematical Sciences and Biological Sciences University of Essex, CO4 3SQ Technical Report CES-484 ISSN: 1744-8050 23 June 2008 Abstract
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationLinda Staub & Alexandros Gekenidis
Seminar in Statistics: Survival Analysis Chapter 2 Kaplan-Meier Survival Curves and the Log- Rank Test Linda Staub & Alexandros Gekenidis March 7th, 2011 1 Review Outcome variable of interest: time until
More informationInformation processing for new generation of clinical decision support systems
Information processing for new generation of clinical decision support systems Thomas Mazzocco tma@cs.stir.ac.uk COSIPRA lab - School of Natural Sciences University of Stirling, Scotland (UK) 2nd SPLab
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
More informationData mining for prediction
Data mining for prediction Prof. Gianluca Bontempi Département d Informatique Faculté de Sciences ULB Université Libre de Bruxelles email: gbonte@ulb.ac.be Outline Extracting knowledge from observations.
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationAnalysis of the colorectal tumor microenvironment using integrative bioinformatic tools
MLECNIK Bernhard & BINDEA Gabriela Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationWebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationGene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
More informationChapter 7. Feature Selection. 7.1 Introduction
Chapter 7 Feature Selection Feature selection is not used in the system classification experiments, which will be discussed in Chapter 8 and 9. However, as an autonomous system, OMEGA includes feature
More information0 value2. 3. Assign labels to the proteins in each interval of the ranked list
Ranked list of protein degrees in decreasing order 0 100 Proteins with few connections Proteins with many connections 1. Sort a random number between 80 and 98 (value1) 0 value1 2. Sort a random number
More informationOctober 31, 2014. The effect of price on youth alcohol. consumption in Canada. Stephenson Strobel & Evelyn Forget. Introduction. Data and Methodology
October 31, 2014 Why should we care? Deaths 0 500 1000 1500 2000 2500 2000 2002 2004 2006 2008 2010 2012 Year Accidents (unintentional injuries) Intentional self-harm (suicide) Other causes of death Assault
More informationaffyplm: Fitting Probe Level Models
affyplm: Fitting Probe Level Models Ben Bolstad bmb@bmbolstad.com http://bmbolstad.com April 16, 2015 Contents 1 Introduction 2 2 Fitting Probe Level Models 2 2.1 What is a Probe Level Model and What is
More informationMachine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
More informationL25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna
More informationData Mining and Visualization
Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationSoftware and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University
Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationThe NCPE has issued a recommendation regarding the use of pertuzumab for this indication. The NCPE does not recommend reimbursement of pertuzumab.
Cost Effectiveness of Pertuzumab (Perjeta ) in Combination with Trastuzumab and Docetaxel in Adults with HER2-Positive Metastatic or Locally Recurrent Unresectable Breast Cancer Who Have Not Received Previous
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationCluster software and Java TreeView
Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0
More informationGuided Reading 9 th Edition. informed consent, protection from harm, deception, confidentiality, and anonymity.
Guided Reading Educational Research: Competencies for Analysis and Applications 9th Edition EDFS 635: Educational Research Chapter 1: Introduction to Educational Research 1. List and briefly describe the
More informationMeasuring gene expression (Microarrays) Ulf Leser
Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/
More informationStrategies for Identifying Students at Risk for USMLE Step 1 Failure
Vol. 42, No. 2 105 Medical Student Education Strategies for Identifying Students at Risk for USMLE Step 1 Failure Jira Coumarbatch, MD; Leah Robinson, EdS; Ronald Thomas, PhD; Patrick D. Bridge, PhD Background
More informationAnalysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
More informationCapturing Best Practice for Microarray Gene Expression Data Analysis
Capturing Best Practice for Microarray Gene Expression Data Analysis Gregory Piatetsky-Shapiro KDnuggets gps@kdnuggets.com Tom Khabaza SPSS tkhabaza@spss.com Sridhar Ramaswamy MIT / Whitehead Institute
More informationLearning from Diversity
Learning from Diversity Epitope Prediction with Sequence and Structure Features using an Ensemble of Support Vector Machines Rob Patro and Carl Kingsford Center for Bioinformatics and Computational Biology
More informationPackage ROC632. February 19, 2015
Type Package Package ROC632 February 19, 2015 Title Construction of diagnostic or prognostic scoring system and internal validation of its discriminative capacities based on ROC curve and 0.633+ boostrap
More informationHistogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm
Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm Markus Goldstein and Andreas Dengel German Research Center for Artificial Intelligence (DFKI), Trippstadter Str. 122,
More informationIdentifying SPAM with Predictive Models
Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to
More informationCredit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi 15/07/2015 IEEE IJCNN
More informationCurriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
More informationBinary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationLecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:
More informationTime series experiments
Time series experiments Time series experiments Why is this a separate lecture: The price of microarrays are decreasing more time series experiments are coming Often a more complex experimental design
More informationAn Application of Weibull Analysis to Determine Failure Rates in Automotive Components
An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department
More informationBetter credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationA General Framework for Weighted Gene Co-expression Network Analysis
Please cite: Statistical Applications in Genetics and Molecular Biology (2005). A General Framework for Weighted Gene Co-expression Network Analysis Bin Zhang and Steve Horvath Departments of Human Genetics
More informationDesign and Analysis of Phase III Clinical Trials
Cancer Biostatistics Center, Biostatistics Shared Resource, Vanderbilt University School of Medicine June 19, 2008 Outline 1 Phases of Clinical Trials 2 3 4 5 6 Phase I Trials: Safety, Dosage Range, and
More informationROOT: A data mining tool from CERN What can actuaries do with it?
ROOT: A data mining tool from CERN What can actuaries do with it? Ravi Kumar, Senior Manager, Deloitte Consulting LLP Lucas Lau, Senior Consultant, Deloitte Consulting LLP Southern California Casualty
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationEasily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationIdentification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation
Identification of rheumatoid arthritis and osterthritis patients by transcriptome-based rule set generation Bering Limited Report generated on September 19, 2014 Contents 1 Dataset summary 2 1.1 Project
More informationPredictive Modeling with JMP Genomics
Predictive Modeling with JMP Genomics Page 1 Introduction Predictive modeling or biomarker discovery with modern genomics data is a popular activity and is a critical component in delivering on the promise
More information2.500 Threshold. 2.000 1000e - 001. Threshold. Exponential phase. Cycle Number
application note Real-Time PCR: Understanding C T Real-Time PCR: Understanding C T 4.500 3.500 1000e + 001 4.000 3.000 1000e + 000 3.500 2.500 Threshold 3.000 2.000 1000e - 001 Rn 2500 Rn 1500 Rn 2000
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More informationRandomized trials versus observational studies
Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James
More informationMethods for Meta-analysis in Medical Research
Methods for Meta-analysis in Medical Research Alex J. Sutton University of Leicester, UK Keith R. Abrams University of Leicester, UK David R. Jones University of Leicester, UK Trevor A. Sheldon University
More information# For usage of the functions, it is necessary to install the "survival" and the "penalized" package.
###################################################################### ### R-script for the manuscript ### ### ### ### Survival models with preclustered ### ### gene groups as covariates ### ### ### ###
More informationProtein Prospector and Ways of Calculating Expectation Values
Protein Prospector and Ways of Calculating Expectation Values 1/16 Aenoch J. Lynn; Robert J. Chalkley; Peter R. Baker; Mark R. Segal; and Alma L. Burlingame University of California, San Francisco, San
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC
More informationAnalyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study
Analyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study The data for this study is taken from experiment GSE848 from the Gene Expression
More information