Contributed EMA Symposium Awards Young Statisticians Posters


Conference Courses

SC1 Joint Modelling of Longitudinal and Time to Event Data
Mon :5-7:5, Lecture Hall KR 7
Instructor(s): Rizopoulos D., Erasmus MC, Rotterdam, Netherlands
In follow-up studies, different types of outcomes are typically collected for each subject. These include longitudinally measured responses (e.g., biomarkers) and the time until an event of interest occurs (e.g., death, dropout). Often these outcomes are analyzed separately, but on many occasions it is of scientific interest to study their association. This type of research question has given rise to the class of joint models for longitudinal and time-to-event data. These models constitute an attractive paradigm for the analysis of follow-up data that is mainly applicable in two settings: first, when focus is on a survival outcome and we wish to account for the effect of endogenous time-dependent covariates measured with error, and second, when focus is on the longitudinal outcome and we wish to correct for non-random dropout. This course is aimed at applied researchers and graduate students, and will provide a comprehensive introduction to this modelling framework. We will explain when these models should be used in practice, what the key assumptions behind them are, and how they can be utilized to extract relevant information from the data. Emphasis is given to applications, and after the end of the course participants will be able to define appropriate joint models to answer their questions of interest. Necessary background for the course: this course assumes knowledge of basic statistical concepts, such as standard statistical inference using maximum likelihood, and regression models. In addition, basic knowledge of R would be beneficial but is not required.

SC2 Bayesian Biopharmaceutical Applications Using SAS
Mon :5-2:45, Lecture Hall KR 8
Instructor(s): Chen F. 1, Liu F. 2; 1 SAS Institute, United States, 2 Merck, Sharp & Dohme Corp., United States
This two-part, half-day tutorial first introduces the general-purpose simulation MCMC procedure in SAS, then presents a number of pharma-related data analysis examples and case studies in detail. The objective is to equip attendees with useful Bayesian computational tools through a series of worked-out examples drawn from situations often encountered in the pharmaceutical industry. The MCMC procedure is a general-purpose Markov chain Monte Carlo simulation tool designed to fit a wide range of Bayesian models, including linear and nonlinear models, multilevel hierarchical models, models with a nonstandard likelihood function or prior distributions, and missing data problems. The first part of the tutorial briefly introduces PROC MCMC and demonstrates its use with simple applications, such as Monte Carlo simulation, regression models, and random-effects models. The second part of the tutorial takes a topic-driven approach to explore a number of case studies in the pharmaceutical field. Topics include posterior predictions, use of historical information, hierarchical modeling, analysis of missing data, and topics in Bayesian design and simulation. This tutorial is intended for statisticians who are interested in Bayesian computation. Attendees should have a basic understanding of Bayesian methods (the tutorial does not cover basic concepts of Bayesian inference) and experience using the SAS language. The tutorial is based on SAS/STAT 14.1.
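To make the joint-modelling framework of SC1 concrete, the following minimal R sketch fits a linear mixed model for a longitudinal biomarker and a Cox model for the event time, and then links the two in a joint model. The JM package and its bundled aids example data are assumptions chosen for illustration; they are not named in the course description.

library(nlme)      # linear mixed model for the longitudinal outcome
library(survival)  # Cox model for the time-to-event outcome
library(JM)        # joint modelling of the two submodels

# Longitudinal submodel: biomarker trajectories with random intercepts and slopes
fitLME <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)

# Survival submodel (x = TRUE keeps the design matrix, as required by jointModel)
fitCox <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)

# Joint model linking the current biomarker value to the hazard of the event
fitJM <- jointModel(fitLME, fitCox, timeVar = "obstime")
summary(fitJM)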
SC3 Interval-Censored Time-to-Event Data: Methods and Applications
Mon :45-7:5, Lecture Hall KR 8
Instructor(s): Chen D. 1, Sun J. 2; 1 University of North Carolina at Chapel Hill, United States, 2 University of Missouri, United States
This tutorial provides a thorough presentation of statistical analyses of interval-censored failure time data with detailed illustrations using real data arising from clinical studies and biopharmaceutical applications. Specifically, we will start with a basic review of commonly used concepts and the problems of common interest to practitioners. Commonly used statistical procedures will then be discussed and illustrated, as well as some recent developments in the literature. In addition, some available software functions and packages in R and SAS for the problems considered will be discussed and illustrated. The specific topics to be discussed include:
- Biases inherent in the common practice of imputing interval-censored time-to-event data.
- Nonparametric estimation of a survival function: three basic and commonly used procedures will be discussed for nonparametric estimation of a survival function, along with their comparison.
- Nonparametric treatment comparisons: we will start with generalized log-rank tests and then introduce several other recently developed nonparametric test procedures. A couple of R packages are available for the problem considered and will be discussed.
- Semiparametric regression analysis: for this part, we will first introduce several commonly used regression models, including the proportional hazards model and the linear transformation model. The corresponding inference procedures are then introduced and illustrated using real data.
- Analysis of multivariate interval-censored failure time data: this part will discuss nonparametric estimation of joint survival functions and regression analysis of multivariate interval-censored failure time data. For the former, the focus will be on bivariate failure time data.
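As a small illustration of why interval censoring calls for dedicated methods rather than imputation of event times, the sketch below simulates assessment-schedule data and fits a parametric Weibull model with survreg() from the survival package. The simulated design and variable names are assumptions, not material from the course.

library(survival)
set.seed(1)
n <- 200
grp    <- rbinom(n, 1, 0.5)                       # treatment indicator
true_t <- rweibull(n, shape = 1.5, scale = ifelse(grp == 1, 12, 8))
visit  <- seq(0, 24, by = 3)                      # scheduled assessment times
# The event time is only known to lie between the last visit before it and the first visit after it
left <- sapply(true_t, function(t) {
  v <- visit[visit > 0 & visit < t]
  if (length(v)) max(v) else NA                   # NA left endpoint = left-censored
})
right <- sapply(true_t, function(t) {
  v <- visit[visit >= t]
  if (length(v)) min(v) else NA                   # NA right endpoint = right-censored at the last visit
})
fit <- survreg(Surv(left, right, type = "interval2") ~ grp, dist = "weibull")
summary(fit)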

SC4 Statistical Evaluation of Surrogate Endpoints
Mon :45-7:5, Lecture Hall KR 9
Instructor(s): Burzykowski T., Hasselt University, Belgium
The tutorial will focus on the use of surrogate endpoints in drug development. Efficacy of new drugs is assessed using clinical endpoints. Often, the most clinically relevant endpoint is difficult to use in a trial. This happens if the measurement of this clinical endpoint requires, for instance, a large sample size (because of low incidence of the event of interest) or a long follow-up time. A potential strategy in these cases is to look for a surrogate endpoint or a surrogate (bio)marker that can be measured more cheaply, more conveniently, more frequently, or earlier than the clinical endpoint. From a regulatory perspective, an endpoint is considered acceptable for efficacy determination only after its establishment as a valid indicator of clinical benefit, i.e., after its evaluation as a surrogate endpoint for the clinical endpoint of interest. In the tutorial we will formalize the concept of surrogate endpoints and present the main issues related to their application. The major part of the tutorial will be devoted to a review of the statistical methods for the evaluation of surrogate endpoints. In particular, we will focus on the so-called meta-analytic approach; however, other approaches will be briefly mentioned as well. All discussed methods and concepts will be illustrated using real-life examples from clinical oncology.

SC5 Evaluating Therapies in Rare Diseases
Tue :30-8:00, Lecture Hall KR 2
Instructor(s): Day S. 1, Senn S. 2; 1 CTCT Ltd, United Kingdom, 2 Luxembourg Institute of Health, Strassen, Luxembourg
A programme for developing therapies for common diseases will typically involve dozens of trials and thousands of patients. Such an approach cannot work for rare diseases, where a conventional drug development programme might require the recruitment of all patients suffering from the disease over several decades and would be completely unrealistic. Thus, in many cases, another model altogether is needed. This course will consider possible statistical solutions to the challenges that studying treatments for rare diseases raises. Amongst the matters that will be covered are: appropriate standards of evidence, alternative clinical trial designs, exploiting covariate information and using non-interventional studies. Real examples from the presenters' own experiences will be used throughout. The course will be given by two experienced statisticians, well known for their thought-provoking writings on statistics in drug development but also for their attitude to planning and analysis: the way to judge the value of a statistical method is to ask if it helps to find useful treatments for patients.

SC6 An Introduction to Confirmatory Adaptive Designs
Fri :45-6:5, Lecture Hall KR 9
Instructor(s): Brannath W. 1, Wassmer G. 2; 1 University of Bremen, Bremen, Germany, 2 Medical University of Vienna, Austria
Confirmatory adaptive designs are a generalization of group sequential designs. With these designs, interim analyses are performed in order to stop the trial prematurely under control of the Type I error rate. In adaptive designs, it is additionally permissible to perform a data-driven change of relevant aspects of the study design at interim stages. This includes, for example, a sample-size reassessment, a treatment-arm selection or a selection of a pre-specified sub-population. This adaptive methodology was introduced in the 1990s (Bauer et al., 2016). Since then, it has become popular, has been the object of intense discussion, and still represents a rapidly growing field of statistical research. This short course provides an introduction to the confirmatory adaptive design methodology. We start with a short introduction to group sequential methods, which is necessary for understanding and applying the adaptive design methodology presented in the second part of the course. Essentially, the combination testing principle and the conditional Type I error approach are described in detail. We consider design and planning issues as well as methods for analyzing an adaptively planned trial. This includes estimation methods and methods for the determination of an overall p-value. An overview of software for group sequential and confirmatory adaptive designs is also provided.
Literature:
Bauer, P., Bretz, F., Dragalin, V., König, F., Wassmer, G.: 25 years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in Medicine 2016; 35.
Wassmer, G., Brannath, W.: Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer Science and Business Media, 2016.
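The combination testing principle mentioned in SC6 can be shown in a few lines of base R: independent stage-wise one-sided p-values are combined with pre-specified weights via the inverse-normal method, and the combined statistic is compared with a fixed critical value. The weights and the one-sided significance level below are illustrative assumptions only.

# Inverse-normal combination of two independent stage-wise one-sided p-values
inverse_normal_combination <- function(p1, p2,
                                       w1 = sqrt(0.5), w2 = sqrt(0.5),
                                       alpha = 0.025) {
  stopifnot(isTRUE(all.equal(w1^2 + w2^2, 1)))   # weights must be fixed before stage 2
  z <- w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2)   # combined test statistic
  list(z = z, critical = qnorm(1 - alpha), reject = z > qnorm(1 - alpha))
}

# Moderate evidence in both stages combines into a rejection of the null hypothesis
inverse_normal_combination(p1 = 0.04, p2 = 0.03)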

Keynote Sessions

K1 Opening and Keynote: John P. A. Ioannidis
Tue :5-:00, Lecture Hall HS
Chair(s): Martin Posch

K1.1 Opening
We welcome all delegates of CEN-ISBS Vienna 2017 with short addresses given by:
- Markus Müller, Rector of the Medical University of Vienna
- Brian Cullis, Representative of the International Biometric Society (IBS)
- Tim Friede, President of the German Region of IBS and SPC co-chair
- Tomasz Burzykowski, Representative of the Polish Region of IBS and SPC co-chair
- Jie Chen, President of the International Society for Biopharmaceutical Statistics (ISBS) and SPC co-chair
- Georg Heinze, Chair of the Local Organizing Committee
The session will be moderated by Martin Posch, President of the Austro-Swiss Region (ROeS) of IBS and SPC chair.

K1.2 Conceptual and statistical issues on reproducibility
Ioannidis J.P.A., Stanford University, Stanford, United States
There is a lot of interest and debate about reproducible research and reproducibility. These terms mean different things in different contexts, scientific fields and disciplines. The keynote will discuss different aspects of reproducibility of methods, results, and inferences from a conceptual and statistical viewpoint, along with evidence on the reproducibility of these three types across diverse fields. Suggestions for standardizing practices of reproducibility will also be discussed.

K2 Keynote: Ulrich Dirnagl
Wed :30-09:25, Lecture Hall HS
Chair(s): Tim Friede

K2.1 Statisticians to the rescue: a humble stroke researcher's proposal on how to improve the quality of preclinical biomedicine
Dirnagl U. 1,2; 1 Berlin Institute of Health, Center for Transforming Biomedical Research, Berlin, Germany, 2 Charité Universitätsmedizin Berlin, Department of Experimental Neurology, Berlin, Germany
More than 90% of scientists believe that we are in the midst of a 'crisis'. Apparently, findings across biomedicine are unreliable, and the rate of successful translation of spectacularly effective therapies in animal models into treatments benefiting patients is disappointingly low. In the preclinical realm, low internal and external validity, low statistical power, as well as undisclosed 'flexibility' and 'selectivity' in experimental design, analysis and reporting appear to have led to an excessive number of false positive results and inflated effect sizes. Misconceptions and misinterpretations of statistics and the results of its tests have been exposed and scolded by professionals for many decades. These abuses of statistics have not only remained rampant, but may even have increased substantially. Potential reasons for this trend include an exponential increase of studies over time and increased complexity of methodology and data, potentiated by perverse incentives and rewards in academic biomedicine. An additional problem may have been the progressive retreat of statisticians from proactive participation in the biomedical research process to acting as 'postmortem examiners' of experiments (RA Fisher). Due to the current introspection and soul-searching among scientists, the crisis provides a great opportunity to return from this partially self-imposed exile. I propose that statisticians re-engage and scale up still existing activities. In my talk I will provide examples elaborating on why I think that statisticians can play a critical role in overcoming the crisis by improving the quality of research. Specifically, they could 1) provide new approaches to teach the very basic concepts of statistics (and not how to run tests) to students, postdocs and PIs, 2) scale up their conceptual contribution to and counseling of projects, submissions, journals, and funders, and 3) develop and popularize novel and more effective ways to design and analyze studies and to aggregate evidence.

K3 Keynote: Alison Smith
Thu :30-09:25, Lecture Hall HS
Chair(s): Hans-Peter Piepho

K3.1 Design Tableau: an aid to specifying the linear mixed model for a comparative experiment
Smith A., Cullis B., University of Wollongong, National Institute for Applied Statistics Research Australia, Wollongong, Australia
The design and analysis of comparative experiments has changed dramatically in recent times. There has been a move away from "text-book" designs towards complex, non-orthogonal designs. At the same time, analysis of variance (ANOVA) techniques have been superseded by the use of linear mixed models (LMM). The latter have the advantage of accommodating non-orthogonality and of allowing more complex variance models, which may be beneficial for improving the efficiency of treatment comparisons or for providing a more plausible structure. However, this flexibility has come at a cost, since in the transition from ANOVA to LMM some of the key principles in the analysis of comparative experiments are often neglected. In order to address this, we have developed a simple but general approach for specifying the LMM for a comparative experiment. In doing so, we defer to the seminal work of John Nelder and Rosemary Bailey and extend it to encompass multi-environment trials and multi-phase experiments.

K4 Keynote: Stijn Vansteelandt and Closing
Fri :20-5:25, Lecture Hall HS
Chair(s): Tomasz Burzykowski

K4.1 Inferring causal pathways from data: challenges and some solutions
Vansteelandt S. 1,2; 1 Ghent University, Gent, Belgium, 2 London School of Hygiene & Tropical Medicine, London, United Kingdom
The question of what the mechanism is whereby an exposure affects an outcome is key to many fields of science, most notably medicine, epidemiology, psychology and sociology. A proliferation of so-called mediation analysis techniques has therefore developed since the early 1980s with the aim of inferring mechanistic pathways from data, using the linear structural equation model as a backbone. Here, each pathway describes how the exposure affects an intermediate variable or mediator, which then in turn affects the outcome. Developments in causal inference, pioneered by Robins, Greenland and Pearl, have revealed major limitations of these techniques. In this talk, I will discuss these limitations and provide an introduction to causal mediation analysis, which has mostly developed over the past 10 years. The majority of these techniques have focused on simple applications involving a single mediator, in view of the difficulties of handling multiple mediators. However, investigators are often interested in multiple pathways. Moreover, even when a single pathway is of interest, recognising multiple mediators is usually necessary for a valid analysis because of confounding induced by other mediators. I will address this concern by discussing recent developments on multiple and longitudinal mediators, and illustrate these by applications from clinical trials, complex community interventions and genetic association studies.

Invited Sessions

I01 Inference with Multiple Objectives
Tue :30-6:00, Lecture Hall HS
Chair(s): Frank Bretz

I01.1 Design, data monitoring and analysis of clinical trials with co-primary endpoints (invited)
Hamasaki T. 1, Evans S.R. 2, Asakura K. 1; 1 National Cerebral and Cardiovascular Center, Suita, Japan, 2 Harvard T.H. Chan School of Public Health, Boston, United States
Use of co-primary endpoints in clinical trials is increasingly common, especially in medical product development, where indications include Alzheimer disease, migraine, Duchenne and Becker muscular dystrophy, and so on. "Co-primary" means that a trial is designed to evaluate whether a test intervention has an effect on all of the primary endpoints. Failure to demonstrate an effect on any single endpoint implies that a beneficial effect over the control intervention cannot be concluded.
In many such trials, the sample size is often unnecessarily large and impractical. To overcome this issue, many authors have recently discussed approaches to the design and analysis of co-primary endpoint trials in fixed-sample (size) and group-sequential designs. In this presentation, we provide an overview of the design, data monitoring, and analyses of clinical trials with multiple co-primary endpoints. We review recently developed methods for fixed-sample and group-sequential settings. We discuss practical considerations and provide guidance for the application of these methods.

I01.2 A new omnibus test for the global null hypothesis
Zehetmayer S. 1, Futschik A. 2; 1 Medical University of Vienna, Center for Medical Statistics, Informatics, and Intelligent Systems, Vienna, Austria, 2 Johannes Kepler University Linz, Department of Applied Statistics, Linz, Austria
Global hypothesis tests are an important tool in the context of, e.g., clinical trials, genetic studies or meta-analyses, when researchers are not interested in testing individual hypotheses, but in testing whether none of the hypotheses is false. There are several possibilities for testing the global null hypothesis when the individual null hypotheses are independent. If it is assumed that many of the individual null hypotheses are false, combination tests (e.g., the Fisher or Stouffer test), which combine data from several endpoints into a single test statistic, have been recommended to maximise power. If, however, it is assumed that only one or a few null hypotheses are false, global tests based on individual test statistics are more powerful (e.g., the Bonferroni or Simes test). However, usually there is no a priori knowledge of the number of false individual null hypotheses. We therefore propose an omnibus test based on the combination of p-values. We show that this test yields an impressive overall performance.

I01.3 Change-point approach to multiple testing
Hlávka Z., Hušková M., Charles University, Department of Statistics, Prague, Czech Republic
We propose two-sample gradual change analysis and show that it is more powerful than the usual multiple testing techniques. We start by investigating a real data set: maximum jumping speeds observed for 432 girls and 364 boys between 6 and 19 years. Two-sample t-tests suggest that jumping speeds for boys and girls are about the same from 6 to 10 years and that boys' jumping speeds are clearly higher from 13 years on (the p-value is in the age category 12-13 years). Applying standard multiple testing corrections, we observe a statistically significant difference after 13 years. In order to circumvent the need for multiple testing, we investigate this data set from the point of view of change-point analysis. By estimating a single change point (instead of looking at thirteen independent two-sample t-tests), we are able to detect a statistically significant difference already at 11.26 years. In order to obtain correct p-values and confidence intervals, we investigate the asymptotic distribution of the least-squares gradual change-point estimator. In the homoscedastic setup, we derive asymptotic normality. Assuming heteroscedasticity, the distribution of the change-point estimator is approximated by a wild bootstrap. In this way, two-sample change-point analysis allows us to replace many independent t-tests by a single one-sided confidence interval or by corresponding one-sided tests. We also discuss interpretation and some practical issues such as bias caused by binning of the underlying continuous variables.

I01.4 Clinical trial optimization approaches to Phase III trials with multiple objectives
Paux G. 1, Dmitrienko A. 2; 1 Institut de Recherches Internationales Servier, Suresnes, France, 2 Mediana Inc., Kansas City, United States
Confronted with the increasing cost, duration and failure rate of new drug development programs, the use of innovative trial designs and analysis strategies has considerably increased over the past decade. More specifically, in modern drug development, clinical trial sponsors are often interested in assessing multiple clinical objectives. Due to this multiplicity of tests, the probability of erroneously claiming the effectiveness of a new drug, i.e. the Type I error rate, will be inflated and must be controlled to support reliable statistical inferences. In recent decades, several innovative approaches for addressing multiplicity issues in clinical trials have been developed. However, as in general no closed-form expression exists to calculate the power of the trial when such procedures are used, simulation-based methods are essential to calibrate the trial design (e.g. sample size) and the analysis strategy (e.g. choice of the multiple testing procedure). In 2017, the FDA released its draft guidance on Multiple Endpoints in Clinical Trials, which emphasized the use of clinical trial simulations to determine an appropriate sample size to ensure that the study is adequately powered. In this presentation we will introduce key principles of clinical trial optimization in the context of Phase III trials with multiplicity issues to arrive at the optimal selection of design and analysis parameters. The Clinical Scenario Evaluation framework will be used to facilitate a quantitative assessment of the trial's performance. Additionally, key features of the Mediana R package will be presented. This R package has been developed to provide a standardized approach to clinical trial simulations and to facilitate a systematic simulation-based assessment of trial designs and analysis methods in clinical trials or across development programs.

I02 Prediction Models for High Dimensional Data
Wed :30-3:00, Lecture Hall HS 4
Chair(s): Tomasz Burzykowski
I02.1 Prediction and interpretation: two statistical tasks? (invited)
Houwing-Duistermaat J. 1, Tissier R. 2, Rodriguez Girondo M. 3; 1 University of Leeds, Department of Statistics, Leeds, United Kingdom, 2 Leiden University, Leiden, Netherlands, 3 Leiden University Medical Center, Leiden, Netherlands
Nowadays in many studies several omics datasets are available for analysis. These omics datasets are used for building prediction models. Because of the presence of correlation between the omics variables and the large number of variables, regularized regression techniques are typically used for this purpose. However, stacking of omics datasets with different sizes, scales, structures and measurement errors might not be optimal. Another drawback of these methods is that the results might be hard for biologists and epidemiologists to interpret, especially when the variables are correlated, as is the case in most omics datasets. Our goal is to build models which have good predictive ability and are also biologically interpretable. We propose a three-step approach: 1) network construction per omics dataset, 2) clustering to empirically derive modules within omics and across omics datasets, and 3) building a prediction model, where we use the information on the modules. For the first step we use two commonly used methods, namely one based on weighted correlation and one based on Gaussian graphical modeling (partial correlation). Identification of modules (groups) of features is performed by hierarchical clustering. To incorporate the grouping information in a prediction model we conduct group-based variable selection with group-specific penalization parameters. We compare the performance of our new approaches with standard regularized regression (LASSO and ridge) approaches via simulations. To quantify performance we use cross-validated calibration measures and variable selection properties. Finally, our approaches are applied to two different studies with omics datasets: metabolomics and gene expression data available in an epidemiological study with body mass index as the outcome variable, and CNV and gene expression data available in cell lines with drug response as the outcome variable. The prediction tasks differ because the correlation between metabolomics and gene expression is relatively small, while it is large between CNV and gene expression.

I02.2 Added predictive value of omics data depends on the clinical model
Volkmann A. 1, De Bin R. 2, Sauerbrei W. 3, Boulesteix A.-L. 1; 1 LMU München, Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, München, Germany, 2 University of Oslo, Department of Mathematics, Oslo, Norway, 3 University of Freiburg, Institut für Medizinische Biometrie und Statistik, Freiburg, Germany
It has become generally accepted that omics data can be very informative in survival modeling and prediction. They may improve the prognostic ability even in medical areas where established and well-validated clinical models already exist. If, however, clinical information is not taken into consideration when building a model, gene signatures that are highly correlated with clinical variables are often selected. Recent research has therefore focused on integrating both omics and clinical sources of data, yet often neglecting the need for appropriate model building for clinical predictors. We explicitly want to investigate the added predictive value of gene expression data for clinical models of varying complexity when building and validating prediction rules for the survival of breast cancer patients. We first construct several prediction models using varying amounts of the clinical information. These models are then used as a starting point (i.e. included as a clinical offset) for identifying possibly informative gene expression predictors using resampling procedures and penalized regression approaches. In order to assess the added predictive value of the selected gene signatures, several measures of prediction accuracy and separation are examined on a validation subset for both the clinical models and the models that combine the two sources of information. We expect the omics data to improve the predictive ability of simpler clinical models, but less so for more elaborate approaches to modeling clinical information.
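The clinical-offset strategy described in I02.2 can be sketched in a few lines of R: a clinical model is fitted first, and its linear predictor enters a penalized omics model as a fixed offset, so that only added predictive value of the omics features is rewarded. The simulated data, the binary outcome and the glmnet call are illustrative assumptions, not the analysis performed by the authors.

library(glmnet)
set.seed(42)
n <- 300; p <- 1000
clin  <- data.frame(age = rnorm(n, 60, 10), grade = rbinom(n, 2, 0.4))
omics <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
lp    <- 0.03 * clin$age + 0.5 * clin$grade + 0.8 * omics[, 1] - 0.6 * omics[, 2]
y     <- rbinom(n, 1, plogis(lp - mean(lp)))

# Step 1: clinical model only
clin_fit <- glm(y ~ age + grade, data = clin, family = binomial)
clin_lp  <- predict(clin_fit, type = "link")           # clinical linear predictor

# Step 2: penalized omics model with the clinical linear predictor as an offset
cvfit <- cv.glmnet(omics, y, family = "binomial", offset = clin_lp)
beta  <- coef(cvfit, s = "lambda.min")
rownames(beta)[as.numeric(beta) != 0]                   # features retained on top of the clinical model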

I02.3 Penalized regression for clinical outcome prediction based on several blocks of clinical and omics data
Boulesteix A.-L. 1, De Bin R. 2, Klau S. 1, Sauerbrei W. 3; 1 Ludwig-Maximilians-University of Munich, Munich, Germany, 2 University of Oslo, Oslo, Norway, 3 University of Freiburg, Freiburg, Germany
In this talk we will discuss several penalized regression approaches for the prediction of clinical outcomes (e.g., survival time or response to therapy) based on several blocks of variables including classical clinical variables and various types of "omics" data (e.g., transcriptomic, methylation or copy number data). The available methods differ not only in the type of penalization they apply to the variables (such as L1 or L2 penalization) but also in the way they treat the different data blocks. After a brief review of the available methods in the considered context, we will draw particular attention to the following aspects, which we focused on in our recent research: 1) the integration of clinical data in omics-based prediction rules (including the "offset" and the "favoring" strategies), 2) the integration of multiple (possibly high-dimensional) data blocks using the "IPF-LASSO" approach, which consists of applying different penalty parameters to the different blocks, 3) the integration of multiple (possibly high-dimensional) data blocks under consideration of priorities specified by the medical scientist using the new hierarchical method "prioritylasso", and 4) practical issues related to the applicability of prediction rules in independent studies. The methods will be illustrated through applications to data from The Cancer Genome Atlas.

I02.4 Who's afraid of Bayesian non-collapsibility? Anti-shrinkage from penalization
Heinze G. 1, Geroldinger A. 1, Greenland S. 2; 1 Medical University of Vienna, Section for Clinical Biometrics, CeMSIIS, Wien, Austria, 2 University of California, Department of Epidemiology and Department of Statistics, Los Angeles, United States
In logistic regression analyses, Bayesian non-collapsibility can arise if the posterior mode of a regression coefficient does not fall between the prior mode and the maximum likelihood estimate, and can be viewed as a version of Simpson's paradox (Greenland, Am Stat 2010). It is well known that penalized likelihood techniques, which are popular shrinkage methods, can be interpreted as Bayesian methods with the likelihood penalty interpreted as a prior distribution. Thus, the possibility of Bayesian non-collapsibility implies that penalization may sometimes lead to undesired anti-shrinkage. We investigate Bayesian non-collapsibility arising in odds-ratio estimation from tabular data and multivariable logistic regression with different penalties and priors, including the Firth penalty (Jeffreys prior), the symmetric log-F (data augmentation) prior, and the ridge penalty (normal prior). We give real-data examples and discuss implications of the problem from both Bayesian and frequentist viewpoints.

I10 Simulation Based Modelling and Prediction
Tue :30-3:00, Lecture Hall HS
Chair(s): Valentin Rousson

I10.1 Dionysus: a bootstrapped version of the Prometheus wildland fire growth model (invited)
Braun J., UBC, Department of Statistics, Kelowna, Canada
The Prometheus Fire Growth Model is a deterministic wildfire simulator used to predict the growth of a wildfire, in Canada and other countries. Given weather, topographical and fuel information, the simulated fire front is plotted at equally spaced times. Unpredictability of fire behaviour makes deterministic predictions inaccurate. This talk will briefly describe a risk analysis study undertaken using Prometheus applied to random weather streams, pointing out limitations with that approach and motivating an alternative viewpoint, based on bootstrapping. By statistically modelling the data to which the Prometheus model equations are fit, it is possible to obtain a distribution of fire front predictions. This approach allows us to estimate the probability that a growing fire will eventually burn a particular location. Repeated stochastic simulation is not required, so probability contours require no more computing time than deterministic contours. We conclude with a discussion of the implications for improved fire risk analysis.

I10.2 Distributional and quantile regression for quality control in fetal weight estimation
Mayr A. 1,2, Schmid M. 3, Faschingbauer F. 4; 1 Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, 2 Ludwig-Maximilians-University of Munich, München, Germany, 3 Universität Bonn, Bonn, Germany, 4 Universitätsklinikum Erlangen, Erlangen, Germany
The estimated birth weight of the fetus is an important predictive parameter for neonatal morbidity and mortality. The estimates are based on the last sonography before birth and incorporate linear models for the different measured biometric parameters. Measurement errors are, however, inevitable and should therefore be subject to statistical analysis. We propose two approaches to analyse measurement errors based on advanced statistical modelling. The first incorporates distributional regression and aims to model systematic bias and random error simultaneously via generalized additive models for location, scale and shape (GAMLSS). These types of models have gained a lot of attention recently; however, they have not so far been adapted to directly evaluate measurement errors. The second approach focuses on quantile regression to evaluate the distribution of z-scores. The advantage of this setting is that we do not need the true values but only reference curves to assess the accuracy of measurements. All proposed models are illustrated with quality control in sonographic weight estimation, analysing the effect of the examiner and his experience on the accuracy.

I10.3 Are published complex prediction rules currently applicable for readers? A survey of applied random forest literature and recommendations
Boulesteix A.-L. 1, Janitza S. 2, Hornung R. 1, Probst P. 1, Busen H. 3, Hapfelmeier A. 4; 1 University of Munich, Department of Medical Informatics, Biometry and Epidemiology, Munich, Germany, 2 Robert Koch Institute, Berlin, Germany, 3 Helmholtz Zentrum Munich, Institute of Computational Biology, Munich, Germany, 4 Technical University of Munich, Institute of Medical Statistics and Epidemiology, Munich, Germany
Ideally, biomedical prediction rules should be published in such a way that readers may apply them to make predictions for their own data. While this task is straightforward for prediction rules derived by classical methods such as logistic regression, warranting the applicability of complex prediction rules derived by machine learning tools is usually much more difficult. We conducted a structured survey of articles (PLOS ONE) reporting prediction rules that were obtained using the random forest algorithm, with the aim of identifying issues related to the applicability of the prediction rules presented in these articles. Application was possible for only two of the thirty identified papers, while for a further eight prediction rules it was possible to obtain the necessary information by contacting the authors. In the other cases, various problems, such as non-response of the authors, impeded the application of the prediction rules. Based on our experiences from this survey, we formulate a set of recommendations for authors publishing complex prediction rules to ensure their applicability for readers.

I10.4 Statistical considerations in prediction: the role of "Predicting Observables"
Heyse J., Merck Research Laboratories, Biostatistics and Research Decision Sciences, North Wales, United States
Estimation and hypothesis testing of parameters has been the mainstay of modern statistical inference directed at evaluating medical interventions. However, there is growing interest in applications of prediction, especially in applications of precision medicine. Basing statistical inference on "observables" offers many advantages in these applications: there is a more direct connection with a specific decision or objective for the analysis; there is a specific rationale and basis for model development; and there is more meaningful capability for model validation, including comparing analytical models. Both Bayesian and non-Bayesian methods are available and can be utilized. This presentation will highlight the growing interest in prediction and describe predicting observables as a useful statistical framework. Two examples will be used to illustrate the main points: (1) an application in health economics for predicting health care costs, and (2) a disease classification problem. The presentation will finish with concluding remarks about the relevance to precision medicine and machine learning.

I12 Analysis of High Dimensional Genomic Data
Wed :30-3:00, Lecture Hall HS
Chair(s): Malgorzata Graczyk

I12.1 Statistical analysis for RNA-seq data: bringing results to scientists (invited)
Zyprych-Walczak J., Poznan University of Life Sciences, Department of Mathematical and Statistical Methods, Poznan, Poland
RNA-Seq uses the capabilities of next-generation sequencing (NGS) technologies to measure the presence of sequences transcribed from all the genes in an organism simultaneously. Those measurements can be used to estimate the differential expression between groups of biological samples or for the detection of novel transcripts and isoforms of genes. There are a number of statistical and computational methods that can tackle the analysis and management of the massive and complex datasets produced by the sequencers. The analysis of RNA-seq data starts with primary analysis, which is most often normalization of the data. In this talk, I will discuss how we are using the R language to automate gene expression analysis and provide scientists with the tools to analyze and understand their data. Next, I will focus on a comprehensive comparison of different normalization methods and their impact on the results of gene expression analysis.
I will show that primary analysis has a profound effect on the results of the analysis, and I will provide suggestions on possible good practices that can make RNA-seq data analysis closer to the "biological truth" that it attempts to find.

I12.3 Methods for dealing with missing covariate data in epigenome-wide association studies
Mills H., Heron J., Suderman M., Relton C., Tilling K., University of Bristol, MRC Integrative Epidemiology Unit, Bristol, United Kingdom
Multiple imputation (MI) is a well-established method for dealing with missing data. When using MI, all analysed variables must be included in the imputation model - failure to do so will bias associations towards the null. Consequently, MI is computationally intensive for high dimensional datasets, e.g. in epigenetic epidemiology (450,000+ variables). Often, analyses on such data are reduced to complete cases (CC) only, limiting power and potentially introducing bias. Here we test multiple MI methods for efficiency and accuracy on high dimensional data: imputing separately for each variable; using subsets of variables identified by a CC analysis; and imputing in groups of variables (bins). We use complete epigenetic datasets and simulate missingness. The best methods were then applied to epigenetic data from a cohort study with missingness in some covariates. In the simulation study, all imputation methods had increased power over the CC analysis. Imputing the missing covariate separately for each variable was computationally inefficient, but "binning" variables at random into evenly sized groups improved efficiency and lowered bias. Methods imputing solely using subsets of variables identified by the CC analysis were biased towards the null. However, if these subsets were included in every group in the random bins, then bias was reduced. All methods applied to the cohort study identified additional associations over the CC analysis. The best methods were those which binned variables: these reduced the number of imputations while keeping bias low. These results are also applicable to many other high dimensional datasets, including the rapidly-expanding area of 'omics studies.

I12.4 Detecting eQTLs in high-dimensional sequencing data
Kammers K. 1, Taub M.A. 2, Matthias R.A. 1, Leek J.T. 2; 1 Johns Hopkins University School of Medicine, Baltimore, United States, 2 Johns Hopkins Bloomberg School of Public Health, Baltimore, United States
The goal of an eQTL analysis is to detect patterns of transcript or gene expression related to specific genetic variants. In this talk we present our recently developed analysis protocol for performing extensive eQTL analyses - from raw RNA-sequencing reads and genotype data to eQTL plots showing gene-SNP interactions. We explain in detail how expression and genotype data are filtered, transformed, and batch corrected. We also discuss possible pitfalls and artifacts that may occur when analyzing genomic data from different sources jointly. Our protocol is tested on a publicly available data set of the RNA-sequencing project from the GEUVADIS consortium and applied to recently generated omics data from the GeneSTAR project at Johns Hopkins. One goal of this project is to understand the biology of platelet aggregation. Therefore, we examined genetic and transcriptomic data from megakaryocytes (MKs), the precursor cells for anucleate platelets, that are derived from induced pluripotent stem cells (iPSCs). Given the high genetic and transcriptomic integrity of MKs, we found several thousand cis-eQTLs in European Americans and African Americans and see high replication between the two groups. Approximately 30% of the cis-eQTLs are unique to MKs compared to other tissue types that are reported in the GTEx Portal.
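The core of an eQTL scan such as the one in I12.4 is, at its simplest, a per-gene regression of normalized expression on SNP dosage followed by multiple-testing adjustment. The bare-bones R sketch below with simulated data only illustrates that idea; real pipelines add covariates, batch correction and dedicated software, and the variable names here are made up.

set.seed(7)
n_samples <- 100; n_pairs <- 50
expr <- matrix(rnorm(n_samples * n_pairs), n_samples, n_pairs)           # stands in for normalized expression
geno <- matrix(rbinom(n_samples * n_pairs, 2, 0.3), n_samples, n_pairs)  # SNP dosages 0/1/2
expr[, 1] <- expr[, 1] + 0.7 * geno[, 1]                                 # plant one true cis-eQTL

# Regress each gene on its paired (cis) SNP and collect the p-values
pvals <- sapply(seq_len(n_pairs), function(i) {
  summary(lm(expr[, i] ~ geno[, i]))$coefficients[2, 4]
})
head(p.adjust(pvals, method = "BH"))   # FDR-adjusted p-values; pair 1 should stand out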

I13I15I16 Evidence Synthesis and Individualized Inference: Fusion Learning, Network Meta-Analysis and Bayesian Inference
Thu :30-3:00, Lecture Hall HS 3
Chair(s): Arne Bathke, Leonhard Held, Hans-Peter Piepho
Combining and synthesizing results from different studies often renders overall inference more powerful. This session deals with modern variations and extensions of meta-analysis. Among these are network meta-analysis, which combines studies with multiple, possibly different treatments, and i-fusion, a fusion learning approach for individualized inference based on the concept of confidence distributions.

I13I15I16.1 Network meta-analysis for evidence synthesis in the plant and agricultural sciences (invited)
Madden L., Ohio State University, Plant Pathology, Wooster, United States
Meta-analysis, the methodology for analyzing the results from multiple studies, has grown tremendously in popularity since being first proposed by Smith and Glass in 1977. Although most meta-analyses involve a single effect size from each study (e.g., a mean difference for two treatments), there are often multiple treatments of interest across the network of studies. Multi-treatment or network meta-analysis (NMA) can be used to analyze the results from all the treatments simultaneously. With this approach, correlations of treatment effects are automatically taken into account, and more studies may be included in the analysis because individual studies need not contain all of the treatments of interest. NMA can be based on contrasts with a baseline treatment from each study or directly on treatment arms from each study, with the estimation of contrasts performed after the model fit. The contrast-based approach is probably more popular, overall, thanks to the statistical work and advocacy by Lu, Ades, and colleagues. Piepho, Williams and Madden showed that the results are very similar for contrast- and arm-based methods, and equivalent under some circumstances, if the appropriate mixed model is chosen. Arm-based methods are much easier to perform with standard mixed-model software, and are straightforward to expand for incorporation of effects of study-level covariates on the response variable. In the plant sciences, arm-based NMA is most common. The most extensive use of NMA has been in the estimation of the effects of chemical treatments (fungicides) in controlling the most economically important disease of wheat in the world, Fusarium head blight. There are now over 300 studies in the database, with over 25 different treatments. We demonstrate the mixed-model arm-based NMA for this dataset, and introduce the use of a natural cubic spline to determine if treatment effects are stable over the 9 years of the study results.

I13I15I16.2 i-fusion: Efficient fusion learning for individualized inference from diverse data sources (invited)
Liu R., Rutgers University, Statistics & Biostatistics, New Jersey, United States
Inferences from different databases or studies can often be fused together to yield a more powerful overall inference than individual studies alone. Fusion learning refers to effective approaches for such synergizing of learnings from different data sources. Effective fusion learning is in great demand, especially in dealing with the ubiquitous massive automatic data collection nowadays. Decision-making processes in many domains such as medicine, life science, social studies, etc. can all benefit from fusion learning from different sources, possibly even with varying forms of complexity and heterogeneity in data structure. This talk presents some new fusion approaches for extracting and merging useful information. The particular focus is the i-fusion method, a novel individual-to-clique learning approach that fuses information from relevant entities to make inference for the target individual entity. Drawing inference from a clique allows "borrowing strength" from similar entities to enhance the inference efficiency for each individual. The i-fusion method is flexible, computationally efficient, and can be scaled up to search through massive databases. The key tool underlying these fusion approaches is the so-called "confidence distribution" (CD), which, simply put, is a versatile distributional inferential scheme (unlike the usual point or interval inferences) without priors. Time permitting, applications of the i-fusion method in financial modeling, star formation in galaxies, precision medicine and weather forecasting will also be discussed. Acknowledgements: this is joint work with John Kolassa, Jieli Shen and Minge Xie, Rutgers University.

I13I15I16.3 Bayesian inference from multiple sources to inform infectious disease health policy (invited)
De Angelis D., University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom
Health-related policy decision-making for epidemic control is increasingly based on the use of models that realistically approximate the processes of interest and, crucially, incorporate all available information. Assimilation of information from a variety of heterogeneous, incomplete and biased sources poses a number of problems. We describe how a Bayesian approach to such evidence synthesis can accommodate all information in a single coherent probabilistic model and give examples to illustrate current challenges in this area.
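As a schematic companion to the arm-based network meta-analysis described in I13I15I16.1, the sketch below fits a linear mixed model to simulated arm-level data with random study effects, so that treatment contrasts can be read off the fixed effects. The lme4 package, the simulated network and the simple variance structure are assumptions for illustration, not the models used in the talk.

library(lme4)
set.seed(123)
treatments <- c("control", "A", "B", "C")
true_means <- c(control = 10, A = 9, B = 8, C = 8.5)

# 30 studies, each reporting the control arm plus two of the three active treatments
dat <- do.call(rbind, lapply(1:30, function(s) {
  arms <- c("control", sample(c("A", "B", "C"), 2))
  data.frame(study = factor(s),
             trt   = factor(arms, levels = treatments),
             y     = true_means[arms] + rnorm(1, 0, 1) + rnorm(length(arms), 0, 0.5))
}))

# Arm-based NMA as a mixed model: fixed treatment effects, random study effects
fit <- lmer(y ~ trt + (1 | study), data = dat)
summary(fit)$coefficients   # contrasts of A, B and C versus control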

I17 Group Sequential Trials
Thu :30-6:00, Lecture Hall HS 2
Chair(s): Christopher Jennison

I17.1 Optimal group sequential tests for delayed responses with non-binding futility boundaries (invited)
Hampson L.V. 1,2, Jennison C. 3; 1 AstraZeneca, Statistical Innovation, Advanced Analytics Centre, Cambridge, United Kingdom, 2 Lancaster University, Mathematics & Statistics, Lancaster, United Kingdom, 3 University of Bath, Mathematical Sciences, Bath, United Kingdom
Group sequential tests monitor data as they accumulate from a clinical trial rather than waiting until the end of a study to make a final decision. One-sided group sequential tests are usually designed assuming that recommendations to stop a trial early, for efficacy or lack of benefit, will always be adhered to. Unplanned deviations from a futility stopping rule will then cause inflation of the type I error rate. However, a sponsor may wish to have the flexibility to continue a trial even after a futility boundary has been crossed in order to gather additional data on safety or key secondary endpoints. In this presentation we formulate group sequential designs for delayed responses with non-binding futility rules. Delayed responses are common in practice since clinical interest often lies in testing the long-term effects of a new treatment on patients' health. We seek designs which control the type II error rate when early stopping recommendations are always followed, and control the type I error rate when futility boundaries are always disregarded. We use dynamic programming to find optimal versions of designs satisfying the stated frequentist error rate constraints as solutions to Bayes decision problems. Properties of optimal rules are used as a benchmark to understand the relative efficiencies of competing designs specifying futility boundaries on the basis of predictive power or error spending functions, for example.

I17.2 Sample size (re-)estimation for count data allowing for covariates
Zapf A., Friede T., University Medical Center Goettingen, Department of Medical Statistics, Goettingen, Germany
In randomized clinical trials it is standard to include important prognostic variables as covariates in the primary analysis [1]. In sample size calculations, however, these covariates are often ignored. This is likely to lead to study sizes that are larger than necessary, which could be considered unethical [2]. In this talk we introduce an approach to sample size planning considering covariates for count data, which is based on the likelihood ratio test [3]. In a simulation study the approach is compared with the Wald test and with the likelihood ratio test without covariates [4]. Furthermore, the approaches are applied to two example studies. For the scenario without covariates the likelihood ratio and the Wald approach lead to practically the same sample size. By considering covariates the sample size can be substantially reduced with the likelihood ratio approach, depending on the strength of the covariate effect and the amount of dispersion. Besides the gain in power, a further advantage of the proposed approach is that a blinded sample size re-estimation is feasible.
References: [1] ICH E9 Expert Working Group (1999). Stat Med; 18. [2] Kahan and Morris (2012). BMJ; 345:e5840. [3] Lyles, Lin, and Williamson (2007). Stat Med; 26. [4] Friede and Schmidli (2010). Stat Med; 29.
Funding: DFG - FR 3070/-

I17.3 Addressing statistical challenges in post-market drug and vaccine safety surveillance using electronic health records
Nelson J. 1,2, Cook A. 1,2, Yu O. 1, Wellman R. 1, Jackson L. 3; 1 Kaiser Permanente Washington Health Research Institute, Biostatistics Unit, Seattle, United States, 2 University of Washington, Department of Biostatistics, Seattle, United States, 3 Kaiser Permanente Washington Health Research Institute, Seattle, United States
Gaps in post-market drug and vaccine safety evidence have spurred the development of new national systems in the United States that prospectively monitor large observational cohorts of health plan enrollees. These multi-site efforts include the Centers for Disease Control and Prevention's (CDC's) Vaccine Safety Datalink (VSD) and the Food and Drug Administration's (FDA's) Sentinel Initiative. These systems attempt to leverage the vast amount of administrative and clinical information that is captured during the course of routine medical care by health care delivery systems and health insurance plans. One method that has been applied in this context to rapidly detect increases in adverse event risk after the introduction of a new vaccine or drug using these electronic health record data is sequential testing. However, many challenges arise when adapting clinical trial-based sequential methods to this observational database setting. We will give an overview of the VSD and Sentinel Initiative and the role of statisticians in these efforts. We will illustrate some of the design and analysis complications that arise in this setting using example data from a VSD safety study. These include confounding, rare adverse event outcomes, an inability to pool individual level data across health plan sites due to privacy and proprietary concerns, and regular updating of the health care data over time by health plans during the course of safety monitoring. Last, we will describe a group sequential approach designed to address these challenges.

I17.4 Planning and analysis of group sequential designs for clinical trials with negative binomial outcomes
Mütze T. 1, Glimm E. 2, Schmidli H. 2, Friede T. 1,3; 1 Universitätsmedizin Göttingen, Institut für Medizinische Statistik, Göttingen, Germany, 2 Novartis Pharma AG, Statistical Methodology, Basel, Switzerland, 3 DZHK (German Centre for Cardiovascular Research), Partner Site Göttingen, Göttingen, Germany
Count data in clinical trials, such as the number of hospitalizations in heart failure trials or the number of relapses or MRI lesions in multiple sclerosis trials, are often modeled by negative binomial distributions. In this presentation we study planning and analyzing clinical trials with group sequential designs for negative binomial outcomes. We propose a group sequential testing procedure for negative binomial data based on Wald-type statistics using maximum-likelihood estimators. The proposed group sequential test follows at least asymptotically the canonical joint distribution and hence has the independent increment structure. We apply this result for planning group sequential designs based on the canonical joint distribution to the negative binomial distribution, with particular emphasis on determining the sample size from the maximum information. The information level for the negative binomial distribution depends not only on the sample size and the parameters of the negative binomial distribution, but also on the patient-specific exposure times. The finite sample size properties of the proposed group sequential test and the methods for planning the respective clinical trial are assessed in a simulation study with scenarios motivated by clinical trials in chronic heart failure and multiple sclerosis. We conclude that the proposed group sequential tests can be applied to practically relevant scenarios. The statistical methods studied in this talk are implemented in an R package, which will also be discussed in this presentation.

I19 Advances in Causal Inference
Thu :30-8:00, Lecture Hall HS 5
Chair(s): Ronja Foraita

I19.1 Causal discovery with cohort data (invited)
Didelez V., Foraita R., Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
In many fields, especially epidemiology or sociology, it is very common to conduct cohort studies. Data are typically collected at regular time intervals, where measurements on participants are taken and/or questionnaires are used to gain information, for instance, on events in between time points. Such studies are a valuable resource for researchers to study life-course developments, especially regarding the relation between earlier exposures or behaviours and later outcomes. Typically, analyses of cohort data have the ultimate aim to inform public health policies or medical decision making, i.e. they aim at causal results or conclusions. Mostly, traditional regression-based approaches are employed to this end. The purpose of this talk is to present a new and very different avenue: using methods of causal discovery so as to fully exploit the potential wealth of information provided by cohort data. We outline a formal probabilistic graph-based framework for causal discovery with cohort data, suited to the particular challenges posed, such as the time structure, mixed variable types, missingness etc. The proposed framework will include the definition and characterisation of a suitable class of causal graph models, with and without assuming causal sufficiency. Further, we investigate some of their properties, especially the corresponding equivalence classes, and discuss how existing search algorithms may efficiently be adapted to this case. Finally we discuss the pros and cons, as well as caveats, of causal discovery approaches in the context of an existing cohort study.

I19.2 Propensity score matching based on automatic variable selection for data with unmeasured confounders and complex correlation structures
Zöller D., Wockner L.F., Binder H., University Medical Center Mainz, Mainz, Germany
When investigating late effects of cancer, data from cancer registries provide an established way for contacting a representative sample of cancer survivors. Yet, reference data from a healthy population are required to judge the health status. While general population reference values are available, comparisons to individual data allow for a fine-grained matching with respect to a larger number of confounders. Such an approach can be performed with the help of propensity scores combining several characteristics into a score used for matching. However, multivariable model building for propensity score matching is challenging.
The regression model for the propensity score may require variable selection, and it is still unclear to what extent effects on the exposure and the outcome should be required as a selection criterion. Unmeasured confounders, complex correlation structures, and non-normal covariate distributions further complicate matters. We consider the performance of different modeling strategies in a simulation design with complex but realistic data with effects on a binary outcome. Of the two main investigated strategies for variable selection, one focuses solely on the exposure, and one requires association with both the exposure and the outcome. As a result, the strategies will be compared with respect to bias in estimated marginal exposure effects and increase in variance. When investigating the effect of unmeasured confounders on both, we distinguish between three types of covariates that might be missing (generally or strictly replaceable and bystander), and suggest tools based on resampling for potentially identifying the type of unmeasured confounder at hand. I9.3 A taxonomy of covariate selection strategies for causal inference from observational data Witte J., Didelez V. Leibniz Institute for Prevention Research and Epidemiology - BIPS GmbH, Bremen, Germany When causal effects are to be estimated from observational data, it is crucial to adjust appropriately for confounding; otherwise, the estimated effect may reflect an association only but not a causal relationship. However, it is often not clear which covariates among a huge set of available ones are required for adjustment. A broad variety of covariate selection strategies, with sometimes very different objectives, have been suggested. To name three approaches of many, there are inclusion rules based on subject matter knowledge only [1], those based on causal discovery algorithms [2] and propensity-score-based methods for coping with high-dimensional data [3]. In this work, I compare and classify covariate selection methods in a taxonomy-like approach. Primary classification criteria are the assumptions underlying each method (e.g. no residual confounding and parametric modelling assumptions) and the kind of adjustment set each method would find under 'ideal' conditions (e.g. minimal sufficient adjustment set or adjustment set maximising estimation efficiency). This is supplemented with results from a simulation study comparing the above ideal case with the finite sample performance for different methods. Results so far apply to the low-dimensional setting (p << n); they demonstrate that while all methods rely on some untestable assumptions that can only be justified with subject matter expertise, there is a clear trade-off between restrictiveness of these assumptions and statistical performance or computational effort. I conclude with an outlook on the question of how different methods perform in the high-dimensional case (p > n). [1] VanderWeele TJ, Shpitser I, Biometrics 67(4) (2011). [2] Entner D, Hoyer PO, Spirtes P, in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, volume 31 (Scottsdale, AZ, 2013). [3] Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA, Epidemiology 20(4) (2009).
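The practical stakes of choosing an adjustment set, discussed in the two preceding abstracts, can be made concrete with a toy simulation. The base-R sketch below is an illustration under assumed data-generating values (it is not taken from either talk): L is a confounder of exposure A and outcome Y, C is a common effect of A and Y, and the true causal effect of A on Y is 1.

```r
# Toy illustration (assumed values, not from the talks above): the estimated
# exposure effect depends strongly on which covariates are adjusted for.
set.seed(7)
sim_once <- function(n = 1000) {
  L <- rnorm(n)                       # confounder of A and Y
  A <- 0.8 * L + rnorm(n)             # exposure
  Y <- 1.0 * A + 1.2 * L + rnorm(n)   # outcome; true causal effect of A is 1
  C <- 0.7 * A + 0.7 * Y + rnorm(n)   # common effect of A and Y ("bad control")
  c(unadjusted = coef(lm(Y ~ A))[["A"]],
    adjust_L   = coef(lm(Y ~ A + L))[["A"]],
    adjust_L_C = coef(lm(Y ~ A + L + C))[["A"]])
}

# Average estimates over repeated simulations: only the model that adjusts for
# the confounder L, and not for the common effect C, recovers the value 1.
rowMeans(replicate(500, sim_once()))
```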

I9.4 Survival bias in Mendelian randomisation studies - can frailty models help? Strohmaier S. 1,2, Stensrud M.J. 2,3, Aalen O.O. 2 1 Harvard Medical School, Channing Division of Network Medicine, Boston, United States, 2 University of Oslo, Oslo Center for Biostatistics and Epidemiology, Oslo, Norway, 3 Diakonhjemmet Sykehus, Oslo, Norway Mendelian randomisation (MR) holds the potential to reveal causal relations, and the method is gaining popularity in epidemiology. The technique is appealing because it allows us to estimate causal effects in the presence of unmeasured confounders and reverse causation. Still, obtaining unbiased estimates from MR analyses is not trivial. More specifically, MR studies may face bias if they condition on survival until a time point after conception. Intuitively, the bias arises because genes are randomised at conception, but follow-up starts in adult life. From conception until the start of follow-up, individuals may have been lost, e.g. due to death, leading to selection bias. We suggest a convenient method to assess that bias based on frailty models. More particularly, we illustrate the ideas by means of a real life example, aiming to more accurately assess the effect of LDL cholesterol on all-cause mortality in an MR study among the elderly. Using published data, we could approximate the parameters in the frailty models and we found that an observed hazard ratio of 1.23 would imply a causal hazard ratio larger than 1.5 in four different models. Hence, our analysis provides stronger support for the argument that LDL cholesterol is causally associated with mortality, even in the elderly. In general, our analysis suggests that conventional MR techniques sometimes may systematically bias the causal effect estimates. This may e.g. challenge the claim that a null result from MR is robust evidence of no causal effect. Invited / Topic-Contributed Sessions I03/TCS007 Evidence Synthesis and Meta-Analysis Thu :30-6:00 Lecture Hall HS 3 Chair(s): Tim Friede This session covers different methods for Bayesian evidence synthesis and their application to clinical research. I03/TCS007.1 Bayesian evidence synthesis for robust prediction and extrapolation (invited) Schmidli H. Novartis Pharma AG, Statistical Methodology, Basel, Switzerland Prediction and extrapolation are important tasks in clinical research. Examples include extrapolation of treatment effects from adults to children and from exploratory to confirmatory trials. This requires a synthesis of the available evidence, and a model to link the parameters of the historical and the new data. For a robust extrapolation, one has to take into account the possibility that the historical information may not be relevant. A Bayesian approach is taken here, extending methods developed for use of historical controls (Schmidli et al., 2014). References: Schmidli H, Gsteiger S, Roychoudhury S, O'Hagan A, Spiegelhalter D, Neuenschwander B. Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics 2014; 70(4):1023-1032. I03/TCS007.2 Bayesian meta-analysis with few studies Röver C. University Medical Center Göttingen, Department of Medical Statistics, Göttingen, Germany In evidence synthesis contexts, one may often not be able to rely on large-sample asymptotics, and so the careful formulation of models and priors is especially important (Friede et al., 2017).
We investigate the potential of Bayesian methods for meta-analysis applications, with a focus on the combination of only a few studies. We show how model specification may be used to balance prior dependence and robustness, and how the simple random-effects model may be utilized for extrapolation. References: Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of few small studies in orphan diseases. Research Synthesis Methods, 8(1):79-91, 2017. I03/TCS007.3 Power priors based on multiple historical studies Held L., Gravestock I. University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland Incorporating historical information into the design and analysis of a new clinical trial has been the subject of much recent discussion. For example, in the context of clinical trials of antibiotics for drug resistant infections, where patients with specific infections can be difficult to recruit, there is often only limited and heterogeneous information available from the historical trials. To make the best use of the combined information at hand, we consider an approach based on the multiple power prior which allows the prior weight of each historical study to be chosen adaptively by empirical Bayes. This estimate has advantages in that it varies commensurately with differences in the historical and current data and can choose weights near 1 if the data from the corresponding historical study are similar enough to the data from the current study. Fully Bayesian approaches are also considered. An analysis of the operating characteristics in a binomial setting shows that the proposed adaptive method works well, compared to several alternative approaches, including the meta-analytic prior.
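As a concrete illustration of the power-prior idea in the abstract above, the following base-R sketch computes the posterior for a binomial response rate when a single historical study is down-weighted by a fixed power parameter delta. The adaptive, empirical-Bayes choice of weights over multiple studies described in the talk is more involved; all numbers below are made-up assumptions.

```r
# Minimal sketch: conditional power prior with one historical binomial study.
# Starting from a Beta(a0, b0) prior, raising the historical likelihood
# (y0 events in n0) to the power delta and then updating with the current data
# (y events in n) gives a Beta(a0 + delta*y0 + y, b0 + delta*(n0 - y0) + n - y)
# posterior for the response probability.
power_prior_posterior <- function(y, n, y0, n0, delta, a0 = 1, b0 = 1) {
  c(shape1 = a0 + delta * y0 + y,
    shape2 = b0 + delta * (n0 - y0) + (n - y))
}

# Hypothetical numbers: historical trial 18/60 responders, current trial 14/40.
post_full    <- power_prior_posterior(14, 40, 18, 60, delta = 1)    # pool fully
post_none    <- power_prior_posterior(14, 40, 18, 60, delta = 0)    # ignore history
post_half    <- power_prior_posterior(14, 40, 18, 60, delta = 0.5)  # down-weight

# Posterior means under the three weights
sapply(list(full = post_full, none = post_none, half = post_half),
       function(p) unname(p["shape1"] / sum(p)))
```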

I03/TCS007.4 A Bayesian hierarchical framework for evidence synthesis for a single randomized controlled trial and observational data in small populations Unkel S., Röver C., Friede T. University Medical Center Göttingen, Department of Medical Statistics, Göttingen, Germany Well-powered double-blind randomized controlled trials (RCTs) are the gold standard design of clinical research to assess therapeutic interventions. Although there are scenarios in which one phase III trial with exceptionally compelling and clinically relevant results is sufficient to demonstrate efficacy and safety for marketing authorization, usually two independent confirmatory trials are conducted to provide the variety of data needed to confirm the usefulness of an intervention in the intended population. However, in small populations the conduct of even a single RCT with a sufficient sample size might be extremely difficult or not feasible. In this talk, we consider the scenario of a single RCT comparing an experimental treatment to a control in a small patient population. Inspired by an ongoing paediatric trial in Alport syndrome, we consider a study design in which information external to the randomized comparison, such as data arising from disease registries, are integrated into the design and analysis of an RCT in different ways. Based on the study design, statistical models for binary data are built. A Bayesian hierarchical framework is introduced for generalised evidence synthesis in order to estimate the quantities of interest. The performance of the proposed methods is evaluated under different scenarios by means of experiments. This research has received funding from the EU's 7th Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH with project title (acronym) "Innovative methodology for small populations research" (InSPiRe). I04/TCS057 Non-Parametric Tests in Non-Standard Situations Thu :30-:00 Lecture Hall HS Organizer(s): Martin Posch Chair(s): Markus Pauly Non-parametric exact testing procedures control the type I error rate under minimal distributional assumptions and have similar efficiency compared to parametric testing procedures. In this session we explore the limits of exact and non-parametric testing and present approaches to obtain valid and powerful testing procedures in complex settings such as group sequential and adaptive designs, multiple testing, survival analysis and combinations thereof. I04/TCS057.1 Validity of re-randomization tests in non-standard settings: A review (invited) Proschan M. National Institute of Allergy and Infectious Diseases, Biostatistics Research Branch, Bethesda, United States Re-randomization tests fix data at their observed values and construct a null distribution by re-randomizing treatment labels according to the original scheme (simple randomization, permuted blocks, response-adaptive randomization, etc.). The key assumption is that the conditional distribution of treatment labels is unchanged by observed data. Having looked at the data does not preclude us from constructing a valid test. In fact, under the key assumption, we can look at the data and change the test statistic according to what we see. The distribution of the treatment labels remains unchanged, so we can still construct a valid test of the strong null hypothesis that observed data are independent of treatment labels.
This seems very exciting because it shows that re-randomization tests can be used even when unplanned changes are made. Nevertheless, a thorough understanding of the implications and limitations of the hypothesis being tested may mitigate initial excitement. This talk reviews the potential and limitations of re-randomization tests in non-standard settings such as response-adaptive randomization and unplanned changes. I04/TCS057.2 Non- and semiparametric inference methods for data with multiple endpoints Bathke A. University of Salzburg, Mathematics, Salzburg, Austria When there are several endpoints and different predictors, researchers typically want to find out which predictors are relevant, and for which endpoints. We present two rather general approaches trying to accomplish these goals, accommodating binary, ordinal, and metric endpoints, and different nominal factors. We also try to address the question of how well the proposed methods actually accomplish their goals. I04/TCS057.3 Optimizing exact two-stage designs for single-arm trials with binary outcome under uncertainty Kunzmann K., Kieser M. University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany Single-arm designs with binary endpoint are an important tool in early clinical oncology. A variety of both group-sequential as well as adaptive two-stage designs exist, which are optimal under different assumptions about the alternative. In this talk, we illustrate the fundamental problem of incorporating the inherent uncertainty about the true response rate in early clinical development of new treatments into the choice of design. A Bayesian view on identifying optimal adaptive designs under uncertainty is discussed based on existing approaches to score the performance of such designs. We present the optimal designs under a commonly used and intuitive score function by Liu et al. Some limitations of this score in the specific context of early phase two trials are highlighted. These findings can be used to define a more suitable score function based on trial-utility considerations. This approach provides valuable theoretical insights but may be difficult to implement in practice due to the need for a comprehensive definition of trial-utility. We therefore conclude with the discussion of a practical approach to optimizing designs under uncertainty which allows the incorporation of the main insights gained from the utility-based perspective while avoiding most of the detailed assumptions usually required for its implementation.
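The kind of calculation underlying such design evaluations can be sketched in a few lines of base R: compute the rejection probability and expected sample size of a given two-stage single-arm design with binary endpoint, first at fixed response rates and then averaged over a Beta prior to reflect uncertainty about the alternative. The design and prior parameters below are illustrative assumptions, not values from the talk.

```r
# Operating characteristics of a two-stage single-arm design:
# enrol n1 patients, continue only if more than r1 responses are seen,
# enrol up to n in total, and reject H0 if more than r responses in total.
oc_two_stage <- function(p, n1, r1, n, r) {
  pet <- pbinom(r1, n1, p)                     # prob. of stopping for futility
  x1  <- (r1 + 1):n1                           # stage-1 results that continue
  reject <- sum(dbinom(x1, n1, p) *
                pbinom(r - x1, n - n1, p, lower.tail = FALSE))
  c(reject = reject, E_n = n1 + (1 - pet) * (n - n1))
}

# Frequentist OCs at a fixed null and alternative response rate
oc_two_stage(p = 0.20, n1 = 19, r1 = 4, n = 54, r = 15)  # type I error, E[N] under H0
oc_two_stage(p = 0.35, n1 = 19, r1 = 4, n = 54, r = 15)  # power, E[N] under H1

# Bayesian-average OCs under an assumed Beta(7, 13) prior on the response rate
set.seed(1)
prior_draws <- rbeta(1e4, 7, 13)
rowMeans(sapply(prior_draws, oc_two_stage, n1 = 19, r1 = 4, n = 54, r = 15))
```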

I04/TCS057.4 Group-sequential permutation tests for time-to-event data Brückner M. 1, König F. 2, Posch M. 2 1 Lancaster University, Lancaster, United Kingdom, 2 Medical University of Vienna, Vienna, Austria For small sample sizes the type I error of the asymptotic log-rank test and related linear rank tests for time-to-event data may be inflated. In such cases permutation tests are a popular non-parametric alternative for exact inference. Exact group-sequential permutation tests for general linear rank tests have been developed. However, these permutation tests are only valid when the underlying censoring distributions are equal in both groups. To overcome this problem, approximate permutation tests using imputation methods have been developed for fixed sample size trials, which remain valid even when the censoring distributions differ between the treatment groups. We adapt these imputation-permutation tests to the group-sequential setting. The proposed method for calculation of group-sequential stopping boundaries does not rely on the independent increments property of the test statistic, as opposed to methods based on asymptotic normal approximation. We compare the methods with the asymptotic log-rank test in a simulation study and in an application to a prostate cancer data set. We find that the empirical type I error of the new methods is close to the nominal level in all scenarios and slightly conservative with an increasing number of interim analyses. I05/TCS036 Innovative Trial Designs in Pharmaceutical Development Wed :30-8:00 Lecture Hall HS 4 Chair(s): Ivan S.F. Chan The overarching mission of the pharmaceutical industry is to develop medical products that can improve human health. Scientific discoveries and innovations in research and development are critical to the success of developing new medical products such as drugs, vaccines, and medical devices. With recent advancement in statistical methodologies and computing power, many innovative clinical trial designs have been developed, resulting in acceleration of drug development. In this session, expert statisticians from academia and industry will present their research findings in several areas of innovative trial designs, including Bayesian methods for noninferiority trials, sample size reestimation, and adaptive design considerations from theory to practice. They will also discuss applications of these methods with clinical trial examples. I05/TCS036.1 Bayesian adaptive designs - from theory to practice (invited) Lee J.J. UT MD Anderson Cancer Center, Biostatistics, Houston, United States A clinical trial is a prescribed learning process for identifying safe and effective treatments. In recent years, rapid advancements in cancer biology, immunology, genomics, and treatment development demand innovative methods to identify better therapies for the most appropriate population in a timely, efficient, accurate, and cost-effective way. In my talk, I will first illustrate the concept of Bayesian updating and Bayesian inference, a superior alternative to the traditional frequentist approach, using Shiny applications we developed. Bayesian methods take the "learn as we go" approach and are innately suitable for clinical trials.
Then, I will give an overview of Bayesian adaptive designs in the areas of adaptive dose finding, posterior and predictive probability calculations, outcome adaptive randomization, multi-arm platform design, and hierarchical modeling, etc. Real applications including the BATTLE trials in lung cancer and the I-SPY 2 trials in breast cancer will be given. Bayesian adaptive clinical trial designs increase the study efficiency, allow more flexible trial conduct, and treat more patients with more effective treatments in the trial, but also possess desirable frequentist properties. Perspectives will be given on translating theory to practice to enhance the clinical trial success and speed up drug approval. Much useful software can be found at the following two sites. I05/TCS036.2 Bayesian design of non-inferiority clinical trials via the Bayes factor Chen M.-H. University of Connecticut, Department of Statistics, Storrs, United States We developed a Bayes factor based approach for the design of non-inferiority clinical trials with a focus on controlling type I error and power. Historical data are incorporated in the Bayesian design via the power prior discussed in Ibrahim and Chen (2000). The properties of the proposed method are examined in detail. An efficient simulation-based computational algorithm is developed to calculate the Bayes factor, type I error and power. The proposed methodology is applied to the design of a non-inferiority medical device clinical trial. Acknowledgements: This is joint work with Wenqing Li, Xiaojing Wang, and Dipak K. Dey. I05/TCS036.3 Efficiency considerations for group sequential designs with adaptive unblinded sample size re-assessment Mehta C. 1,2 1 Cytel Inc, Biostatistics, Cambridge, United States, 2 Harvard School of Public Health, Biostatistics, Boston, United States Clinical trials with adaptive sample size re-assessment, based on an analysis of the unblinded interim results (ubssr), have gained in popularity due to uncertainty regarding the value of d at which to power the trial at the start of the study. While the statistical methodology for controlling the type-I error of such designs is well established, there remain concerns that conventional group sequential designs with no ubssr can accomplish the same goals with greater efficiency. The precise manner in which this efficiency comparison can be objectified has been difficult to quantify, however. In this paper we present a methodology for making this comparison in an objective manner. It is seen that under reasonable decision rules for increasing sample size there is little or no loss of efficiency for the adaptive designs in terms of unconditional power. The two approaches, however, have very different conditional power profiles. More generally a methodology has been provided for comparing any design with ubssr relative to a comparable group sequential design with no ubssr, so one can determine whether the efficiency loss, if any, of the ubssr design is offset by the advantages it confers for re-powering the study at the time of the interim analysis.
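A minimal sketch of the quantities involved in unblinded sample size re-assessment is given below: conditional power given an interim z-value, and a simple promising-zone-style search for a larger second stage. This is generic illustration code with made-up numbers, not the efficiency-comparison methodology of the talk; note that re-powering based on unblinded data requires an appropriate combination test or restrictions on the adaptation rule to preserve the type I error, which the sketch does not address.

```r
# Conditional power of a one-sided level-alpha test, given an interim z-value
# z1 based on information I1 (e.g. per-arm patients), remaining information I2,
# evaluated at drift parameter theta (so that E[z1] = theta * sqrt(I1)).
cond_power <- function(z1, I1, I2, theta, alpha = 0.025) {
  za <- qnorm(1 - alpha)
  pnorm((z1 * sqrt(I1) + theta * I2 - za * sqrt(I1 + I2)) / sqrt(I2))
}

# Hypothetical interim: 100 of 200 planned patients per arm observed, z1 = 1.4.
I1 <- 100; I2_planned <- 100; z1 <- 1.4
theta_hat  <- z1 / sqrt(I1)                 # effect estimated from the interim trend
cp_planned <- cond_power(z1, I1, I2_planned, theta_hat)

# If conditional power is "promising" but below target, look for the smallest
# second-stage size (capped at 3x the planned one) reaching 90% conditional power.
if (cp_planned > 0.36 && cp_planned < 0.90) {
  I2_grid <- I2_planned:(3 * I2_planned)
  cp_grid <- cond_power(z1, I1, I2_grid, theta_hat)
  I2_new  <- if (any(cp_grid >= 0.90)) min(I2_grid[cp_grid >= 0.90]) else max(I2_grid)
} else {
  I2_new <- I2_planned
}
c(cp_planned = cp_planned, I2_new = I2_new,
  cp_new = cond_power(z1, I1, I2_new, theta_hat))
```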

I05/TCS036.4 Issues in late-stage clinical trial design for late-stage immuno-oncology trials Anderson K.M. Merck & Co., Late Development Statistics, North Wales, United States There are several potential concerns in late-stage clinical trials developing therapies focused on aiding the immune system to fight cancer. Potential subgroup differences driven by tumor and tumor environment biology can drive several concerns, including target population identification and treatment effects that result in crossing hazards or otherwise delayed treatment benefit. Immune therapies can provide benefit by improving safety outcomes without substantially affecting overall survival, making non-inferiority an issue of interest. Demonstrating that add-on therapies are effective can be costly if strict rules are adhered to requiring demonstration of contribution of each treatment component. These factors create a need for designs that can examine multiple hypotheses and provide strong Type I error control. Timing of analyses tends to be an important issue. Also, alternatives to current standards applying stratified logrank statistics and proportional hazards modeling will be discussed. I06/TCS03 Subgroups: Identification and Reliable Treatment Effect Estimation Thu :30-3:00 Lecture Hall HS Chair(s): Björn Bornkamp Discussant(s): Ilya Lipkovich I06/TCS03.1 Quantifying treatment effect variability and heterogeneity (invited) Brannath W., Lüschen A. University of Bremen, Faculty 3 - Mathematics/Computer Sciences, Bremen, Germany The effect of treatments is well known to vary considerably between individuals. However, in confirmatory clinical trials the standard statistical model assumes a constant treatment effect. This assumption is then investigated in the exploratory part of the statistical analysis by a series of subgroup analyses. However, due to random fluctuations in small subgroups, lack of power and multiple testing issues, the interpretation of subgroup analyses is difficult and still a controversial issue. Treatment effect heterogeneity describes how average treatment effects vary across baseline strata. By its focus on averages, it necessarily ignores treatment effect heterogeneity within a stratum. With the concept of subject-treatment interactions, it is possible to also account for heterogeneity within strata. This permits quantification of the variance of a generally random treatment effect. Unfortunately, in parallel group designs the estimation of treatment effect variance is accompanied by an unidentifiability issue that can only be resolved by making untestable assumptions. In this talk we will present a novel approach for the estimation of treatment effect variability that utilizes baseline information. The unidentifiability issue is resolved by the assumption that individual potential outcomes are stochastically independent given the baseline information. We derive an estimator for treatment effect variance that is based only on a prediction model for the outcome in the control group. This permits a careful preplanning of the included baseline variables and other model features based on historical data. We finally show how this estimate of treatment effect variance can be used to assess treatment effect heterogeneity with regard to specific subgroups. I06/TCS03.2 Shrinkage estimation methods for subgroup analyses Riehl J. 1, Fritsch A. 2, Ickstadt K. 1
1 TU Dortmund University, Faculty of Statistics, Dortmund, Germany, 2 Bayer AG, Clinical Statistics, Wuppertal, Germany Subgroup analyses are commonly and increasingly performed in confirmatory clinical trials, where the treatment effect is estimated in subgroups of the overall trial population defined by certain patient characteristics. The appropriateness of subgroup-specific treatment effects is controversial because of multiplicity and small sample sizes within the subgroups. A useful alternative is provided by estimators of subgroup effects which take the overall effect estimate into account. Shrinkage estimators belong to this class because they combine the overall effect estimate with the estimate within a given subgroup by using a Bayesian framework. In doing this, some form of prior distribution for the interaction effect is assumed. The presentation contains a short introduction to two shrinkage estimation approaches proposed by Dixon & Simon (1991) and by Simon (2002), one with a non-informative and one with an informative prior. These methods have been defined for subgroup factors with two categories. We extend them to be applicable to factors with more than two categories and provide solutions for some computational issues. Moreover, the results of a simulation study in which the naïve and the shrinkage approaches are compared under different models and scenarios are presented. The main aim of the investigation was to detect in which cases the shrinkage estimators are superior to the common naïve estimator. I06/TCS03.3 Subgroup identification via the predicted individual treatment effect Ballarini N., Koenig F., Posch M., Rosenkranz G. Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Vienna, Austria Identifying subgroups of treatment responders through the different phases of clinical trials has the potential to increase success in drug development. Recent developments in subgroup analysis consider the case when the subgroups are defined in terms of the predicted individual treatment effect (PITE), i.e. the difference between the predicted outcome under treatment and the predicted outcome under control for each individual, which in turn may depend on multiple biomarkers. In this work, we study the properties of different modelling strategies to estimate the PITE. We explore classical linear models as well as statistical learning methods such as the Lasso (Tibshirani, 1996), and its modifications. For the latter, estimation after model selection remains a challenge since estimators are typically biased. We implement confidence intervals based on the Selective Inference framework (Taylor and Tibshirani, 2015) since a closed form of the variance of the PITE is not known for the Lasso. In the case of the Bayesian Lasso (Park and Casella, 2008), the empirical posterior distribution of the parameter estimates is used for constructing credible intervals. We also evaluate via simulations the performance of using the predicted individual treatment effect to identify subgroups with an expected benefit. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No
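The following sketch shows, on simulated data, how a predicted individual treatment effect can be obtained from a Lasso model with treatment-by-covariate interactions using the glmnet package. The selective-inference confidence intervals and Bayesian Lasso credible intervals studied in the talk are not reproduced here, and all simulation settings are assumptions for illustration.

```r
# PITE via a Lasso with treatment-by-covariate interactions (simulated data).
library(glmnet)

set.seed(42)
n <- 400; p <- 10
X   <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
trt <- rbinom(n, 1, 0.5)
y   <- 0.5 * trt + 1.0 * trt * X[, 1] + 0.8 * X[, 2] + rnorm(n)  # x1 modifies the effect

# design: treatment, prognostic main effects, treatment-by-covariate interactions
build_design <- function(trt, X) {
  inter <- X * trt
  colnames(inter) <- paste0("trt_", colnames(X))
  cbind(trt = trt, X, inter)
}

cvfit <- cv.glmnet(build_design(trt, X), y, alpha = 1)

# PITE: predicted outcome under treatment minus predicted outcome under control
pite <- as.numeric(
  predict(cvfit, build_design(rep(1, n), X), s = "lambda.min") -
  predict(cvfit, build_design(rep(0, n), X), s = "lambda.min")
)

summary(pite)
table(expected_benefit = pite > 0)   # tentative subgroup with expected benefit
```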

I06/TCS03.4 Subgroup identification in dose-response studies via model-based recursive partitioning Thomas M. 1, Bornkamp B. 1, Seibold H. 2, Hothorn T. 2 1 Novartis Pharma AG, Basel, Switzerland, 2 University of Zurich, Zurich, Switzerland An important task in early phase drug development is to identify patients who respond better or worse to an experimental treatment. While a variety of different subgroup identification methods have been developed for the situation of trials that study an experimental treatment and control, much less work has been done in the situation when patients were randomized to different dose groups. Model-based recursive partitioning can be used to detect instabilities over covariates in a parametric model and has recently been applied to subgroup identification for two-arm studies. We extend this approach to trials with several dose groups and show that the method can be used to identify subgroups of patients with different dose-response curves and improves estimation of treatment effects and minimum effective doses when heterogeneity among patients is present. I08/TCS026 Decision Making in Early Clinical Development Tue :30-6:00 Lecture Hall HS 3 Organizer(s): Richardus Vonk Chair(s): Gerhard Nehmiz Statistical sciences are currently moving into the focus of early applied pharmaceutical research. The high costs and long duration of clinical development, paired with high levels of attrition, require the quantification of the risk when moving from early to late stage clinical development. In this session, statistical techniques that facilitate decision making in the transition phases around early clinical development are reviewed, and their implementation is described. I08/TCS026.1 Optimal designs for non-compartmental analysis of pharmacokinetic parameters (invited) Barnett H. 1, Jacobs T. 2, Geys H. 2, Jaki T. 1 1 Lancaster University, Mathematics and Statistics, Lancaster, United Kingdom, 2 Janssen Pharmaceutica, Beerse, Belgium In traditional PK/PD trials, pharmacokinetics (PK) is investigated in the satellite group of animals, and the pharmacodynamics (PD) is investigated in the study group of animals. The new blood sampling method of microsampling opens up the opportunity to investigate both PK and PD in the same animals. To avoid excessive burden on the animals from the required blood sampling, sparse sampling schemes are typically utilized. Motivated by this application, this paper introduces a procedure, based on non-compartmental methods, for choosing an optimal sparse sampling scheme and sampling time points; the procedure can also be applied to settings beyond this one. We discuss how robust designs can be obtained and we apply and evaluate the approach in a range of scenarios to give an example of how it may be implemented. The results are compared to optimal designs for model-based PK. I08/TCS026.2 Quantitative decision making around early clinical development - one step further Vonk R. Bayer AG, Research and Clinical Sciences Statistics, Berlin, Germany In addition to the regulatory requirements, statistics and statistical thinking are integral parts of the internal decision making processes, particularly in early clinical development. This presentation concentrates on innovative statistical methods in different areas of early drug development that facilitate quantitative rather than qualitative decision making.
We describe applications of (Bayesian) statistical techniques to improve decision making in the different transition phases around early clinical development. We illustrate the use of Bayesian meta-analytic predictive (MAP) approaches in early clinical development, and elucidate our experience. Furthermore, we explain how we embed this new way of thinking in our organization. I08/TCS026.3 Decision criteria in drug development: Beyond statistical significance Grieve A. UCB, Centre of Excellence Statistical Innovation, Slough, United Kingdom The mandatory use of statistics in the drug approval process introduced by the FDA in the 1960s was followed by four decades of the use of statistical significance as almost a sine qua non of marketing authorisation. The impact of this requirement on the whole drug development process in my view has been negative, with clinical researchers and sponsors relying on statistical significance to support their decision making in all phases. More recently there has been an increased interest in the use of decision criteria, above and beyond pure significance, to support decision making in early phase studies. We review these developments against the background of a greater use of Bayesian and/or predictive approaches in drug development. I08/TCS026.4 Establishing decision rules for an early signal of efficacy design strategy in clinical development Kirby S. Pfizer, Cambridge, United Kingdom An early signal of efficacy design strategy in clinical development is the use of a small clinical trial for the first trial of efficacy with three possible outcomes: acceleration of development of a compound; pausing of a compound before further staged investment to assess its potential; and killing a compound, i.e., stopping development of a compound. A key consideration is how decision rules should be set for individual trials to give desired portfolio-level performance. We consider some portfolio-level criteria and see which decision rules fulfil these criteria for conjectured prior distributions for the efficacy of a portfolio of new compounds.
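The portfolio view in the abstract above can be made concrete with a small simulation: under an assumed prior for the true effects of compounds entering a first efficacy trial, thresholds on the estimated effect translate into portfolio-level proportions of correct and incorrect accelerate, pause and kill decisions. The base-R sketch below uses entirely hypothetical numbers and thresholds.

```r
# Portfolio-level behaviour of accelerate / pause / kill thresholds (illustrative).
set.seed(2017)
n_compounds <- 1e5

# assumed portfolio prior: 60% of compounds inactive, 40% with effect ~ N(0.3, 0.1^2)
active <- rbinom(n_compounds, 1, 0.4)
theta  <- ifelse(active == 1, rnorm(n_compounds, 0.3, 0.1), 0)

# small first-in-efficacy trial: estimate ~ N(theta, se^2), se from 30/arm, sd = 1
se  <- sqrt(2 / 30)
est <- rnorm(n_compounds, theta, se)

# decision rule on the estimated effect
decision <- cut(est, c(-Inf, 0.05, 0.25, Inf),
                labels = c("kill", "pause", "accelerate"))

# how often active and inactive compounds end up in each decision category
prop.table(table(decision, active = active), margin = 2)
```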

I09/TCS040 Biomarker Utilization in Innovative Clinical Designs Fri :30-0:00 Lecture Hall HS 3 Chair(s): Mark Chang Discussant(s): Zhaoyang Teng Traditional clinical development of a novel therapy utilizes the one-size-fits-all approach by testing the treatment effect in the entire patient population with a specific disease. Precision medicine is an innovative approach that takes into account individual differences in people's genes, environments, and lifestyles. Unlike the traditional approach, precision medicine proposes the customization of healthcare, with medical decisions, practices, and treatments being tailored to the individual patient. In this session, experts will discuss the utilization of biomarkers in precision medicine development and adaptive designs with biomarkers, address the opportunities, challenges, and recommendations, and share their research outcomes and practical experience. I09/TCS040.1 Overview of study designs for personalized medicine (invited) Menon S. 1,2 1 Pfizer, Inc., Cambridge, United States, 2 Boston University, Boston, United States Personalized medicine is a relatively young but rapidly evolving field of clinical research. It involves identifying genetic, genomic, and clinical characteristics that have the potential to accurately predict a patient's susceptibility to developing a certain disease and their response to treatment. Personalized medicine is the translation of this knowledge to patient care. However, this "translation" can be very challenging in the face of limited knowledge of the biomarker and/or appropriate diagnostics. Hence, the appropriate selection of the study design is important to critically determine biomarker performance, reliability and eventually regulatory acceptance. This session will discuss the various designs at our disposal, including adaptive designs, and their merits and limitations. I09/TCS040.2 Biomarker-driven clinical trial designs in precision medicine Wang J. Pfizer, Inc., San Diego, United States Precision medicine has paved the way for a new era of delivering tailored treatment options to patients according to their biological profiles. In combination with innovative adaptive design, this has presented drug developers with unprecedented opportunities to engage novel thinking to accelerate drug discovery. This presentation will cover both classical and adaptive designs with biomarkers. Design options for biomarkers with very strong credentials, strong credentials and weak credentials will be discussed. Related statistical theories and analysis strategies will also be covered with case studies. I09/TCS040.3 The selection and use of biomarker-driven trial designs: case studies Bliss R. 1, Balser J. 1, Chang M. 1,2 1 Veristat, Southborough, United States, 2 Boston University, Boston, United States Biomarker-driven clinical trials are gaining attention in drug and biological agent research and development. The use of biomarker designs improves study efficiency by targeting specific populations and allows research and treatment to focus on individual patients rather than the average population. The choice of the best study design depends on the certainty of the biomarker as a predictive or prognostic factor, the expected difference in mechanism of action, the overall effect of the test product among biomarker-positive and biomarker-negative populations, and the prevalence of disease. Three case studies of biomarker designs will be presented.
The first study was implemented by a pharmaceutical company investigating a novel treatment for a rare oncology indication. The investigators had observed earlier evidence of a biomarker subpopulation responding better to treatment than the overall population. The sponsor selected an adaptive enrichment study design including an interim analysis to evaluate the treatment effect in the biomarker-positive subjects and the overall population midway through the study. At that interim analysis, predetermined decision rules were applied to select whether to continue the study with the overall population or with the biomarker-positive subjects only. In the second case study, the study sponsor was unsure whether patients with visceral versus cutaneous expression of disease might respond better to treatment. The study sponsor investigated multiple study designs including: a classical design stratified by disease type, a stratified enrichment design with sample size re-estimation, and an adaptive enrichment design with prespecified population enrichment at an interim timepoint. The sponsor's selection of design will be discussed. In the third case study, the study sponsor had strong confidence in a prognostic biomarker but was unsure of the treatment performance in the complementary population. A stratified enrichment study design was selected, allowing comparisons between control and novel treatment in stratified subpopulations of the rare cancer. I/TCS030 Statistical Methods for Cardiovascular Outcome Studies Tue :30-6:00 Lecture Hall HS 2 Chair(s): Antje Jahn In cardiovascular outcome studies, there are often several different patient outcomes that are of clinical interest, e.g. death, myocardial infarction and/or hospital admissions. Although they usually differ in their clinical relevance (e.g. fatal vs non-fatal events), it is common practice to combine them as a composite endpoint and apply a time-to-first-event analysis (an approach that has been criticized as "moving within the comfort zone" (1)). Motivated by the major shortcomings of this practice, there is an ongoing debate on more appropriate statistical approaches, including considerations of first vs recurrent event analysis, the modelling of associations between disease processes (joint frailty models), multiplicity adjustment for the different outcomes, and prioritization of more severe outcomes. Furthermore, attention must be paid to bias and the interpretability of effect estimates because - although RCTs are often assumed to produce unbiased results - hazard-based survival analysis can introduce its own bias. Recently, the Cardiovascular Round Table of the European Society of Cardiology discussed the use of traditional and new composite endpoints and identified the need for more insight into the strengths and weaknesses of different analysis approaches. This session will bring together experts from different areas, with a focus on both methodologies and applications. (1) Claggett B, Wei LJ, Pfeffer M: Moving beyond our comfort zone. European Heart Journal 34 (2013).
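The contrast between a time-to-first-event analysis and a recurrent-event analysis that runs through this session can be reproduced with standard tools. The sketch below uses the bladder cancer recurrence data shipped with the R survival package as a stand-in example (not data from the trials discussed in the session): a Cox model fitted to the first event only, and an Andersen-Gill counting-process model fitted to all events. Note that neither model addresses the dependence between recurrences and death that the joint frailty approaches in this session target.

```r
# Time-to-first-event Cox model versus Andersen-Gill recurrent-event model,
# illustrated on the bladder cancer recurrence data (survival package).
library(survival)

# time to first recurrence only
first <- subset(bladder2, enum == 1)
cox_first <- coxph(Surv(stop, event) ~ rx + size + number, data = first)

# all recurrences, counting-process format with a robust (cluster) variance
cox_ag <- coxph(Surv(start, stop, event) ~ rx + size + number + cluster(id),
                data = bladder2)

# treatment (rx) effect estimates from the two analyses
summary(cox_first)$coefficients["rx", ]
summary(cox_ag)$coefficients["rx", ]
```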

I/TCS030.1 The analysis of recurrent events in the presence of informative censoring (invited) Rogers J. University of Oxford, Statistics, Oxford, United Kingdom Cardiovascular diseases are chronic illnesses characterised by a relatively long period before death during which multiple non-fatal events occur. But a comparison of heart failure hospitalisation rates can be confounded by the competing risk of death. An increase in heart failure hospitalisations is associated with an elevated risk of cardiovascular death, meaning that subjects may die during follow-up. Any analyses of recurrent events must take into consideration this informative censoring. The Ghosh and Lin non-parametric analysis of heart failure hospitalisations takes mortality into account whilst also adjusting for different follow-up times and multiple hospitalisations per patient. Another option is to treat the incidence of cardiovascular death as an additional event in the recurrent event process and then adopt the usual analysis strategies. An alternative approach is the use of joint modelling techniques to obtain estimates of treatment effects on heart failure hospitalisation rates, whilst allowing for informative censoring. One approach to joint modelling is random-effects models, which assume that the recurrent hospitalisations and time-to-death are conditionally independent given a latent variable. Models of this kind are intuitively appealing as they can give a tangible interpretation that an individual's independent frailty term measures their underlying, unobserved severity of illness, which proportionately affects both their heart failure hospitalisation rate and their time-to-death (or CV death). Joint models allow distinct treatment effects to be estimated for each of the processes, whilst taking into account the association between the two. This talk shall outline the different methods available for analysing recurrent events in the presence of dependent censoring, and the relative merits of each method shall be discussed. In addition, data from large scale clinical trials in cardiovascular disease shall be used to illustrate the application of these methods. I/TCS030.2 Better characterization of disease burden by using recurrent event endpoints Akacha M. Novartis Pharma AG, Basel, Switzerland Endpoints capturing recurrent event information can lead to interpretable measures of treatment effect that better reflect disease burden and are more efficient than traditional time-to-first-event endpoints in the sense that they use the available information beyond the first event. Recurrent event endpoints are well established in indications where recurrent events are clinically meaningful, treatments are expected to impact the first as well as subsequent events and where the rate of terminal events such as death is very low. Examples include seizures in epilepsy; relapses in multiple sclerosis; and exacerbations in pulmonary diseases such as chronic obstructive pulmonary disease. More recently recurrent event data endpoints have also been proposed in other indications where the rate of terminal events is high, e.g. chronic heart failure, but experience in this setting is limited. Different endpoints and measures of treatment effect - that is, different estimands - can be considered. Depending on the specific setting some estimands may be more appropriate than others.
For example, accounting for the interplay between the recurrent event process and the terminal event process is important in indications where the rate of terminal events is high. The choice of estimands has direct impact on trial design, conduct and statistical analyses. In this talk we will discuss the value and possible limitations of using recurrent event estimands. I/TCS030.3 Association between hospitalisation and mortality rates in heart failure trials: consequences for marginal treatment effect estimates Toenges G., Binder H., Jahn A. University Medical Center Mainz, Institute for Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz, Germany This work is motivated by clinical trials in chronic heart failure, where treatment/intervention has effects both on morbidity (assessed as recurrent non-fatal hospitalisations) and on mortality (assessed as cardiovascular death). Recently, a joint frailty proportional hazards model has been proposed for these kinds of outcomes to investigate a potential association between the risk rates for hospital admissions and cardiovascular death (Rogers et al., 2016). However, more often marginal treatment effect estimates are presented as the main efficacy outcome, which corresponds to a misspecification, assuming that both risk processes are (conditionally on the known covariates) unassociated. One example is the common practice of applying a Cox model for mortality and an Andersen-Gill model for the recurrent hospitalisations. We investigate the consequences of applying such misspecified marginal models for treatment effect estimates. By the use of Laplace transformations we derive the marginal hazard ratios as a function of time. We identify those parameters that cause a violation of the proportional hazards assumption and thus affect the bias and its degree in hazard ratio estimation. We show results on the direction and degree of bias for different situations and relate these results to published clinical trials. In particular we also identify situations where hazard ratio estimates are still unbiased despite model misspecification. Analytical results are further supported by simulation studies. References: Rogers JK, Yaroshinsky A, Pocock SJ, Stokar D and Pogoda J. Analysis of recurrent events with an associated informative dropout time: Application of the joint frailty model. Statistics in Medicine 2016; 35. I/TCS030.4 Time-to-first-event versus recurrent-event analysis - points to consider for selecting a meaningful analysis strategy in clinical trials with composite endpoints Rauch G. 1,2, Kieser M. 2, Binder H. 3, Bayes-Genis A. 4, Jahn-Eimermacher A. 3 1 University Medical Center Hamburg-Eppendorf, Institute of Medical Biometry and Epidemiology, Hamburg, Germany, 2 University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany, 3 University of Mainz, Mainz, Germany, 4 Hospital Universitari Germans Trias i Pujol, Barcelona, Spain Aims: Composite endpoints combining several event types of clinical interest often define the primary efficacy outcome in cardiologic trials. They are commonly evaluated as time-to-first-event, thereby following the recommendations of regulatory agencies. However, to assess the patient's full disease burden, subsequent events following the first should be considered as well. It has been shown in the literature that a recurrent event analysis might come to considerably different conclusions compared to a time-to-first-event analysis.
Recently, the Cardiovascular Round Table of the European Society of Cardiology indicated the need to investigate "how to interpret results if recurrent event analysis results differ [?] from time-to-first-event analysis". This work addresses this topic. Methods and results: In many clinical trials, the focus is to evaluate the efficacy of a new intervention in reducing different kinds of cardiologic events. This paper systematically compares two common analysis strategies for composite endpoints differing with respect

21 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed to the incorporation of recurrent events. The comparison is based on a simulation study motivated by a clinical trial. We investigate why the treatment effects estimated from a time-to-first-event analysis (Cox model) and a recurrent-event analysis (Andersen-Gill model) can systematically differ, particularly in cardiovascular trials. Moreover, we provide guidance on how to interpret these results and recommend points to consider when defining a meaningful analysis strategy. Conclusion: When planning trials with a composite endpoint, researchers and regulatory agencies should be aware that the model choice affects the estimated treatment effect and its interpretation. I/TCS030.5 Multi state modelling of survival and repeated events in heart failure based on administrative data Gasperoni F., Ieva F., Barbati G. 2,3, Scagnetto A. 2, Iorio A. 3,4, Sinagra G. 5, Di Lenarda A. 3 Politecnico di Milano, MOX-Modelling and Scientific Computing, Department of Mathematics, Milano, Italy, 2 Università di Trieste, Department of Medical Sciences, Trieste, Italy, 3 Cardiovascular Center, Trieste, Italy, 4 Papa Giovanni XXIII Hospital, Cardiology Unit, Bergamo, Italy, 5 Azienda Sanitaria-Universitaria Integrata Trieste `ASUITS', Cardiovascular Department, Trieste, Italy We investigate how different risk profiles of Heart Failure (HF) patients may affect multiple readmission rates and final outcomes (death). Several models for predicting adverse outcomes have been developed in literature, but they are mainly focused on a single outcome. We propose the application of two different multi state models in real world setting to jointly evaluate the impact of different risk factors on multiple hospital admissions, on Integrated Home Care (IHC) activations, on Intermediate Care Unit (ICU) admissions and on death. The first model (Model ) concerns only hospitalizations as possible events in patients clinical history. In the second one (Model 2), we consider both hospitalizations and ICU admission and IHC activation. Through Model, we want to detect the determinants of repeated hospitalizations, while, through Model 2, we want to evaluate which patients profiles are associated with transitions in intermediate care with respect to repeated hospitalizations or death. Both models are characterized by transition specific covariates, adjusting for patient s risk factors. I4/TCS04 Scared of Rare Events? Small-Sample Bias, Unmeasured Confounding and Other Dangerous Things Thu :30-:00 Lecture Hall HS 2 Chair(s): Georg Heinze The analysis of studies with binary outcomes can be challenging if the outcome event probability is low or data on the predictors are sparse. Some of the problems incurred by such sparsity are bias due to small samples, bias due to model misspecification caused by unmeasured confounders, but also bias that is only due to rare outcome categories and which will persist even if small-sample and confounding bias can be corrected. The session will first present corrections for small-sample bias for the more general case of ordinal outcomes (Ioannis Kosmidis) and solutions for the issue of model misspecification due to unmeasured confounding (Michal Abrahamowicz). 
Later on, Rok Blagus will discuss the rare event bias which means that events and non-events are predicted with different accuracies depending on their relative frequencies, and will explain why this problem is independent from the two types of biases mentioned before. Finally, Jelle Goeman will look at the rare events problem from a more theoretical point of view can a prediction rule correctly estimate the probability of an event and at the same time supply equal sensitivity and specificity or equal positive and negative predictive values? The answer is surprisingly simple and has implications on what we can generally expect from prediction models for rare events. At the end of the session there will be a longer discussion. I4/TCS04. Reduced-bias estimation of models with ordinal responses (invited) Kosmidis I. University College London, Statistical Science, London, United Kingdom For the estimation of cumulative link models for ordinal data, the bias reducing adjusted score equations of Firth in 993 are obtained, whose solution ensures an estimator with smaller asymptotic bias than the maximum likelihood estimator. Their form suggests a parameter-dependent adjustment of the multinomial counts, which in turn suggests the solution of the adjusted score equations through iterated maximum likelihood fits on adjusted counts, greatly facilitating implementation. Like the maximum likelihood estimator, the reduced bias estimator is found to respect the invariance properties that make cumulative link models a good choice for the analysis of categorical data. Its additional finiteness and optimal frequentist properties, along with the adequate behaviour of related asymptotic inferential procedures, make the reduced bias estimator attractive as a default choice for practical applications. Furthermore, the estimator proposed enjoys certain shrinkage properties that are defensible from an experimental point of view relating to the nature of ordinal data. I4/TCS04.2 Rare events in pharmacoepidemiologic cohort studies: selected problems and possible solutions (invited) Abrahamowicz M., Burne R. 2 McGill University, Montreal, Canada, 2 Analysis Group, Montreal, Canada Pharmacoepidemiology aims at assessing adverse effects of medications. Serious adverse events are rare and are typically evaluated using large administrative databases, which lack information on important clinical and lifestyle confounders. Accordingly, unmeasured confounding is the Achilles' heel of pharmacoepidemiological research. Confounders not measured in the large database are sometimes available in clinical datasets, which, however, are too small to ensure adequate power and precision for analyzing rare events. We propose a novel method, for time-to-event analyses of pharmacoepidemiological cohort studies with rare events, that combines the strengths of large databases and smaller clinical 'validation samples' (VS) to account for rare events and unmeasured confounding. In the main database, we impute the unmeasured confounders using their relationships estimated in the VS with martingale residuals (MR) from the Cox model adjusted only for confounders measured in the main database. We also extend the method to marginal structural models (Cox MSM) with inverse probability of treatment weights. 
In simulations, the proposed MR-based imputation eliminated bias due to unmeasured confounders, and yielded lower mean squared errors (MSE) than four alternative methods, even in the case of rare events, with only 25 to 50 events observed in the VS. When re-assessing the association between DPP-4 inhibitors and hypoglycemia hospitalizations (< 3 events/100 person-years), MSM analyses of N=2,300 diabetic patients with measurements of a potential confounder (HbA1c) yielded imprecise, inconclusive results (adjusted HR=0.87, 95% CI: ). In contrast, by using our MR-based approach to impute unmeasured HbA1c values in N>47,000 patients in an administrative database, we could demonstrate a significant protective effect of DPP-4 therapy (HR=0.74; 0.57 to 0.97). In conclusion, the proposed martingale residual imputation may enhance the accuracy of time-to-event analyses of pharmacoepidemiological cohort studies, by offering means to address simultaneously the challenges related to a combination of rare events and unmeasured confounding. I4/TCS04.3 Rare events bias of logistic regression Blagus R. University of Ljubljana, Institute for Biostatistics and Medical Informatics, Ljubljana, Slovenia Logistic regression is one of the most commonly used statistical methods to estimate prognostic models that relate a binary outcome (with levels event and non-event) to a number of explanatory variables. A low prevalence of events, encountered frequently in clinical or epidemiological studies, causes unequal treatment of events and non-events in terms of their respective predictive accuracies (rare events bias). It is well known that maximum likelihood estimates of the regression coefficients in the logistic regression model are biased (small sample bias), and it is known that the bias is amplified when the sample size and the proportion of events are smaller. We explain that the rare events bias is not a consequence of small sample bias, which explains why bias-corrected estimates, such as Firth's bias correction, cannot remove the rare events bias. We provide an explanation of the rare events bias by using some simulated examples as well as some theoretical results. The rare events bias is explained for maximum likelihood and penalized likelihood estimation using some common penalty functions. We also explain why the intuitive solution of weighting the samples amplifies the rare events bias while under-sampling the non-events is efficient in removing the rare events bias. I4/TCS04.4 What (not) to expect when classifying rare events Goeman J. 1, Blagus R. 2 1 Leiden University Medical Center, Leiden, Netherlands, 2 University of Ljubljana, Ljubljana, Slovenia When building classifiers, it is natural to require that the classifier correctly estimates the event probability (constraint 1), that it has equal sensitivity and specificity (constraint 2), or that it has equal positive and negative predictive values (constraint 3). We prove that in the balanced case, where there is an equal proportion of events and non-events, any classifier that satisfies one of these constraints will always satisfy all three. Such unbiasedness of events and non-events is much more difficult to achieve in the case of rare events, i.e. the situation in which the proportion of events is (much) smaller than 0.5. Here, we prove that it is impossible to meet all three constraints unless the classifier achieves perfect predictions. Any non-perfect classifier can only satisfy at most one constraint, and satisfying one constraint implies violating the other two constraints in a specific direction. Our results have implications for classifiers optimized using g-means or the F1-measure, which tend to satisfy constraints 2 and 1, respectively. Our results are derived from basic probability theory and illustrated with simulations based on some frequently used classifiers.
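A small simulated example of the setting discussed in this session is sketched below: with rare events, the maximum likelihood and Firth-corrected coefficient estimates can be compared (using the logistf package), and the 0.5-threshold classifier from the ML fit shows the unequal sensitivity and specificity that arise when events are rare. Prevalence, effect sizes and sample size are assumptions chosen for illustration only.

```r
# Rare-event logistic regression: ML versus Firth's bias-reduced fit, and the
# sensitivity/specificity imbalance of the default 0.5-threshold classifier.
library(logistf)

set.seed(1)
n <- 2000
x <- rnorm(n)
p <- plogis(-3.5 + 1 * x)        # intercept chosen so events are rare (~3%)
y <- rbinom(n, 1, p)
dat <- data.frame(y, x)

fit_ml    <- glm(y ~ x, family = binomial, data = dat)
fit_firth <- logistf(y ~ x, data = dat)
cbind(ML = coef(fit_ml), Firth = fit_firth$coefficients)

# classification at the conventional 0.5 threshold: with rare events, almost
# everything is predicted to be a non-event
pred <- as.numeric(fitted(fit_ml) > 0.5)
c(sensitivity = mean(pred[y == 1] == 1),
  specificity = mean(pred[y == 0] == 0))
```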
I8/TCS046 Adaptive Enrichment Designs and Precision Medicine Wed :30-:00 Lecture Hall HS Organizer(s): Jie Chen Chair(s): Frank Liu It has been well recognized that responses of patients with certain disease to a therapy may differ substantially due to a variety of reasons. For instance, patients with a biomarker positive may benefit more from a new therapy than those who are biomarker negative; a new treatment may generate more favorable outcome among patients with higher risk or more severe disease than among those with lower risk or less severe disease. Adaptive enrichment designs are to select a study population in which detection of a drug effect, if any, is more likely than it would be in an unselected population. While maintaining the type I error at a desired level and increasing study power, enrichment is intended to maximize patient benefit towards precision therapy by decreasing heterogeneity of patient population, identifying a population who is at high risk and more likely to respond to the therapy. This session features some new developments in adaptive enrichment designs and related issues, including biomarker-guided adaptive designs (invited), enrichment designs with patient population augmentation, multiplicity issues in innovative trial designs with biomarkers and trial designs incorporating predictive biomarkers. I8/TCS046. Adaptive enrichment trials for biomarker-guided treatments (invited) Simon N. University of Washington, Department of Biostatistics, Seattle, United States The biomedical field has recently focused on developing targeted therapies, designed to be effective in only some subset of the population with a given disease. However, for many new treatments, characterizing this subset has been a challenge. Often, at the start of large-scale trials the subset is only rudimentarily understood. This leads practitioners to either ) run an all-comers trial without use of the biomarker or 2) use a poorly characterized biomarker that may miss parts of the true target population and potentially incorrectly indicate a drug from a successful trial. In this talk we will discuss a class of adaptive enrichment designs: clinical trial designs that allow the simultaneous construction and use of a biomarker, during an ongoing trial, to adaptively enrich the enrolled population. For poorly characterized biomarkers, these trials can significantly improve power while still controlling type one error. However there are additional challenges in this framework: How do we adapt our enrollment criteria in an "optimal" way? (what are we trying to optimize for?) How do we run a formal statistical test after updating our enrollment criteria? How do we estimate an unbiased treatment effect-size in our "selected population"? (combatting a potential selection bias) In this talk we will give an overview of a class of clinical trial designs and tools that address these questions. I8/TCS046.2 Enrichment design with patient population augmentation Yang B., Zhou Y. 2, Zhang L. 2, Cui L. 2 Vertex Pharmaceutical, Boston, United States, 2 AbbVie, North Chicago, United States The advancement in science (e.g., biomarkers from genomics, proteomics) has provided opportunities to identify patient subpopulations that may be more responsive to a treatment. Clinical trials can be enriched on such sub-populations to improve the 20

probability of success in demonstrating the benefit of new treatments. In 2012, the FDA issued a draft guidance document to facilitate trial design with an enrichment strategy. In this talk, we consider an enrichment trial design in which efficacy in the enriched population and, further, in the general patient population can be evaluated. Specifically, a new weighted test statistic is derived to assess the treatment effect in a general patient population under an enriched trial setting, coupled with a novel design based on screening information for weight determination. The proposed design and analysis method enhances the probability of success compared with a traditional all-comer trial design. It allows a generalization of the enrichment trial result to an all-comer population despite the fact that the enriched population is disproportionately represented in the study. Sample size determination and data collection are discussed along with examples and simulations. I8/TCS046.3 Some issues and solutions in biomarker-enrichment designs Chang M. Veristat LLC, Lexington, United States We will focus on three issues and solutions: (1) When a clinical trial is designed to test the drug effect in populations characterized using biomarkers, the multiplicity issue must be dealt with. A new, powerful multiple testing method to deal with multiplicity in biomarker trials with correlated statistics will be discussed. (2) When an enrichment design is used, the threshold to classify biomarker-positive and biomarker-negative patients is often difficult to determine beforehand due to a lack of information at the time the trial starts. A proposed adaptive threshold design for an arthritis trial will be discussed. (3) When a hidden biomarker confounder is not identified and randomization does not ensure balance between the treatment groups, the treatment effect estimate will be biased. Since the hidden effect is not identified and cannot be measured, we cannot use ANCOVA to adjust for it. To deal with such a "ghost effect", an adaptive method-selection and EM algorithm will be discussed. I8/TCS046.4 Basket, umbrella, and platform trials: Definitions and statistical properties of master protocols for personalized medicine in oncology Renfro L. Mayo Clinic, Division of Biomedical Statistics and Informatics, Rochester, United States Within the field of cancer research, the discovery of biomarkers and genetic mutations that are potentially predictive of treatment benefit is motivating a paradigm shift in how cancer clinical trials are conducted. In this review, we provide an overview of the class of trials known as "master protocols", including basket trials, umbrella trials, and platform trials. For each, we describe standardized terminology, provide a motivating example with modeling details and decision rules, and discuss statistical advantages and limitations. We conclude with a discussion of general statistical considerations and challenges encountered across these types of trials. Topic-Contributed Sessions TCS001 Contributions for a Better Teaching of Biometrical Topics Wed :30-6:00 Lecture Hall HS 5 Chair(s): Geraldine Rauch Many medical and biological degree programmes include biometrical and statistical topics within their curricula. Whereas the students are often highly motivated in their primary study subject (human medicine, biology, etc.), they often show a certain prejudice against statistical topics.
The biometrical lecturer is therefore confronted with the difficult task to find methods of teaching which activate the students and which inspire their interest in statistics. In 206, the working group Teaching and didactics of biometry of the German Region of the International Biometrical Society for the second time announced a call for the prize of the best teaching material in biometry. The aim was to collect ready-to-use material and new ideas to implement interesting and inspiring lectures for biometrical topics. All contributions will be published in a Springer-book like the proceedings of the first call which were also published within a book [Rauch et al. 204]. Within this session, three contributions of the current announcement will be presented. The three talks cover a variety of different topics and employed media for teaching. Join the session and hear about new shiny-apps for biometry and bioinformatics and about methods to teach a critical reflection of medical publications. Reference: G. Rauch, R. Muche, R. Vonthein (Herausgeber) (204): Zeig mir Biostatistik! Ideen und Material für einen guten Biometrie-Unterricht, ISBN: , Springer-Lehrbuch. TCS00. And nothing but the truth!..? Application of biometrical methods in everyday situations and assessment of health information Brensing J., Burkholder I. HTW Saar, University of Applied Science, Saarbrücken, Germany Nowadays, it is a common practice to search for health information by means of the internet, hence it is all the more important that users are able to subject their search results to critical analysis. In this talk a practice session of about two class hours will be presented which addresses to undergraduate students as well as to upper school students. It intends to discuss relevant aspects of clinical trials for daily situations and differences between a scientific study and its presentation in nonscientific internet articles. This practice could be conducted in two ways. In the first version, several internet articles citing an exemplary scientific study are handed out to student groups. Each group has to read one article and present the mentioned facts of it. Subsequently the abstract of the original article will be shown and differences are to be discussed. In the second version of the practice some exemplary nonscientific internet articles are given out to the student groups, which name some health relating facts. The student groups have to read and discuss the respecting article. Further they have to work out possible 2

24 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed study designs to evaluate the claimed fact, respecting the main scientific principles for clinical studies. Both practices aim to sensibilize the participants to read more critically nonscientific articles and to evaluate health information. TCS00.2 Interactive browser tools for teaching statistical bioinformatics using the R-shinyenvironment Jung K., Kruppa J. University of Veterinary Medicine Hannover, Hannover, Germany Since the middle of the 990s multiple technologies for generating high-throughput data are used in molecular biology, for example microarrays or next-generation-sequencing. The analysis of these data involves techniques from the (bio)statistics as well as the bioinformatics area, collectively often termed as 'Statistical Bioinformatics'. This branch of methods is nowadays regularly taught by biostatisticians in diverse university study programs involving students of different disciplines such as statistics, bioinformatics, biology or medicine. The analysis of molecular high-throughput data often involves multiple steps where many of them are computationally expensive and are usually not performed on a single computer but on a computing server. Thus, computing time and lacking infrastructure can make the demonstration of the methods in a software course difficult. We therefore employed the 'Shiny' environment for the statistic software R that enables the demonstration of interactive graphics in a web browser. Using sliding controllers and other input elements, the lecturer can demonstrate in real time how a graphic changes when different analysis parameters change. We implemented several units for typical questions in Statistical Bioinformatics (e.g. cluster analysis, differential testing and gene set enrichment analysis) and make them freely available at A first evaluation of this teaching strategy was done during an R course at our university. The results were compared to students who were taught without the shiny applications. As a result, we observed a trend that shiny-students performed better. TCS00.3 Interactive graphics as an effective means to gain insights and promote interdisciplinary communication - technical possibilities, implementation and use cases Englert S.,2 AbbVie Deutschland GmbH & Co KG, Data and Statistical Sciences, Ludwigshafen, Germany, 2 Ruprecht-Karls-Universität Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany If a picture says a thousand words, then a dynamic, interactive display may offer an even greater potential to promote interdisciplinary communication and efficiently gain insights. The potential uses of interactive graphics run the gambit: for teaching purposes and in the classroom they may be pedagogic, helping to visualize and explain complex statistical concepts; in pharmaceutical development they may assist trial design by exploring the impact of changing working assumptions; for researchers they may provide tools for communicating ideas and demonstrating the properties of methodology. In this talk, we will describe two technical possibilities that very easy realize interactive graphics both in technical and practical terms: Wolfram Research Demonstrations and R Shiny Applications. The first one is based on a solution by Wolfram Research, the developers of the commercial mathematical and scientific software package Mathematica'. The second is based on the free programming language R'. 
They will be compared and their advantages and disadvantages will be considered. Neither requires any special installation in the presentation room, and both can be used just about anywhere. Two practical examples will illustrate the talk. One is taken from a lecture for students of the human medicine disciplines and demonstrates how interactive graphics can be integrated into the course material. The second makes available a new methodology for efficient trial design in an interactive web application. TCS002 Innovative Applications of Resampling in Complex Models Fri :30-0:00 Lecture Hall KR 8 Chair(s): Arne Bathke The development of non-parametric methods for ever more complex modeling situations is an important task of modern statistical research. In this session novel methods for four different applications are presented, including different resampling approaches for repeated measurements, multivariate data and split-plot designs. One talk is specifically concerned with a non-parametric approach to quantifying niche overlap, a problem occurring in the field of ecology. TCS002.1 Resampling approaches for repeated measures designs and multivariate data - Theory, R package and applications Friedrich S. 1, Brunner E. 2, Konietschke F. 3, Pauly M. 1 1 Ulm University, Institute of Statistics, Ulm, Germany, 2 University Medical Center Göttingen, Institute of Medical Statistics, Göttingen, Germany, 3 University of Texas at Dallas, Department of Statistics, Dallas, United States In many experiments in the life, social or psychological sciences the experimental units are observed on different occasions, e.g., at different time points. This leads to certain dependencies between observations from the same unit and results in a more complicated statistical analysis. Classical repeated measures models assume that the observation vectors are independent with normally distributed error terms and a common covariance matrix for all groups. However, these two assumptions are often not met in practice and may inflate the type I error rates of the corresponding procedures. We present a different approach that works under covariance heterogeneity and without postulating any specific underlying distribution. Furthermore, we allow the number of time points to vary between treatment groups. Based on the asymptotically pivotal Wald-type test statistic (WTS), we propose a studentized permutation technique generalizing the results of Pauly et al. (2015, JRSS-B). The permutation procedure performs remarkably well despite the dependencies in a repeated measures design, as shown in extensive simulation studies. Similar to the GFD package, the permutation procedure for repeated measures is implemented in the R package MANOVA.RM along with different resampling methods for multivariate data. We briefly present the package and use it to analyze a practical data set.
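As a toy illustration of the studentized permutation idea described above, reduced to a univariate two-sample comparison with unequal variances (the abstract's method covers general repeated measures and multivariate designs via the MANOVA.RM package, which is not used here), one can permute the group labels and studentize the statistic within each permutation; the data-generating settings below are assumptions.

```r
## Minimal sketch of a studentized permutation test (Welch-type statistic),
## illustrating the general idea behind permutation-based Wald-type tests.
set.seed(42)

welch_t <- function(y, g) {
  m <- tapply(y, g, mean); v <- tapply(y, g, var); n <- tapply(y, g, length)
  (m[1] - m[2]) / sqrt(v[1] / n[1] + v[2] / n[2])
}

studentized_perm_test <- function(y, g, n_perm = 5000) {
  t_obs  <- welch_t(y, g)
  t_perm <- replicate(n_perm, welch_t(y, sample(g)))   # permute group labels
  mean(abs(t_perm) >= abs(t_obs))                      # two-sided p-value
}

## Example data with unequal variances and unequal group sizes (assumed).
y <- c(rnorm(15, mean = 0, sd = 1), rnorm(30, mean = 0.8, sd = 3))
g <- rep(c("A", "B"), times = c(15, 30))
studentized_perm_test(y, g)
```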

25 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed TCS002.2 Wild bootstrapping rank-based procedures: split-plot designs and multiple testing Umlauft M., Konietschke F. 2, Pauly M. Ulm University, Ulm, Germany, 2 University of Texas at Dallas, Dallas, United States Split-plot designs are one of the most frequently used methods in several scientific fields. If the global null hypotheses of no treatment or time effect are rejected, multiple comparisons between the groups are usually performed, e.g. testing all pairwise hypothesis by means of unpaired t-test type statistics and/or adequate simultaneous confidence intervals for the corresponding treatment effect. However, the underlying distributional assumptions (such as normality and variance homogeneity) are often not met in real data. Furthermore, the used effect sizes (mean differences or more general mean contrasts) may not be appropriate measures, especially if ordinal or ordered categorical data are present. To this end, several nonparametric procedures for simultaneous inference in general factorial designs have been studied. Here, the current approaches do not lead to simultaneous confidence intervals for contrasts in adequate effect measures. Thus, global inference and multiple testing procedures for an adequate nonparametric effect measure are required. Konietschke et al. (202) and Brunner et al. (206) studied an inviting solution for this issue. Nevertheless, their methods are in general not asymptotically correct. In this talk we discuss a novel approach of rank-based multiple comparison procedures for split-plot designs which can be used to test hypotheses formulated in terms of purely non-parametric treatment effects. In particular, different resampling methods as small sample size approximations will be discussed. TCS002.3 A fast and robust way to estimate overlap of n-dimensional niches and draw inference Parkinson J.H., Kutil R., Kuppler J., Junker R.R., Trutschnig W., Bathke A. Paris Lodron University, Salzburg, Austria Recent methodological progress has reignited interest in the problem of quantifying niche overlap. In ecology, not only the quantification of species niches is of interest but also the quantification of the multivariate space of a community, among others. Another application area would be given in economics where one may be interested in the market niche of a company or a product. A long time there were no adequate methods for the quantification, until a generalized approach was proposed by Blonder (204, Global Ecology and Biogeography). Two more general methods to estimate n-dimensional niche overlap were then proposed by Carmona et al. (205, Functional Ecology) and Junker et al. (206, Methods in Ecology and Evolution). In this presentation we provide an interpretable quantification of niche overlap without imposing a particular parametric model or distribution family. That is, the approach presented here is fully nonparametric. The basis for our approach is given by the recently published nonparametric solution approach by Junker et al. (206) which consists of dynamic range boxes. Our method uses rank statistics for the estimators of intervals wherefore it provides a quicker calculable and easier interpretable approach. Further, we provide confidence intervals for the true overlap which have not been yet provided in any of the previous papers. TCS002.4 General multivariate ANCOVA for small sample sizes Zimmermann G.,2,3, Bathke A.C. 
1 Department for Neurology, Christian Doppler Klinik, Paracelsus Medical University, Salzburg, Austria, 2 Spinal Cord Injury and Tissue Regeneration Centre Salzburg, Paracelsus Medical University, Salzburg, Austria, 3 Department for Mathematics, Paris Lodron University, Salzburg, Austria In medical research, it is often necessary to account for one or several covariates when comparing the means of two or more groups. This can be done by assuming a uni- or multivariate analysis of covariance (ANCOVA) model for the data. However, the "classical" F test, which is used for the comparison of the adjusted group means, is based on assumptions such as homoskedasticity and normality of the errors. In practical applications, especially in research on patients with rare diseases such as spinal cord injury, it is difficult if not impossible to check those assumptions, due to the small sample size. Therefore, we consider a general multivariate ANCOVA model, allowing for heteroskedasticity as well as non-normality. As we place special emphasis on the case of small and highly unbalanced group sizes, we also apply bootstrap techniques in order to improve the finite-sample properties of the test. In addition to our theoretical considerations, we provide a real-life data example illustrating the proposed methods. TCS003 Advanced Decision Making in Drug Development Tue :30-8:00 Lecture Hall HS 3 Organizer(s): Zoran Antonijevic Chair(s): Keaven M. Anderson Discussant(s): Martin Posch This session will discuss decision making that goes beyond optimizing individual trials or development programs. The first two presentations will address sponsors' considerations for optimizing their portfolios. The third presentation will discuss how regulatory decisions can maximize benefit to patients and society. The session will be concluded by a discussant. TCS003.1 Maximizing the efficiency of proof-of-concept studies and of arrays of proof-of-concept studies for multiple drugs or indications Beckman R. Georgetown University Medical Center, Oncology and Biostatistics, Washington, United States Phase II proof-of-concept (POC) trials determine which therapeutic hypotheses will undergo definitive Phase III testing. The number of possible POC hypotheses likely far exceeds available public or private resources. We propose a design strategy for maximizing the efficiency of POC trials that obtains the greatest knowledge with the minimum patient exposure. We compare efficiency using the benefit-cost ratio, defined to be the risk-adjusted number of truly active drugs correctly identified for Phase III development divided by the risk-adjusted total sample size in Phase II and III development. It is most cost-effective to conduct smaller POC trials powered at 80% for an effect size 50% larger than that of minimal clinical interest, allowing more possible POC hypotheses to be investigated under a finite POC budget constraint. We also consider parallel arrays of POC trials with multiple indications or drugs, and sequential two-stage POC trial arrays where all drugs get an initial allotment of POC trials and only those which achieve a POC get further investment. These strategies can improve the output of successful drugs by up to 30% at a constant budget.
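The sizing argument in the abstract above can be illustrated numerically: powering a proof-of-concept trial at 80% for an effect 50% larger than the minimally interesting one requires far fewer patients per trial, so more hypotheses fit into a fixed Phase II budget. The R sketch below uses power.t.test with purely illustrative standardized effect sizes; it is not the benefit-cost calculation of the talk itself.

```r
## Illustrative sample sizes for a two-arm proof-of-concept trial
## (standardized effect sizes below are assumptions, not taken from the talk).
delta_min   <- 0.30                 # minimally clinically interesting effect
delta_power <- 1.5 * delta_min      # effect targeted for 80% power

n_conventional <- power.t.test(delta = delta_min,   sd = 1, power = 0.80,
                               sig.level = 0.05)$n  # per arm
n_smaller_poc  <- power.t.test(delta = delta_power, sd = 1, power = 0.80,
                               sig.level = 0.05)$n  # per arm

c(per_arm_conventional = ceiling(n_conventional),
  per_arm_smaller_poc  = ceiling(n_smaller_poc),
  trials_possible_ratio = n_conventional / n_smaller_poc)
```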

26 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed TCS003.2 Impact of adaptive design on value of pharmaceutical portfolios Antonijevic Z. Amgen, Thousnad Oaks, United States This presentation will first describe general principles of pharmaceutical portfolio optimization. It will then assess the impact of adaptive design on portfolio value. To accomplish this an example portfolio was designed, and strategies that did or did not include trials with adaptive designs were specified. We demonstrate that adaptive portfolios offer some advantages over traditional ones. Its flexibility largely increases the number of decision points, and as such it allows for a much more frequent reassessment of portfolios. Additionally, an adaptive portfolio can correct itself if initial decisions were made incorrectly. Despite these advantages, the adaptive portfolio did not outperform the traditional portfolio in terms of the expected Net Present Value. The main reason is that in this case, adaptive designs allowed for increases in sample size to the point where improvements per unit increase were minimal, instead of allocating this budget to additional trials. TCS003.3 A decision theoretical modeling for Phase III investments and drug licensing Burman C.-F., Miller F. 2 AstraZeneca / Chalmers University, Statistical Innovation, Mölndal, Sweden, 2 Stockholm University, Stockholm, Sweden Consider a population of candidate drugs with different, but unknown, efficacy. Depending on Phase II data for each drug, sponsors will decide whether to invest in Phase III. Based on the outcome of Phase III, the regulatory agency will give or not give market authorization. The end result will be that a certain number of drugs reaches the patients and causes benefit and harm. Based on a number of assumptions, the total net benefit to patients and to the society is quantified. We show that, everything else being held constant, regulatory requirements should be lower for rare diseases than for common ones. Additionally, we demonstrate that if sponsors are uncertain about how regulators will weigh efficacy vs. safety, this uncertainty will translate into a loss of value both for sponsors and patients. TCS004 Basket Trials using Bayesian Adaptive Design Tue :30-6:00 Lecture Hall KR 8 Chair(s): Peter Mueller Discussant(s): Peter Mueller Speakers in this session will present some current approaches to Bayesian basket trials. A basket trial enrolls patients on the basis of particular biomarkers or molecular aberrations. This typically includes patients across different cancer types. A successful basket trial master protocol strikes a compromise between the two extremes of simply pooling patients across cancers on one hand, and running separate studies without borrowing strength on the other hand. This session introduces three specific examples, and a final discussion to highlight the common themes and issues. In summary, the session aims to show that basket trials are exciting new opportunities to advance cancer research and enable decisions in drug development." TCS004. A subgroup cluster-based Bayesian adaptive design for precision medicine Ji Y. University of Chicago, Chicago, United States In precision medicine, a patient is treated with targeted therapies that are predicted to be effective based on the patient s baseline characteristics such as biomarker profiles. Oftentimes, patient subgroups are unknown and must be learned through inference using observed data. 
We present SCUBA, a Subgroup ClUster-based Bayesian Adaptive design aiming to fulfill two simultaneous goals in a clinical trial, ) to treatments enrich the allocation of each subgroup of patients to their precision and desirable treatments and 2) to report multiple subgroup-treatment pairs (STPs). Using random partitions and semiparametric Bayesian models, SCUBA provides coherent and probabilistic assessment of potential patient subgroups and their associated targeted therapies. Each STP can then be used for future confirmatory studies for regulatory approval. Through extensive simulation studies, we present an application of SCUBA to an innovative clinical trial in gastroesphogeal cancer. TCS004.2 A nonparametric bayesian basket trial design Xu Y., Mueller P. 2, Tsimberidou A. 3, Berry D. 3 Johns Hopkins University, Baltimore, United States, 2 University of Texas at Austin, Austin, United States, 3 University of Texas M.D. Anderson Cancer Center, Houston, United States Targeted therapies on the basis of genomic aberrations analysis of the tumor have become a mainstream direction of cancer prognosis and treatment. Regardless of cancer type, trials that match patients to targeted therapies for their particular genomic aberrations, are well motivated. Therefore, finding the subpopulation of patients who can most benefit from an aberration-specific targeted therapy across multiple cancer types is important. We propose an adaptive Bayesian clinical trial design for patient allocation and subpopulation identification. We start with a decision theoretic approach, including a utility function and a probability model across all possible subpopulation models. The main features of the proposed design and population finding methods are that we allow for variable sets of covariates to be recorded by different patients, adjust for missing data, allow high order interactions of covariates, and the adaptive allocation of each patient to treatment arms using the posterior predictive probability of which arm is best for each patient. The new method is demonstrated via extensive simulation studies. TCS004.3 Bayesian baskets for biomarker-based clinical trials Trippa L., Alexander B. Dana-Farber/Harvard, Boston, United States Biomarker-based clinical trials provide efficiencies during therapeutic development and form the foundation for precision medicine. These trials must generate information on both experimental therapeutics and putative predictive biomarkers in the context of varying 24

27 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed pretrial information. We generated an efficient, flexible design that accommodates various pretrial levels of evidence supporting the predictive capacity of biomarkers while making pretrial design choices explicit. We generated a randomization procedure that explicitly incorporates pretrial estimates of the predictive capacity of biomarkers. To compare the utility of this Bayesian basket (BB) design with that of the utility of a balanced randomized, biomarker agnostic (BA) design and a traditional basket (TB) design that includes only biomarker-positive patients, we iteratively simulated hypothetical multi-arm clinical trials under various scenarios. BB increased power over BA in cases when the biomarker was predictive and when the experimental therapeutic worked in all patients in a variety of scenarios. BB also generated more information about the predictive capacity of biomarkers than BA and categorically relative to TB, which generates no new biomarker information. The BB design offers an efficient way to generate information on both experimental therapeutics and the predictive capacity of putative biomarkers. The design is flexible enough to accommodate varying levels of pretrial biomarker confidence within the same platform structure and makes clinical trial design decisions more explicit. TCS005 High Dimensional Model Selection with Genetic and Biometrical Applications Thu :30-6:00 Lecture Hall KR 7 Chair(s): Florian Frommlet The purpose of this session is to highlight different recent developments in high-dimensional model selection covering both theoretical as well as applied aspects. Malgorzata Bogdan will open this session by providing a comprehensive overview on recent developments in the theory of high-dimensional model selection. Specifically she will address different selection strategies which control the false discovery rate of regressors entering selected models. The second and the third speaker will each discuss model selection in a very specific high-dimensional context. Mélina Gallopin will focus on Gaussian graphical models which are utilized to infer networks from gene expression data. This task is often considerably complicated by the relatively small sample size available to infer the underlying graph. Gregory Nuel will introduce a novel approach to estimate piecewise constant hazard rates in survival analysis using model selection techniques based on L0 regularization. The method provides an automatic procedure to find the number and location of cut points and to estimate the hazard on each cut interval. For illustrative purposes the method is applied to estimate cancer incidence rates using cohort data. The final talk by Aliaksandr Hubin will be concerned with computational questions arising in high-dimensional model selection. He will introduce an efficient mode jumping MCMC for Bayesian variable selection which has been implemented for the generalized linear mixed model. Its performance will be compared with other state of the art approaches for protein activity data and real epigenetic data. TCS005. Identifying predictors in large data bases: FDR control and predictive properties Bogdan M. University of Wroclaw, Wroclaw, Poland Consider the problem of the estimation of the vector of means in the multivariate normal distribution with the diagonal covariance matrix. 
It is well known that when the number of nonzero means is relatively small, the optimal prediction properties can be obtained by application of the thresholding procedures related to the control of the False Discovery Rate (FDR). Thus, it is also expected that control of FDR will lead to the optimal prediction properties in the sparse regression problems with roughly orthogonal predictors. In this talk we will present several strategies for FDR control in the multiple regression and some theoretical results illustrating their superior predictive properties. We will also demonstrate practical advantages of these methods in the context of selection of patients for personalized therapies based on the genotype data. TCS005.2 Nonlinear network-based quantitative trait prediction from transcriptomic data Gallopin M., Devijver E. 2, Perthame E. 3,4 Université Paris Sud, Orsay, France, 2 KU Leuven, Leuven, Belgium, 3 Institut Pasteur, Paris, France, 4 Inria, Grenoble, France Quantitatively predicting phenotype variables by the expression changes in a set of candidate genes is of great interest in molecular biology but it is also a challenging task for several reasons. First, the collected biological observations might be heterogeneous and correspond to different biological mechanisms. Secondly, the gene expression variables used to predict the phenotype are potentially highly correlated since genes interact through unknown regulatory networks. In this talk, we present a novel approach designed to predict quantitative traits from transcriptomic data, taking into account the heterogeneity in biological observations and the hidden gene regulatory networks. The proposed model performs well on prediction but it is also fully parametric, which facilitates the downstream biological interpretation. The model provides clusters of individuals based on the relation between gene expression data and the phenotype, and also leads to infer a gene regulatory network specific for each cluster of individuals. We perform numerical simulations to demonstrate that our model is competitive with other prediction models, and we demonstrate the predictive performance and the interpretability of our model to predict alcohol sensitivity from transcriptomic data on real data from Drosophila Melanogaster Genetic Reference Panel (DGRP) TCS005.3 L0 regularization for the estimation of piecewise constant hazard rates in survival analysis and application to the estimation of cancer incidence using cohort data Goepp V., Bouaziz O., Nuel G. 2 MAP5 - CNRS Paris Descartes, Paris, France, 2 LPMA - CNRS UPMC - Sorbonne Universités, Paris, France In a survival analysis context we suggest a new method to estimate the piecewise constant hazard rate model. The method provides an automatic procedure to find the number and location of cut points and to estimate the hazard on each cut interval. Estimation is performed through a penalized likelihood using an adaptive ridge procedure. A bootstrap procedure is proposed in order to derive valid statistical inference taking both into account the variability of the estimate and the variability in the choice of the cut points. The method is then extended through a bi-dimensional segmentation procedure to cancer cohort data, where incidence is known to depend both on the age of the patient and on calendar time. 
The method is illustrated with the estimation of breast cancer incidence using the E3N cohort (100,000 women), and its performance is compared with that of classical age-period-cohort methods (typically based on smoothing splines).
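A simplified sketch of the piecewise constant hazard model behind the abstract above: with cut points fixed in advance, the hazard on each interval is estimated by events divided by person-time at risk (equivalently, a Poisson model on split data). The adaptive-ridge selection of the number and location of cut points described in the talk is not reproduced here; the data and cut points below are simulated assumptions.

```r
## Piecewise constant hazard with *fixed* cut points, estimated as
## events / person-time per interval (a simplified illustration only).
set.seed(7)
n      <- 2000
time   <- rexp(n, rate = 0.10)           # true constant hazard 0.10
status <- as.integer(time <= 5)          # administrative censoring at t = 5
time   <- pmin(time, 5)

cuts <- c(0, 1, 2, 3, 4, 5)              # assumed cut points
hazard_estimates <- sapply(seq_len(length(cuts) - 1), function(k) {
  lo <- cuts[k]; hi <- cuts[k + 1]
  at_risk  <- time > lo                              # subjects reaching the interval
  exposure <- sum(pmin(time[at_risk], hi) - lo)      # person-time in (lo, hi]
  events   <- sum(status == 1 & time > lo & time <= hi)
  events / exposure
})
round(hazard_estimates, 3)               # should fluctuate around 0.10
```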

28 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed TCS005.4 Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data Hubin A., Storvik G., Grini P. University of Oslo, Oslo, Norway Epigenetic observations are represented by the total amount of reads from a particular cell and the amount of methylated reads, which are reasonable to model via a Binomial distribution. There are numerous factors that might influence the probability of success from a particular region. We might also expect spatial dependence of these probabilities. We incorporate dependence on the covariates and spatial dependence of probability of being methylated for observation from a particular cell by means of a binomial regression model with a latent Gaussian field. We finally divide genome into a number of regions to reduce computational effort and avoid heteroscedasticity and carry out efficient mode jumping MCMC with simultaneous model selection with respect to different model selection criteria across different choices of covariates of the regression model for each region in order to draw from posterior distributions of parameters of the models and models jointly and finding the best choice of covariates that influence methylation structure in all of the regions within the genome. TCS006 Futility Evaluation in Complex Clinical Trials Wed :30-3:00 Lecture Hall HS 2 Organizer(s): Koko Asakura, Frank Bretz Chair(s): Toshimitsu Hamasaki Many recent clinical trials for evaluating efficacy and safety of new interventions include multiple objectives, especially in medical product development. Clinical trials with multiple objectives can be extremely expensive and resource intensive as they often require the enrollment of large numbers of participants and the collection of massive amounts of data. Therefore, in such trials, stopping a clinical trial when the interim results suggest that it is unlikely to achieve the study primary objectives can save time and money as well as prevent patients from being exposed to ineffective interventions unnecessarily. However, they introduce challenges in terms of complex decision-making processes associated with the multiple objectives. The session will cover reviewing the challenging issues and recent methodological developments in futility evaluation in clinical trials with multiple objectives. The session will bring together worldwide statisticians and related professionals who are involved in biopharmaceutical research, medical products development to share and exchange experiences and research findings. TCS006. Futility stopping: Some statistical and practical considerations Friede T.,2 Universitätsmedizin Göttingen, Institut für Mediznische Statistik, Göttingen, Germany, 2 DZHK (German Centre for Cardiovascular Research), partner site Göttingen, Göttingen, Germany Adaptive designs are defined as "a clinical study design that uses accumulating data to decide how to modify aspects of the study as it continues, without undermining the validity and integrity of the trial" (Gallo et al, 2005). From a regulatory perspective requirements in terms of validity and integrity include type I error rate control and maintaining the blinding (CHMP, 2007; FDA, 200). The statistical challenges of adaptive designs include repeated testing, combination of pre and post adaptation data and multiple hypotheses. 
As methods exist to deal with all three challenges, the idea of so-called seamless adaptive designs is to combine them to achieve the above-mentioned characteristics of validity and integrity. Futility assessments are among the most popular adaptations. A number of measures have been proposed to facilitate futility stopping (Gallo et al, 2014). From a practical perspective, the choice of the futility threshold and of the time point at which to carry out a futility assessment are important questions which should be explored through simulations (see e.g. Benda et al, 2010; Friede et al, 2010). In this presentation we consider these points. Furthermore, we discuss the role of futility stopping in a variety of settings, contrasting investigator-initiated trials with industry trials, and exploratory trials with confirmatory ones. The role of data monitoring committees is considered. Examples from cardiovascular trials will be used to motivate and illustrate the considerations (Filippatos et al, 2017). References: Benda N et al (2010) Aspects of modernizing drug development using scenario planning and evaluation. Drug Information Journal 44: Filippatos GS et al (2017) Independent Academic Data Monitoring Committees for Clinical Trials in Cardiovascular and Cardiometabolic Diseases. European Heart Failure Journal (in press). Gallo P et al (2014) Alternative Views on Setting Clinical Trial Futility Criteria. JBS 24: TCS006.2 Characterizing behavior of futility schemes in multiple endpoint situations Gallo P. Novartis, East Hanover, United States Determining futility stopping thresholds for a single main outcome in a multi-look scheme is not always straightforward, as design operating characteristics must be carefully considered. Methods sometimes considered, such as conditional power (CP) or predictive probability (PP) thresholds, or beta-spending approaches, may be sub-optimal and may not induce desirable behavior across all timepoints at which they might be used. It can be particularly problematic if CP or PP values are over-interpreted as chances of trial success. Most literature on this topic focuses on a single primary endpoint, but in studies with multiple important objectives it seems imperative to base actions on consideration of those objectives jointly. This adds additional dimensions of complexity to determining sensible thresholds. We should not necessarily expect signals across different objectives to be highly concordant, and thus need to think carefully about outcome patterns that could really justify stopping. In multiple-objective studies, it is not uncommon that some objectives are overpowered (i.e., to achieve sufficient power for others), so familiar thresholds that may make sense in other circumstances may not behave as expected. Additionally: different objectives may have different degrees of importance; there may be differing levels of prior belief or evidence regarding favorable outcomes; information may accrue at different rates (perhaps at unknown rates for some endpoints); correlation among endpoints may be relevant to account for; and there may be different ethical implications of poorly-trending results for different endpoints. Based on such considerations, we discuss how we might extend single-endpoint schemes to these more complex situations.
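For reference alongside the discussion of conditional power (CP) futility thresholds above, here is a minimal R sketch of CP under the current trend for a single normally distributed endpoint, using the standard Brownian-motion formulation; the interim z-value, information fraction and one-sided alpha are illustrative assumptions, and the multiple-endpoint extensions discussed in the abstracts are not shown.

```r
## Conditional power under the "current trend" for a one-sided test,
## single endpoint (illustrative numbers; not from the abstracts).
conditional_power <- function(z_interim, t_frac, alpha = 0.025) {
  z_alpha <- qnorm(1 - alpha)
  b_t     <- z_interim * sqrt(t_frac)        # Brownian-motion value B(t)
  drift   <- z_interim / sqrt(t_frac)        # drift estimated from the interim trend
  1 - pnorm((z_alpha - b_t - drift * (1 - t_frac)) / sqrt(1 - t_frac))
}

## A weak interim signal half-way through the trial (candidate for futility):
conditional_power(z_interim = 0.5, t_frac = 0.5)
## A stronger interim signal at the same information fraction:
conditional_power(z_interim = 1.8, t_frac = 0.5)
```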

29 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed TCS006.3 Interim evaluation of futility in clinical trials with co-primary endpoints Asakura K., Hamasaki T., Evans S. 2 National Cerebral and Cardiovascular Center, Department of Data Sicence, Osaka, Japan, 2 Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, United States Co-primary endpoints offer an attractive design feature as they capture a more complete characterization of the effect of an intervention. Superiority clinical trials with co-primary endpoints are designed to evaluate if the test intervention is superior to the control on all of the primary endpoints. Failure to demonstrate superiority on any single endpoint implies that superiority to the control intervention cannot be concluded. Such trials often require large sample sizes and thus it is desirable to conduct interim evaluation of futility. If it is clear that the null hypothesis would not be rejected and that important effects sizes can be ruled-out with reasonable confidence based on the interim data, stopping a trial early for futility could save valuable resources and time, and prevent patients from being exposed to an ineffective intervention unnecessarily. In this presentation, we discuss futility evaluation using prediction in clinical trials with co-primary endpoints (Evans et al., Drug Inf J 4: , 2007; Li et al., Stat Biopharm Res : ), as a flexible and practical approach for evaluating futility. This approach is appealing in that it provides quantitative evaluation of potential effect sizes and flexible decision-making compared with other approaches. We illustrate the method using an example, and compare these methods with other methods based on the probability of a statistically significant result at the end of the trial given the interim data. TCS006.4 Real world experiences in futility analysis in clinical trials Halabi S. Duke University, Durham, United States Futility analysis is considered part of standard statistical practice during the conduct of clinical trials. Most randomized clinical trials include strategies for terminating a trial early due to inefficacy. By inefficacy we mean both harm and absence of tangible benefit to patients. The motivation for terminating a trial is clear if harm is found because of ethical considerations to minimize patient exposure to a harmful treatment. Stopping a trial early for futility would reduce the required number of patients enrolled and treated with an ineffective therapy and thus provides savings on sample size and the duration of the trial. In addition, patients would have the opportunity to be treated with other alternative therapies which they may benefit from. Although interim analysis for futility serves as an aid in monitoring throughout the trial, the decision to stop a trial for futility is complex. In this talk, different analytical approaches that are commonly used for futility monitoring will be reviewed. The consequences of terminating a trial for inefficacy will be examined with an emphasis on estimation of the treatment effect. Several examples of trials that were stopped early for futility will be discussed. TCS008 Experimental Design I (Theory and Application) Fri :30-0:00 Lecture Hall HS 5 Chair(s): K Moder TCS008. Design of computer experiments - theory, models and applications Pilz J. 
AAU Klagenfurt, Statistics, Klagenfurt, Austria Over the last three decades, the design of computer experiments has rapidly developed as a statistical discipline at the intersection of the well-established theories of DoE, stochastic processes, stochastic simulation and statistical parameter estimation, with the aim of approximating complex computer models to reproduce the behaviour of engineering, physical, biological, environmental and social science processes. In this paper we focus on the use of Gaussian Processes (GP s) for the approximation of computer models. Whereas GP s have proved to be useful for the analysis of spatially correlated data, these models cannot be simply transferred to analyse complex computer models. We first discuss the main differences between computer models and typically used models for spatial data and then show how to modify the latter models to make them accessible to the analysis of complex computer code. Then we discuss the numerical problems associated with the estimation of the model parameters, in particular, the second-order (variance and correlation) parameters. To overcome these problems we will highlight the use of Bayesian regularization, based on objective priors for the parameters. Finally, we will consider design problems associated with the search for numerical robustness of the estimation procedures. We will illustrate our findings for typical applications in engineering and geophysics, thereby also discussing the strengths and limitations of standard R-packages for estimation, prediction and sensitivity analysis in computer models. TCS008.2 Design for copula models with applications in pharmakocology Mueller W. Johannes Kepler University Linz, Applied Statistics, Linz, Austria Optimum experimental design theory has recently been extended for parameter estimation in copula models. The use of these models allows one to gain in flexibility by considering the model parameter set split into marginal and dependence parameters. An example on drug testing will illustrate that in practical situations considerable gains in design efficiency can be achieved. A natural comparison between different copula models with respect to design efficiency and other aspects of model-building will be provided as well. TCS008.3 Recent accomplishments in weighing designs: theory and applications. Graczyk M. Poznan University of Life Sciences, Mathematical and Statistical Methods, Poznan, Poland The problematic aspects of the presentation are the issues linked to the planning of the experiments in that we determine unknown measurements of p object using n measurement operations, where p and n are set down. We pay special attention to the assumptions concerned the errors of measurements, so we consider the experiments in that the errors are correlated and they have different variances. The target point in this research is such planning of the experiment in the sense of attaining the best estimators of unknown measurements of objects according to the specific optimality criteria. We preset recent theoretical results and a few examples how to 27

30 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed improve the experimental plan to obtain estimators having smallest variance. Some implementations of optimal weighing designs as experimental plans in the issues connected with the infection of spring barley stem base by pathogen Gaeumannomyces graminis and the influence of application of growth regulator and foliar fertilizers mixed used in winter wheat on the quality of grain are stated. We suggested the choice of the objects combination to the experiment was carried out according to the matrix of optimal weighing design. At that time the variance of estimators is significant reduced. TCS008.4 Power of heteroskedasticity tests in presence of various types of skedastic function and sample size Adamec V. Mendel University, School of Business and Economics, Statistics and Operations Research, Brno, Czech Republic This research presents evidence from Monte Carlo (MC) simulations aimed at exploring power of heteroskedasticity tests frequented in Econometrics and (or) Biometrics to verify presence of various patterns of non-constant variances in linear models. Three independent MC simulation schemes were set up in R code to investigate impact of error variance changing with power of the proportionality factor (scheme ) or power of absolute value of centered proportionality factor (scheme 2) or in response to variance multiplication constant applied to a subset of data (scheme 3). Variants of the Monte Carlo schemes differed in parameter levels and sample size replications were generated for every combination of sample size and variance heterogeneity parameter. Empirical power was calculated as a proportion of H 0 rejections at 5% Type I. error rate. Following heteroskedasticity tests were scrutinized: Bartlett, Breusch-Pagan, Glejser, Goldfeld-Quandt, Harrison, Harvey, Koenker, Park, Spearman rank correlation and White. Empirical power was shown to vary in accordance with the simulated type of the skedastic function, parameter levels and sample size. Tests mostly showed adequate power for the skedastic function in scheme and to a lesser extent in scheme 3. In contrast, heteroskedasticity of the skedastic function in scheme 2 was the most difficult to detect by most heteroskedasticity tests. Estimated power curves shall be presented and discussed in detail for the named tests and simulated variants. TCS009 Experimental Design II (Related to Agricultural Experiments) Fri :30-2:00 Lecture Hall HS 5 Organizer(s): K Moder Chair(s): Laurence Madden TCS009. Row-column designs for field trials with an even distribution of treatment replications Piepho H.-P., Williams E. 2, Michel V. 3 University of Hohenheim, Stuttgart, Germany, 2 Australian National University, Canberra, Australia, 3 Landesforschungsanstalt für Landwirtschaft und Fischerei Mecklenburg-Vorpommern, Variety Testing and Biostatistics, Gülzow, Germany When generating experimental designs for field trials laid out on a rectangular grid of plots, it is useful to allow for blocking in both rows and columns. When the design is nonresolvable, randomized classical row-column designs may occasionally involve clustered placement of several replications of a treatment. In our experience, this feature prevents the more frequent use of these useful designs in practice. Practitioners often prefer a more even distribution of treatment replications. 
In this paper we illustrate how spatial variancecovariance structures can be used to achieve a more even distribution of treatment replications across the field and how such designs compare with classical row-column designs in terms of efficiency factors. We consider both equally and unequally replicated designs, including partially replicated designs. TCS009.2 Incomplete factorial designs with split units useful for agricultural experiments Mejza S., Mejza I. Poznan University of Life Sciences, Department of Mathematical and Statistical Methods, Poznan, Poland This paper deals with two-factor experiments carried out in designs with split units. Particular attention is paid to incomplete split plot and split block designs. For each of these designs we present construction methods that lead to optimal designs with respect to both statistical optimality and practical usefulness. To generate incomplete designs with split units we use the ordinary Kronecker product as well as the so-called semi-kronecker product of generating block designs. We investigate statistical properties such as general balance and efficiency balance for the designs so obtained. The statistical properties of incomplete designs with split units are examined under randomization-derived mixed linear models with fixed factor effects and random block effects. The considerations are illustrated with some agricultural experiments. TCS009.3 Do we need check plots in early generation field trial testing? Möhring J., Laubach E. 2, Piepho H.-P. Universität Hohenheim, Stuttgart, Germany, 2 Nordsaat Saatzucht GmbH, Gudow, Germany In plant breeding, testing of entries in field trials always include some well-known checks. In replicated trials they are used as benchmark for selecting preferred entries or for checking environmental conditions like winter damage or pressure of infection with diseases. Additionally, in augmented designs with unreplicated entries the replicated checks are used for adjusting for block effects and estimating an error variance in each trial. For the latter design, simulations show that partially replicated (p-rep) designs are a preferred alternative (Cullis et al. 2006; Möhring et al. 205). P-rep designs replace check plots by plots for entries already tested within the trial. As it is common to add checks in field trials, the amount of check plots and the benefit of additional check plots in partially replicated designs is unknown. We analyse five large one-year series of winter barley early generation yield trials conducted as p-rep design with additional check plots within each incomplete block using a mixed model approach. We used four approaches fitting three different models to four subsets of the dataset to estimate entry main effects: (i) corresponds to a p-rep design using a model without check effects to data dropping observations from checks, (ii) corresponds to an augmented design using a model including check effects and dropping observations from half of the plots of replicated entries, (iii) the same model as in (iv) to all data or (v) corresponds to an unreplicated design using a model with confounded entry-by-location and error variance and dropping both all 28

31 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed observations from checks and half of the observations of replicated entries. We compared the approaches using the average s.e.d., the inter-location correlation of entry estimates and the correlation of selected entry estimates from the current year and their performance in the following year. TCS009.4 More, larger, simpler: how comparable are on-farm and on-station trials for cultivar evaluation? Schmidt P., Möhring J., Koch R.J. 2, Piepho H.-P. Universität Hohenheim, Biostatistics, Stuttgart, Germany, 2 Pioneer Hi-Bred Northern Europe Sales Division GmbH, Buxtehude, Germany Traditionally, cultivar evaluation trials have been conducted as replicated small-plot on-station trials at a certain number of locations and years. Alternatively, cultivar evaluation may done in on-farm trials with single replicates laid out as large strips. Such trials are often conducted at a comparably larger number of locations. It is not clear how comparable these two trial systems are. Our objective therefore was to compare the precision and accuracy of these two systems using yield data from both German on-farm trials and from official German on-station trials for winter oilseed rape (Brassica napus) across eight years ( ). We set up multivariate mixed models to analyze the combined dataset and estimate heterogeneous variance components. Furthermore, based on 23 genotypes that were common to both datasets, we investigated the genetic correlation between systems and tested for genotype-by-system interaction effects. The results suggest that on-farm trials are comparable to traditional on-station trials in terms of precision but that there is a bias between the two systems' yield estimates. The single, systematic cause for the bias between the systems was identified as the systemspecific group effect of semi-dwarf and long-strawed genotypes. TCS009.5 Effects of violations of prerequisites in sequential designs on sample size and power Moder K. University of Natural Resources and Life Sciences, Applied Statistics and Computing, Vienna, Austria Testing group differences in sequential designs is based on critical values of the standard normal distribution. This approximation is not called into question in various variants if sequential tests. It is the basis not only for assessment of the null hypothesis but also for calculating sample size and power. Additional assumption concerning this kind of tests are homogeneity of variances and normal distributed random variables. In a simulation study effects of using the Alpha-percentile of the standard normal distribution and of violating various prerequisites are evaluated. Simulations were carried out for several situations of assumed population variances and deviations from these assumptions for the sample distributions. Various situations of non normal distributions in regard to skewness and kurtosis and their influence on alpha and power were examined. TCS0 Trending Topics in Health Economics and Outcomes Research Thu :30-3:00 Lecture Hall KR 7 Chair(s): Simon Kirby Discussant(s): Demissie Alemayehu TCS0. Hierarchical models for combining N-of- trials Schmid C.H. Brown University, Biostatistics, Providence, United States N-of- trials are single-patient multiple-crossover studies for determining the relative effectiveness of treatments for an individual participant. 
A series of N-of-1 trials assessing the same scientific question may be combined to make inferences about the average efficacy of the treatment as well as to borrow strength across the series to make improved inferences about individuals. Series that include more than two treatments may enable a network model that can simultaneously estimate and compare the different treatments. Such models are complex because each trial contributes data in the form of a time series with changing treatments. The data are therefore both highly correlated and potentially contaminated by carryover. We will use data from a series of 100 N-of-1 trials in an ongoing study assessing different treatments for chronic pain to illustrate different models that may be used to represent such data.

TCS011.2 Collaborating to improve the acceptability of real world evidence by healthcare decision-makers
Willke R. 1, Berger M. 2, Mullins D. 3, Schneeweiss S. 4
1 International Society for Pharmacoeconomics and Outcomes Research, Lawrenceville, United States, 2 Pfizer, New York, United States, 3 University of Maryland, Pharmaceutical Health Services Research, School of Pharmacy, Baltimore, United States, 4 Harvard Medical School, Boston, United States
While clinical trial evidence remains the gold standard for treatment efficacy evaluation, the potential for use of real world data (RWD), and of the real world evidence (RWE) that can be obtained by careful analysis and interpretation of RWD, to inform a variety of healthcare decisions has received a great deal of attention. Nevertheless, the standard concerns about RWE - such as lack of randomization, data quality, and poor attention to hypothesis-testing standards - continue to cast shadows on its credibility and use among decision-makers of all types. To deal with these concerns, not only improvements in and better use of analysis methods, but also efforts to help decision-makers interpret and have confidence in using RWE for appropriate purposes, are important. The International Society for Pharmacoeconomics and Outcomes Research (ISPOR), a scientific association now comprising over 20,000 members worldwide, has formed task forces to help with these interpretation and confidence issues. One of these task forces was part of a collaboration among ISPOR, the Academy of Managed Care Pharmacy, and the National Pharmaceutical Council to help guide managed care decision-makers in evaluating RWD-based comparative effectiveness research for its credibility and relevance to formulary decisions. Another recent effort is part of a collaboration aimed at strengthening the reproducibility and transparency of analysis of RWD, particularly for regulatory decision-making purposes. Awareness and use of good practices in this area can assist analysts and decision-makers in the generation and use of RWE.
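
As an aside to TCS011.1 above, the following is a minimal base-R sketch, on simulated data, of the borrowing-strength idea behind combining a series of N-of-1 trials; it uses a simple empirical-Bayes normal-normal model with a DerSimonian-Laird variance estimate rather than the full Bayesian network models of the talk, and all sample sizes and variances are invented for illustration.

  # Shrinkage of patient-specific effects from a series of N-of-1 trials (sketch)
  set.seed(1)
  n_patients <- 20; n_cycles <- 4          # each cycle yields one B-minus-A difference
  theta <- rnorm(n_patients, 1, 0.5)       # true patient-specific effects (simulated)
  y <- matrix(rnorm(n_patients * n_cycles, rep(theta, each = n_cycles), 2),
              nrow = n_patients, byrow = TRUE)
  est <- rowMeans(y)                       # per-patient effect estimates
  v   <- apply(y, 1, var) / n_cycles       # and their sampling variances
  w <- 1 / v; mu_fe <- sum(w * est) / sum(w)
  Q <- sum(w * (est - mu_fe)^2)            # DerSimonian-Laird between-patient variance
  tau2 <- max(0, (Q - (n_patients - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (v + tau2); mu_re <- sum(w_re * est) / sum(w_re)
  lambda <- tau2 / (tau2 + v)              # weight placed on each patient's own data
  shrunk <- lambda * est + (1 - lambda) * mu_re
  round(cbind(raw = est, shrunken = shrunk)[1:5, ], 2)

The shrunken estimates move each patient towards the series mean, the more so the smaller the between-patient variance is relative to the within-patient sampling variance.
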

TCS011.3 A solution for study bias, publication bias and p-hacking in observational research
Schuemie M. 1,2
1 Janssen R&D, Epidemiology, Titusville, United States, 2 Observational Health Data Science and Informatics (OHDSI), New York, United States
Current observational research generates evidence at a large scale, and this evidence is primarily disseminated through the scientific literature. We have automatically extracted the effect size estimates reported in the literature, and the resulting distribution suggests that publication bias and p-hacking are prevalent. Furthermore, because observational studies are not controlled experiments, they are susceptible to study bias due to unmeasured confounding. We propose a paradigm shift in observational research to resolve these issues. First, instead of the current practice of bespoke analyses we propose the use of a standardized and therefore fully reproducible approach. Because the approach is standardized, we can then evaluate its performance using a gold standard, and calibrate the outputs such as estimates, confidence intervals, and p-values accordingly. Lastly, we apply our approach at large scale and report all results, not just those where p < 0.05, to circumvent publication bias and p-hacking. As a proof-of-concept, we have generated evidence comparing all prevalent treatments for depression for a large set of outcomes of interest, and show that the distribution of estimates we produce is very different from the one observed in the literature, indicating that our results are more reliable.

TCS014 Statistical Challenges Surrounding Multi-Regional Clinical Trials and Global Medical Product Development
Thu :30-3:00 Lecture Hall HS 2
Organizer(s): Toshimitsu Hamasaki, Chin-Fu Hsiao
Chair(s): Toshimitsu Hamasaki
Discussant(s): Jose Pinheiro
With the increasing globalization of drug development, it has become important that data from multi-regional clinical trials (MRCTs) can be accepted by regulatory authorities across regions and countries as the primary source of evidence to support marketing approval of medicinal products. The International Conference on Harmonisation (ICH) has initiated the process for a harmonized guidance document on MRCTs, entitled E17 General Principles on Planning/Designing Multi-Regional Clinical Trials, to provide general principles for the planning and design of MRCTs with the aim of increasing the acceptability of MRCTs in global regulatory submissions, and to address some strategic program issues in the planning and design of confirmatory MRCTs. This session will focus on statistical considerations and regulatory issues in conducting multi-regional clinical trials in the setting of increasing globalization of medical product development. The session will cover reviewing and summarizing issues and recent methodological developments, and practical considerations in the design and analysis of multi-regional clinical trials. The session will bring together worldwide statisticians and related professionals who are involved in biopharmaceutical research, development and regulation to share and exchange information, experience and research findings.

TCS014.1 Use of interval estimations in the design and evaluation of multi-regional clinical trials
Hsiao C.-F., Chiang C.
1 National Health Research Institutes, Institute of Population Health Sciences, Miaoli County, Taiwan
Multi-regional clinical trials (MRCTs) have been accepted in recent years as a useful means of accelerating the development of new drugs and abridging their approval time. Global collaboration in MRCTs unites patients from several countries/regions around the world under the same protocol. The statistical properties of MRCTs have been widely discussed. However, when regional variability is taken into consideration, the assessment of the efficacy response becomes much more complex. The current study presents an evaluation of the efficacy response for MRCTs based on Howe's, Cochran-Cox's, and Satterthwaite's interval estimations, which have been shown to have well-controlled type I error rates with heterogeneous regional variances. Corresponding sample size determination to achieve a desired power based on these interval estimations is also presented. Moreover, the consistency criteria suggested by the Japanese Ministry of Health, Labour and Welfare (MHLW) guidance, to decide whether the overall results from the MRCT, via the proposed interval estimation, can be applied to a specific region or all regions, are also derived. An example for three regions is used to illustrate the proposed method. Results of simulation studies are reported so that the proposed method can help determine the sample size and correctly evaluate the assurance probability of the consistency criteria.

TCS014.2 Issues of design and analysis of multi-regional clinical trials based on experiences in Japan
Ando Y.
Pharmaceuticals and Medical Devices Agency, Tokyo, Japan
Multi-Regional Clinical Trials (MRCTs) have become common practice in the global simultaneous development of new drugs, and designing and analysing MRCTs is key for successful simultaneous approval in multiple regions. In order to accelerate the global development of new drugs and provide clearer points to consider for implementing MRCTs, ahead of other regulatory authorities, the Ministry of Health, Labour and Welfare (MHLW) issued a guidance document entitled "Basic Principles on Global Clinical Trials" in 2007 in Japan. Based on the experience accumulated in evaluating results of MRCTs at the Pharmaceuticals and Medical Devices Agency (PMDA), a guidance document entitled "Basic Principles on Global Clinical Trials (Reference Cases)" was issued in 2012. There are several challenging issues in MRCTs, and especially the evaluation of the results of subjects in a region may be the critical issue. There is a difficulty in evaluating the results of MRCTs and the relationship between the results of subjects in a region and those of all subjects in actual new drug reviews, depending on the therapeutic areas and/or characteristics of the drugs. Recently, MRCTs were adopted as topic E17 of the International Council for Harmonisation (ICH), and the draft technical guidance has been issued. In this presentation, recent review experiences of new drugs with MRCTs based on the Japanese guidance documents will be presented, and possible changes based on the draft E17 guidance will be discussed.
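
As a rough illustration of the kind of calculation behind TCS014.1 (not the authors' actual Howe or Cochran-Cox procedures), the base-R sketch below combines simulated regional treatment-effect estimates with heterogeneous variances into an overall effect, attaches a Satterthwaite-type confidence interval, and applies an MHLW-style consistency check that each region retains at least half of the overall effect; all region names, sample sizes and effect sizes are invented.

  # Overall MRCT effect, Satterthwaite-type CI, and a simple consistency check (sketch)
  set.seed(2)
  regions <- c("Asia", "Europe", "NorthAmerica")
  n   <- c(100, 150, 250)                  # per-arm sample sizes by region (assumed)
  sdr <- c(1.2, 0.9, 1.0)                  # heterogeneous regional SDs (assumed)
  del <- c(0.35, 0.30, 0.25)               # true regional treatment effects (assumed)
  sim_region <- function(nk, sk, dk) {
    trt <- rnorm(nk, dk, sk); ctl <- rnorm(nk, 0, sk)
    c(diff = mean(trt) - mean(ctl), var = var(trt) / nk + var(ctl) / nk)
  }
  res    <- t(mapply(sim_region, n, sdr, del))
  diff_k <- res[, "diff"]; var_k <- res[, "var"]
  w      <- n / sum(n)                     # sample-size weights
  D_all  <- sum(w * diff_k)                # overall treatment effect
  V_all  <- sum(w^2 * var_k)
  df_sat <- V_all^2 / sum((w^2 * var_k)^2 / (2 * n - 2))  # Satterthwaite df
  ci <- D_all + c(-1, 1) * qt(0.975, df_sat) * sqrt(V_all)
  data.frame(region = regions, effect = round(diff_k, 3),
             consistent = diff_k / D_all >= 0.5)          # MHLW-style criterion
  round(c(overall = D_all, lower = ci[1], upper = ci[2], df = df_sat), 2)
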

TCS014.3 Multi-regional planning with biosimilar studies
Chang Y.-W. 1, Xia Q. 2
1 Abbvie Stemcentrx, Danbury, United States, 2 Boehringer Ingelheim, Danbury, United States
In biosimilar drug development, the evidence required to demonstrate biosimilarity is different from the evidence required to approve a new drug. Biosimilarity is based on the totality of evidence from multiple steps of assessment. The fundamental steps consist of similarity in analytical and pharmacokinetic assessment. Some of the regional regulatory agencies may allow the sponsor to plan a phase III clinical trial with a single regional reference product instead of references for each region. Since the reference biological drugs marketed in different regions have not been demonstrated to be biosimilar, justification of using a reference product from regions other than the review region would require evidence of bridging between references marketed in different regions. The bridging evidence needs to be established in analytical and pharmacokinetic assessment. In this paper, we reviewed the various setups and designs of the first four FDA-approved biosimilar products and discussed the potential involvement of multiple comparisons that require type I error rate adjustment and imply a power reduction. The impact may increase when more regional references are involved.

TCS014.4 Statistical challenges surrounding multi-regional clinical trials and global medical product development
Dai L.
Boehringer Ingelheim, Shanghai, China
With a rapidly growing trend of globalization, the development of medical products has necessitated the conduct of clinical trials in multiple countries under a common protocol, so as to provide patients worldwide with novel medicines simultaneously. A Multi-Regional Clinical Trial (MRCT) aims at evaluating the overall treatment effect and safety profile in the entire study population and attempts to obtain adequate and scientific estimates of the drug effect and safety applicable to each region. Good planning, especially an optimal study design and specifications of how to evaluate consistency, is essential to maximize the probability of success of an MRCT. In this talk, the speaker will present statistical challenges and considerations for the evaluation of regional estimates for decision making throughout the development program.

TCS015 Advances in Model-Based Dose Finding
Wed :30-6:00 Lecture Hall HS 4
Chair(s): Valerii Fedorov
Discussant(s): France Mentré, Stephen Senn
In recent years statistical methods of model-based design and analysis of experiments have become a popular and practically sound approach in dose finding and PK/PD studies. Recently a number of new methods were actively developed and implemented. The three presentations of this session address both methodological and applied issues mostly arising in early stages of clinical trials. Dr. Frank Bretz talks on assessing the similarity of dose response and target doses in two non-overlapping subgroups. Together with interesting theoretical findings he illustrates the proposed methods with a real case study and investigates their operating characteristics via simulation. Dr. Fritjof Freise compares various types of adaptive designs for quantal dose-response studies using both analytical and Monte-Carlo approaches. Dr. Moreno Ursino will discuss the optimal design of clinical trials to efficiently gain reliable information on safety, tolerability, pharmacokinetics and mechanism of action of drugs, with the major objective of determining the maximum tolerated dose. Prof. France Mentré, who is well known for her results in optimal design of dose response studies, and Prof. Stephen Senn, who has published several books on the design and analysis of clinical trials, kindly agreed to discuss the above presentations.

TCS015.1 Assessing the similarity of dose response and target doses in two non-overlapping subgroups
Bretz F.
Novartis, Basel, Switzerland
We consider two problems of increasing importance in clinical dose finding studies. First, we assess the similarity of two non-linear regression models for two non-overlapping subgroups of patients over a restricted covariate space. To this end, we derive a confidence interval for the maximum difference between the two given models. If this confidence interval excludes the equivalence margins, similarity of dose response can be claimed. Second, we address the problem of demonstrating the similarity of two target doses for two non-overlapping subgroups, again using a confidence interval based approach. We illustrate the proposed methods with a real case study and investigate their operating characteristics via simulation.

TCS015.2 On adaptive designs for logistic binary response models
Freise F.
TU Dortmund, Dortmund, Germany
Since optimal designs for binary response models depend on the unknown parameter, it is common to use adaptive designs, where new design points are determined using estimates based on observations from prior stages. The focus of this talk is on experiments with "false responses". In psychophysical experiments these can be a product of inattention, in a forced choice experiment generated by guessing or, in medical applications, by a physical or psychological predisposition. An optimized adaptive design based on the Fisher information is proposed to estimate the threshold of logistic models which incorporate "false responses". By simulation studies this design is compared to existing methods like the Robbins-Monro procedure and Wu's maximum likelihood method.

TCS015.3 Dose-finding methods for Phase I clinical trials using pharmacokinetics in small populations
Ursino M. 1, Zohar S. 1, Lentz F. 2, Alberti C. 3, Friede T. 4, Stallard N. 5, Comets E. 6,7
1 INSERM, UMRS 1138, CRC, Paris, France, 2 BfArM, Bonn, Germany, 3 INSERM, UMR 1123, Paris, France, 4 Universitätsmedizin Göttingen, Göttingen, Germany, 5 University of Warwick, Coventry, United Kingdom, 6 INSERM, CIC 1414, Rennes, France, 7 INSERM, IAME, UMR 1137, Paris, France
The aim of phase I clinical trials is to obtain reliable information on safety, tolerability, pharmacokinetics (PK) and mechanism of action of drugs with the objective of determining the maximum tolerated dose (MTD). In most phase I studies, dose-finding and PK analysis are done separately and no attempt is made to combine them during dose allocation. In cases such as rare diseases, paediatrics, and studies in a biomarker-defined subgroup of a defined population, the available population size will limit the number of possible clinical trials that can be conducted. Combining dose-finding and PK analyses to allow better estimation of the dose-toxicity curve should then be considered. In this work we propose, study and compare methods to incorporate PK measures in the dose allocation process during a phase I clinical trial. We conducted a large simulation study which showed that adding PK measurements does not improve the efficiency of dose-finding trials either in terms of the number of observed dose limiting toxicities or the probability of correct dose selection. However, it does allow better estimation of the dose-toxicity curve. In conclusion, using PK information in the dose allocation process enriches the knowledge of the dose-toxicity relationship, facilitating better dose recommendations for subsequent trials. Extensions to phase I/II, combining dose-finding and PK/PD analysis aiming at finding the most successful dose, will also be explored.

TCS016 Statistical Design of Clinical Studies and Modelling of Related Operational Processes
Tue :30-3:00 Lecture Hall HS 4
Organizer(s): Valerii Fedorov
Chair(s): Zoran Antonijevic
The success of clinical trials is defined not only by their statistical properties, like the amount of information gained for a given sample size, but also by the admissibility of their cost or duration. Thus a thorough balance between statistical and operational effectiveness should be a crucial factor that determines the design selection, especially when composing adaptive clinical trials. The respective decision making is based on uncertain input information such as the number of open sites, enrollment rates, timing of interim looks, drop-out rates etc. Statistical models become crucial both for reliable prediction and for maximization of the probability of technical success. Prof. Vladimir Anisimov will present new theoretical results on predictive modeling of clinical trial operations that are based on the Poisson-Gamma model and Bayesian estimation of various operational characteristics. Dr. Nitin Patel and Prof. Suresh Ankolekar will discuss the reliability of time-to-event predictions. Their approach is an interesting blend of analytical results and Monte-Carlo simulations. Dr. Matthew Austin will present his findings on the methodology of objective comparisons of alternative clinical development program scenarios. Drs. Vladimir Dragalin and Zoran Antonijevic are the discussants, and their extensive experience in statistics and drug development will assure an in-depth discussion.

TCS016.1 How reliable are the time-to-event predictions?
Ankolekar S. 1,2, Patel N. 2
1 Maastricht School of Management, Maastricht, Netherlands, 2 Cytel Inc., Cambridge, United States
Reliable predictions are critical to effectively support decision-making in the planning and execution of time-to-event clinical trials and to trigger improvements in the operational processes. The predictions depend on the specification of the underlying model and the effective use of the accumulating observed data to refine the model as the trial progresses. A wide variety of parametric, semi-parametric, and Bayesian nonparametric models reported in the literature have mixed performance in prediction with fully observed historical data of past trials. How can we assess the potential efficacy of such models for the ongoing trial with evolving data? How can we detect any model-specification issues that impact the predictions? The periodic nature of scheduled monthly/quarterly/half-yearly follow-up of the subjects in time-to-event trials and temporary/permanent loss to follow-up implies an increasing number of potentially unobserved onsets of events in the accumulating data. How do such unobserved events impact the predictions? Do we need to review the operational processes related to the data capture? This presentation addresses these issues in the context of an ongoing large global multi-center time-to-event clinical trial. A methodology is developed to assess the alternative models, detect model-specification issues, and assess the impact of the unobserved events. The assessment is done in terms of shifts in the prediction trends over time. Two alternative simulation models based on the underlying Poisson-Gamma model are used to illustrate the methodology.
Keywords: clinical trial events timeline projection, Poisson-Gamma model, time-to-event milestone prediction.

TCS016.2 Moving toward objective comparisons of alternative clinical development program scenarios
Austin M.
Amgen, Inc., Thousand Oaks, United States
Traditionally, quantitative comparisons between alternative clinical development programs have been based on high-level comparisons of time, cost, and power of individual studies. Recently, more holistic quantitative comparison of development programs has increased due to the desire to understand the efficiency of alternative scenarios (e.g., adaptive designs, tuning futility and efficacy boundaries). Using Bayesian techniques for simulation of development programs, based on inputs from finance, commercial, and statistical design aspects that include uncertainty at several levels, allows for more informed decision-making. An example of this technique, with discussion of the methods, input gathering from multiple functions, and ideas for easily understood result summaries, will be presented.

TCS016.3 Comparing drug development strategies with probabilities of success including benefit-risk assessment to inform decision-making
Saint-Hilary G. 1, Robert V. 2, Gasparini M. 1
1 Politecnico di Torino, Dipartimento di Scienze Matematiche (DISMA) Giuseppe Luigi Lagrange, Torino, Italy, 2 Institut de Recherches Internationales Servier (IRIS), Department of Biostatistics, Suresnes, France
Evidence-based quantitative methodologies have been proposed to inform decision-making in drug development, such as metrics to make go/no-go decisions or predictions of success based on the statistical significance of future clinical trials.
While these methodologies appropriately address some critical questions on the potential of a drug, they either consider the past evidence without predicting the outcome of the future trials or focus only on efficacy, failing to account for the multifaceted aspects of a successful drug development. As quantitative benefit-risk assessments could enhance decision-making, we propose a more comprehensive approach using a composite definition of success based not only on the statistical significance of the treatment effect on the primary endpoint, but also on its clinical relevance, and on a favorable benefit-risk balance in the next pivotal studies. For one drug, we can thus study several development strategies before starting the pivotal trials by comparing their predictive probability of success. The predictions are based on the available evidence from the previous trials, which could be combined with new hypotheses on the future development. The resulting predictive probability of composite success provides a useful summary to support the discussions of the decision-makers. We present a fictitious, but realistic, example in Major Depressive Disorder inspired by a real decision-making case.

TCS017 Signal Detection in Spontaneous Reports with Focus on Increased Frequency Signals
Thu :30-8:00 Lecture Hall HS 4
Chair(s): Conny Berlin
Post-marketing drug safety surveillance to identify potential safety signals is of vital importance for public health. This includes the early identification of associations between certain types of adverse events and drugs, increases in the frequency of known adverse drug reactions, and the identification of adverse drug reactions based on exposure or exposure duration. Most recently, attention has shifted to methods directed at detecting increased frequencies of already identified adverse drug reactions. The purpose of the session is to discuss methods, and their application and implementation, which have recently been proposed by health authorities or the industry.

TCS017.1 A Bayesian algorithm for detecting increases in frequency of spontaneous adverse event reports over time
DuMouchel W.
Oracle Health Sciences, Miami, United States
A Bayesian statistical methodology - focused on temporal change detection - was developed to highlight excursions from baseline spontaneous AE reporting. Regression estimation (with both smooth trend and seasonal components) models the monthly counts of a drug's reports containing each particular AE, and then the sum of counts in the most recent two months is compared to the fitted trend. The method does not require report counts from a database of other drugs, as do standard disproportionality methods, but it does use empirical Bayesian shrinkage to adjust for the multiple comparisons challenge caused by possibly hundreds of different event counts being tracked over time. The signaling threshold was tuned, using retrospective analysis, to yield acceptable sensitivity and specificity.

TCS017.2 A nonparametric permutation test for high-dimensional data to detect signals of increased frequencies for adverse drug reactions in post-marketing drug safety surveillance
Heimann G. 1, Belleli R. 2, Behr S. 2, Berlin C. 2
1 Novartis Pharma AG, Biostatistics and Pharmacometrics, Basel, Switzerland, 2 Novartis Pharma AG, Drug Safety and Epidemiology, Basel, Switzerland
In post-marketing drug safety surveillance one monitors whether the frequencies of adverse drug reactions (ADRs) increase over time. For each drug on the market there is a list of relevant ADRs, which can sometimes include more than 1000 different types of ADRs, and the frequency has to be monitored for each type of ADR. An added complexity is that only spontaneously reported cases are registered in the database, while the overall number of patients who take the drug is unknown.
Only a very small proportion of the spontaneously reported cases usually experience a specific type of ADR. We propose a nonparametric permutation test which is sensitive to detecting increased frequencies, which controls the family-wise type I error across all types of ADRs, and which avoids the large number of false positive results observed with the current method. We also show that the test can be used when the overall number of spontaneously reported cases varies over time, which is common with new drugs, drugs going off patent, or drugs which are used on a seasonal basis. We will present the results of a large simulation study to show the power and type I error control of the proposed procedure, and compare it to alternative approaches.

TCS017.3 Time-series forecasting model to detect unexpected increases in frequency of reports of adverse events in EudraVigilance
Candore G., Pinheiro L., Slattery J., Zaccaria C., Arlett P.
European Medicines Agency, London, United Kingdom
The European Medicines Agency developed a time-series forecasting model in view of enhancing the ability to detect adverse events that can potentially manifest as unexpected increases in frequency (UIF) of reports, namely quality defects (QD), medication errors (ME) and cases of abuse and misuse (A/M). A time-series forecasting model of count data was developed based on the assumption of a negative binomial distribution. Historical values, corresponding to the count of case reports received in each of the previous six monthly observations, were used to estimate the expected value for the monitored period. Anomalous values were highlighted when the observed value was above the upper bound of the 95% confidence interval of the estimate and the counts for the monitored period were greater than or equal to three or five. The model was tested on EudraVigilance using thirteen true historical concerns. To determine if a historical concern was identified, the date when it was first identified was used as the index date. The proportion of positive highlights identified within the set of historical concerns and the positive predictive value (PPV) were determined for eight different settings of the model. The proportion of historical concerns identified ranged from 37.5% to 100%, whereas the PPVs ranged from 0.59% to 8.75% depending on the settings. The PPV of the settings similar to routine signal detection methods (calculation done on the counts of adverse events per substance) ranged from 0.82%, using a threshold on the number of cases of 3, to 1.29%, using a threshold of 5. In conclusion, the time-series forecasting model highlighted most historical concerns of QD, ME and A/M, and thus could be a useful tool in the signal detection toolkit for concerns that may manifest as changes in frequency of reporting.
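
A much-simplified base-R sketch of the type of highlighting rule described in TCS017.3 (the counts and thresholds below are invented, and the EMA model includes further settings not reproduced here): the previous six monthly counts give an expected value under a negative binomial assumption, and the monitored month is highlighted only if the observed count exceeds the upper 95% bound and reaches the minimum-case threshold.

  # Negative binomial highlighting of an unexpected increase in report counts (sketch)
  history  <- c(4, 2, 5, 3, 6, 4)   # reports in the previous six months (made up)
  observed <- 11                    # reports in the monitored month (made up)
  mu_hat  <- mean(history)
  var_hat <- var(history)
  # method-of-moments size parameter; fall back to Poisson if no overdispersion
  size_hat <- if (var_hat > mu_hat) mu_hat^2 / (var_hat - mu_hat) else Inf
  upper95 <- if (is.finite(size_hat)) {
    qnbinom(0.975, size = size_hat, mu = mu_hat)
  } else {
    qpois(0.975, lambda = mu_hat)
  }
  highlight <- observed > upper95 && observed >= 3   # mirrors the 3-case threshold
  c(expected = mu_hat, upper95 = upper95, highlighted = highlight)
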

TCS017.4 Implementation of a custom solution for a small database of spontaneous reports
Kübler J.
Quantitative Scientific Consulting, Marburg, Germany
Historically, signal detection methodology for spontaneous reports has been applied and evaluated for medium to large databases. This presentation describes the evaluation, selection and tuning of an approach suitable for a spontaneous reports database of small size with a diverse portfolio. In close collaboration with subject matter experts from Risk Management, this led to a tailor-made increased-frequency type of solution. Special consideration was given to the choice of reference, MedDRA hierarchy level, trade-offs between sensitivity and specificity, and ease of implementation.

TCS018 The Use of Matching Adjusted Indirect Comparison in Health Technology Assessments (HTA)
Wed :30-8:00 Lecture Hall KR 8
Chair(s): Mark Belger
The session will cover 3 topics, each with separate speakers, to describe the use of Matching Adjusted Indirect Comparisons in the context of HTA decision making. Following the presentations there will be a 30-minute structured panel discussion, in which the three speakers will be joined by experts in HTA/NMA from the Central European Network and during which we will also invite questions from the audience. The moderator will ensure questions are appropriately framed and addressed by the most suitable speaker.

TCS018.1 Review of the recommendations from the Decision Support Unit for the use of population-adjusted indirect comparisons in submissions to NICE
Phillippo D.
University of Bristol, Bristol, United Kingdom
We present the findings and recommendations of a recent NICE Technical Support Document (available from the NICE Decision Support Unit website) regarding the use of population-adjusted indirect comparisons in health technology appraisal. Standard methods for indirect comparisons and network meta-analysis are based on aggregate data, with the key assumption that there is no difference between trials in the distribution of effect-modifying variables. Two methods which relax this assumption, Matching-Adjusted Indirect Comparison (MAIC) and Simulated Treatment Comparison (STC), are becoming increasingly common in industry-sponsored treatment comparisons, where a company has access to individual patient data (IPD) from its own trials but only aggregate data from competitor trials. Both methods use IPD to adjust for between-trial differences in covariate distributions. Despite their increasing popularity, there is a distinct lack of clarity about how and when these methods should be applied. We review the properties of these methods, and identify the key assumptions. Notably, there is a fundamental distinction between "anchored" and "unanchored" forms of indirect comparison, where a common comparator arm is or is not utilised to control for between-trial differences in prognostic variables, with the unanchored comparison making assumptions that are very hard to meet. Furthermore, both MAIC and STC as currently applied can only produce estimates that are valid for the populations in the competitor trials, which do not necessarily represent the decision population. We provide recommendations on how and when population adjustment methods should be used to provide statistically valid, clinically meaningful, transparent and consistent results for the purposes of health technology appraisal.
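
For readers less familiar with the methods reviewed in TCS018.1, here is a minimal base-R sketch (simulated IPD, invented aggregate means and effect estimates; not the NICE TSD code) of the two ingredients of an anchored MAIC: method-of-moments weights that match the IPD covariate means to the competitor trial's reported aggregate means, followed by a Bucher-type anchored indirect comparison of A versus B through the common comparator C.

  # MAIC weighting plus anchored indirect comparison (sketch)
  set.seed(4)
  n   <- 400
  ipd <- data.frame(age = rnorm(n, 62, 8), male = rbinom(n, 1, 0.55))  # A-C trial IPD
  agg <- c(age = 58, male = 0.45)            # aggregate covariate means of the B-C trial
  X   <- scale(as.matrix(ipd), center = agg, scale = FALSE)
  # minimising sum(exp(X %*% a)) makes the weighted covariate means equal 'agg'
  a_hat <- optim(c(0, 0), function(a) sum(exp(X %*% a)), method = "BFGS")$par
  w     <- as.vector(exp(X %*% a_hat))
  colSums(w * as.matrix(ipd)) / sum(w)       # reweighted means, close to 'agg'
  # anchored (Bucher-type) indirect comparison on a mean-difference scale
  d_AC <- 1.8; se_AC <- 0.4                  # weighted A vs C estimate (assumed values)
  d_BC <- 1.1; se_BC <- 0.5                  # published B vs C estimate (assumed values)
  d_AB  <- d_AC - d_BC
  se_AB <- sqrt(se_AC^2 + se_BC^2)
  round(c(d_AB = d_AB, lower = d_AB - 1.96 * se_AB, upper = d_AB + 1.96 * se_AB), 2)

Because the comparison is anchored on C, only imbalances in effect modifiers (not in purely prognostic variables) need to be removed by the weighting, which is the distinction the talk emphasises.
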
TCS018.2 A simulated indirect treatment comparison approach that can incorporate uncertainty about the true relationship between confounders and outcome
Petto H. 1, Brnabic A.J.M. 2, Kadziola Z. 1
1 Eli Lilly, Statistics, Vienna, Austria, 2 Eli Lilly and Company (Eli Lilly Australia Pty. Limited), Sydney, Australia
Simulated treatment comparison (STC) has been proposed as a method for indirect treatment comparisons (IC) when individual patient data (IPD) for treatments A and C as well as published aggregated data (AGR) for treatments B and C are available. We develop a simulation approach that builds upon the STC, incorporating uncertainty around the true shape of the confounder-outcome relationship. Fitting a parametric model to simulated IPD estimates a shape for the A-C outcome. Fitting the same model to this IPD combined with simulated AGR data, so that it ends near the B-C outcome, estimates a shape under a hypothesis of no treatment difference. The parameters of the first model are then weighted into those of the second, so that increasing distance between the IPD and AGR studies lets any predicted A-C effect relative to that of B-C disappear. Means of these differences under simulated AGR study conditions are calculated, and from quantiles of the resulting distribution an alpha-level test of no difference between A and B is constructed. Application of the method to simulated data from known models shows that our method is conservative, and power decreases with increasing distance in covariates between the IPD and AGR studies. To investigate its usefulness for practitioners, we apply the method to simulated data from real study results. In summary, when IPD and AGR data are given, the outlined method might be a conservative alternative to other available IC methods, i.e. when the true relationship between confounders and outcome is not well known.

TCS018.3 Combination of several matching adjusted indirect comparisons (MAICs)
Saure D. 1, Brnabic A.J.M. 2, Kadziola Z. 3, Schacht A. 1
1 Lilly Deutschland GmbH, Bad Homburg, Germany, 2 Eli Lilly and Company (Eli Lilly Australia Pty. Limited), West Ryde, Australia, 3 Eli Lilly Regional Operations GmbH Austria, Vienna, Austria
In HTA, indirect comparisons (IC; Bucher 1997) have become an important tool for providing evidence on two treatments when no head-to-head data are available. However, an IC may provide biased results when the included trials differ in baseline characteristics that have an influence on the treatment outcome. If individual patient data (IPD) for at least one part of the IC are available, this issue can be addressed by application of the Matching Adjusted Indirect Comparison (MAIC; Signorovitch 2012) method as an extension of the Bucher method. Here, IPD are re-weighted to match the baseline characteristics of published data. However, Signorovitch provided a solution neither for more than one study per IC nor for the case when several bridge comparators are available. Here, assuming that the published data are similar regarding inclusion/exclusion criteria and treatment effect modifiers, we propose to merge those results by application of meta-analysis methodology to provide a single effect estimate. We applied classic generic inverse-variance meta-analysis in order to provide a single effect estimate in the above described setting. Here, the role of the studies to be combined in a usual meta-analysis is taken by the ICs/MAICs. Note that the ICs/MAICs can be based on several studies. With this, both fixed effect and random effects models can be computed, including confidence intervals, p-values, and measures of statistical heterogeneity. We examined the single steps to be taken when the proposed procedure is applied to a case study. Furthermore, we suggest solutions for issues that may generally occur and discuss differences to a network meta-analysis (NMA) based approach. The application of meta-analysis methodology in order to combine the results of several ICs/MAICs is feasible and provides additional evidence to compare two treatments when head-to-head data are missing. This approach can decrease bias at some cost of decreased precision in comparison to an NMA.

TCS021 Statistical Issues in Umbrella Trials
Wed :30-8:00 Lecture Hall HS 5
Chair(s): Hong Sun
Due to the impressive development of molecular markers in recent years it has become possible to consider both multiple markers and multiple treatments in clinical trials. Hence, flexible designs like umbrella trials are becoming of interest. These are study designs in which multiple markers are assessed in each patient and treatment allocation depends on the marker status. Several umbrella trials have already been conducted in oncology. A number of statistical challenges arise in umbrella trials, including adaptive designs with multiple arms and multiple stages, stratified treatment strategies, flexible randomization schemes, multiple subgroup analysis methods and multiple comparison procedures. In this session, we try to face the possible statistical challenges in umbrella trials, to identify potential problems and to discuss first solutions.

TCS021.1 Randomized umbrella and basket trial designs
Buyse M.
IDDI, San Francisco, United States
Various designs have been proposed for umbrella and basket trials. The presence of multiple strata in umbrella and basket designs has several advantages: first, one can borrow information across strata using a Bayesian approach (Simon et al, 2016); second, one can relax the error rates or allow for uncertain outcomes in some strata, since the interpretation is enriched by outcomes available in other strata (Sargent et al, 2001); and third, one can address randomized questions with reasonable power by analyzing data from all strata. The gain in power in an overall comparison of the randomized groups allows one to analyze efficacy outcomes (such as time to progression or overall survival) across all strata, in addition to making decisions based on stratum-specific activity outcomes (such as overall response rate or clinical benefit rate). The choice of comparators will be discussed using actual examples. References: Sargent DJ, Chan V, Goldberg RM. A three-outcome design for phase II clinical trials. Controlled Clinical Trials 2001;22:117-125. Simon R, Geyer S, Subramanian J, Roychowdhury S. The Bayesian basket design for genomic variant-driven phase II trials. Seminars in Oncology 2016;43:13-18.

TCS021.2 Randomization strategies in umbrella trials: moving towards multi-arm trials, factorial designs and efficient analysis
Vach W. 1,2, von Bubnoff N. 3, Sun H. 1,4
1 Medical Faculty & Medical Center, University of Freiburg, Institute of Medical Biometry and Statistics, Freiburg, Germany, 2 University Hospital Basel, Dept.
of Orthopaedics & Traumatology, Basel, Switzerland, 3 Medical Faculty & Medical Center, University of Freiburg, Comprehensive Cancer Center, Freiburg, Germany, 4 Grünenthal GmbH, Aachen, Germany
The molecular heterogeneity of cancer patients often suggests a variety of targeted therapies. Evaluation of each therapy requires conducting many small studies in the eligible patients. Umbrella trials try to organize such sub-studies under a common master protocol. Patients eligible for several therapies build a natural link between the sub-studies, but until now no use has been made of this fact. We demonstrate that certain randomization techniques allow these patients to be included in several sub-studies simultaneously. This implies an increase in power, in particular if combination treatments are allowed in some patients. Moreover, treatment decisions are based on algorithms combining deterministic and random elements, avoiding a dependence on unmeasured variables. This allows borrowing information across sub-studies to increase the power by a joint model-based analysis. We present results of a simulation study illustrating the potential gain in power and reduction in sample size from using sophisticated randomization techniques and estimation models based on penalizing higher order interactions. These theoretical advantages have to be balanced against clinical feasibility, in particular with respect to combinations of treatments and the assumption of additive treatment effects. Some empirical results on the effects observed in combination therapies are discussed. The ideas presented require further elaboration, but suggest advantages when regarding umbrella trials as more than a collection of sub-studies.

TCS021.3 Implementation of a Bayesian adaptive design for a phase II umbrella trial in advanced non-small cell lung cancer: The National Lung Matrix Trial
Billingham L. 1, Fletcher P. 1, Brock K. 1, Wherton D. 1, Llewellyn L. 1, Brown S. 1, Popat S. 2, Middleton G. 3
1 University of Birmingham, Cancer Research UK Clinical Trials Unit, Birmingham, United Kingdom, 2 Royal Marsden Hospital and Imperial College, London, United Kingdom, 3 University of Birmingham, Institute of Immunology and Immunotherapy, Birmingham, United Kingdom
The National Lung Matrix Trial is a flagship trial in the United Kingdom, being the first to combine the development of a technology platform that screens for multiple genetic aberrations in tumours (provided by the Cancer Research UK Stratified Medicine Programme 2) with testing of multiple novel genetic-marker-directed drugs. The trial is focused on patients with advanced non-small cell lung cancer and currently includes 7 different drugs targeting 20 molecular markers. In addition, patients with no actionable genetic change are included and will be treated with a sequential pipeline of drugs. The trial has been successfully recruiting patients since March 2015. This is a multi-centre, multi-arm, phase II trial, each arm testing an experimental targeted drug in a population stratified by multiple prespecified target biomarkers using a Bayesian adaptive design. The overall trial is an umbrella design in that there are multiple drugs being tested on biomarker-specific subgroups of the single disease-specific population. In addition, some of the treatment arms represent a basket design, with a single drug being tested in parallel on a set of defined biomarkers. The aim of the statistical analysis is to determine whether there is sufficient signal of activity in any drug-biomarker combination to warrant further investigation. Target recruitment for each drug-biomarker cohort is 30 patients, with a formal interim analysis at 15 to allow potential early stopping. The paper will illustrate the statistical methodology underpinning the design and highlight the issues of implementing such trials in the real world.

TCS021.4 Allocation to subtrials in an umbrella design of patients that are positive for multiple biomarkers: pragmatically or completely at random?
Kesselmeier M., Scherag A.
Jena University Hospital, Clinical Epidemiology, Center for Sepsis Control and Care, Jena, Germany
Umbrella trials have been suggested to simplify the trial conduct when investigating different biomarker-driven experimental therapies in one disease. In such studies, one uses an overarching platform for patient screening with a subsequent allocation to subtrials according to the patients' biomarker status. Moreover, it is allowed to stop or to add subtrials in case another promising targeted therapy has been identified. However, little is known about the statistical properties of such trials. Let EXP1 and EXP2 be two targeted therapies with the corresponding biomarkers B1 and B2, and consider the case that a patient is positive for both biomarkers (i.e., B1+B2+). Currently it has been suggested to allocate the patient to the subtrial in which the biomarker positives are less prevalent [Renfro and Sargent, 2017; Ferrarotto et al., 2015]. At the other extreme, a completely random allocation with equal probability is another option. This alternative should lead to unbiased estimates of EXP1 and EXP2 relative to a standard therapy (STD), presumably at the cost of longer recruitment times compared to the pragmatic approach. We assume an umbrella trial with three parallel group subtrials (B1+, B2+, B1-B2-) starting at the same time and without adding additional subtrials. In the biomarker-positive subtrials, EXP1 and EXP2, respectively, are compared to the STD, while the remaining patients all receive STD. Using simulations, we investigate statistical properties like type I error rates, power, bias and recruitment time for varying study characteristics, including biomarker prevalence and therapy effects in the subtrials, for both allocation mechanisms.

TCS021.5 Borrowing strength over patient subgroups in phase II trials: What can be done when treatment is only active in a minority of groups?
Wiesenfarth M. 1, Kopp-Schneider A. 2
1 German Cancer Research Center (DKFZ), Biostatistics, Heidelberg, Germany, 2 German Cancer Research Center (DKFZ), Heidelberg, Germany
Confronted with an umbrella trial where response rates are observed in multiple subgroups, we have to deal with small subgroup sample sizes leading to unfavorable operating characteristics, in particular when interim analyses are considered. If it is expected that the patient subpopulations respond similarly to the therapy, the use of Bayesian hierarchical models has been suggested. Thereby, typically the log-odds of responses are assumed to be Gaussian with a common mean, and outcome-adaptive borrowing of information across groups is achieved by assigning a hyperprior on the variance term. The implied shrinkage of estimates towards the common mean is known to improve operating characteristics upon independent analysis of groups if they are indeed similar.
However, there is a risk of missing promising groups if treatment is only active in a minority of subgroups. Different approaches have been proposed to improve robustness in such situations. The simplest strategy is to accommodate outliers by heavy-tailed priors on the log-odds of responses. Advanced strategies include mixtures of stratified and hierarchical models and models based on Dirichlet process mixtures where borrowing is achieved through clustering. Another strategy compares the stratified and hierarchical models based on estimated out-of-sample prediction accuracy and recommends the model with better performance. We perform a large-scale simulation study to compare the operating characteristics of these approaches in various settings of practical relevance. Further, we present an R package implementing all approaches in a unifying interface which simplifies their accessibility and comparison. MCMC computations are efficiently performed in Stan and JAGS, and the package supports a large range of priors on the response rate as well as hyperpriors controlling the degree of borrowing.

TCS022 Big Data in Practical Applications
Fri :30-0:00 Lecture Hall HS 4
Chair(s): Malgorzata Bogdan
The purpose of this session is to present different practical problems which require the analysis of Big Data sets. The problems are related to relevant questions from the biological data sciences, and all talks will be illustrated with real data examples. The proposed statistical methodologies will be very diverse and will illustrate the richness and versatility of the field of analysis of large data.

TCS022.1 Discovery of structural brain imaging markers of HIV-associated outcomes using a connectivity-informed regularization approach
Karas M. 1, Brzyski D. 1, Ances B. 2, Goni J. 3, Randolph T. 4, Harezlak J. 1
1 Indiana University, Epidemiology and Biostatistics, Bloomington, United States, 2 Washington University in St. Louis, St. Louis, United States, 3 Purdue University, West Lafayette, United States, 4 Fred Hutchinson Cancer Research Center, Seattle, United States
The study of multimodal brain imaging biomarkers of disease is frequently performed by analyzing each modality separately. In our work, we use a recently proposed regularization method, riPEER (ridge-identity partially empirical eigenvectors for regression), to discover early biomarkers of HIV-associated outcomes including CD4 count and HIV RNA plasma level. Specifically, we incorporate information arising from the functional and structural connectivity in the penalized generalized linear model framework to inform the associations between the brain cortical features and disease outcomes. Penalty terms are defined as a combination of Laplacian matrices arising from the functional and structural connectivity adjacency matrices. We study the advantages of employing different measures of connectivity as well as synergistic functional and structural information. Finally, we address the issue of using different cerebral cortex parcellations, from the common FreeSurfer parcellations (68 and 148 cortical areas) to a novel multi-modal parcellation into 360 cortical areas, in discovering global and local biomarkers.
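
A toy base-R illustration of the penalty structure described in TCS022.1 (random graph, simulated features, arbitrary penalty weights rather than the data-driven tuning used in riPEER): a ridge-type estimator whose penalty combines a graph Laplacian, built from a connectivity adjacency matrix, with an identity matrix, so that coefficients of connected regions are smoothed towards each other.

  # Generalized ridge regression with a graph Laplacian penalty (sketch)
  set.seed(5)
  p <- 10; n <- 80
  A <- matrix(rbinom(p * p, 1, 0.3), p, p)   # random 'connectivity' adjacency matrix
  A <- A * upper.tri(A); A <- A + t(A)       # symmetric, zero diagonal
  L <- diag(rowSums(A)) - A                  # graph Laplacian
  X <- matrix(rnorm(n * p), n, p)            # cortical features (simulated)
  beta_true <- rnorm(p)
  y <- X %*% beta_true + rnorm(n)
  lambda_L <- 5; lambda_I <- 1               # penalty weights, fixed here for simplicity
  beta_hat <- as.vector(solve(crossprod(X) + lambda_L * L + lambda_I * diag(p),
                              crossprod(X, y)))
  round(cbind(true = beta_true, penalized = beta_hat), 2)

The Laplacian term penalizes differences between coefficients of connected nodes, while the identity term provides ordinary ridge shrinkage; varying the two weights trades off the two sources of regularization.
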

TCS022.2 A novel algorithmic approach to Bayesian logic regression
Frommlet F. 1, Hubin A. 2, Storvik G. 2
1 Medical University Vienna, Vienna, Austria, 2 University of Oslo, Oslo, Norway
Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has mainly been used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless, logic regression has remained less well known than other approaches to epistatic association mapping. This talk will introduce an advanced evolutionary algorithm called GMJMCMC (Genetically Modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. Comprehensive simulation studies illustrate its performance given logic regression terms of various complexity. Specifically, GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. GMJMCMC is applied to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and for a backcross population in Drosophila.

TCS022.3 Classification of human activity based on spherical representation of accelerometry data
Kos M.
University of Wroclaw, Institute of Mathematics, Wroclaw, Poland
Human health is strongly associated with a person's lifestyle and level of activity. Therefore, characterization of daily human activity is an important task. The accelerometer is a body-wearable device which enables precise measurement of acceleration changes over time of the part of the body to which it is attached. It can collect a vast number of observations per hour. The signal from an accelerometer can be used to classify different types of human activity. During the session, a novel classification procedure will be presented. It is based on a spherical representation of raw accelerometry data. The classification accuracy of the method is 90% at the within-subject level and 83% in the between-subject case. The advantage of the method is that it provides classification of short-term activities. This property is important in the problem of comparing two methods of rehabilitation, since it allows the first signals of recovery to be identified. These good classification properties result from the use of information from the angular part of the signal, which is partially summarized in the so-called spherical variance. This interesting and very useful predictor has never previously been used in the analysis of accelerometry data. Its major advantage over other angular measures is rotational invariance, which makes it insensitive to shifts in the accelerometer location.

TCS022.4 Beyond GWAS. The application of SNP-chips in genomic selection
Szyda J., Frąszczak M., Guldbrandsen B. 2, Mielczarek M., Qanbari S. 3, Simianer H. 3, Suchocki T., Żarnecki A. 4, Żukowski K. 4
1 Genomika Polska, Wroclaw University of Environmental and Life Sciences, Institute of Genetics, Biostatistics Group, Wroclaw, Poland, 2 Aarhus University, Department of Molecular Biology and Genetics - Center for Quantitative Genetics and Genomics, Tjele, Denmark, 3 University of Goettingen, Department of Animal Breeding and Genetics, Goettingen, Germany, 4 Institute of Animal Breeding, Genetics, Cracow, Poland
The presentation aims to provide examples of the application of SNP chips in livestock genetics, with a strong emphasis on dairy cattle. In particular, the statistical methodology underlying the routine genomic evaluation of animals for the cattle breeding industry is shown. This involves mixed linear models applied to complex phenotypic data, a complex pedigree structure and genotypes of the 50K Illumina BeadChip for animals from the Holstein-Friesian breed. Furthermore, examples of basic research applications of the SNP chip are given, including: constructing gene regulatory networks for complex traits, the estimation of genomic similarity between traits, modelling of linkage disequilibrium decay in German and Polish cattle breeds, modelling of epistasis, estimating the effect of rare SNP variants and using the Illumina HD SNP chip for the validation of SNPs identified based on whole genome DNA sequence.

TCS023 New Innovations in Handling Longitudinal Biomedical Data using Bayesian Methods
Wed :30-:00 Lecture Hall HS 5
Chair(s): Arkendu Chatterjee
Recent advances in electronic modes of data collection in the biomedical sciences have led to a data deluge but have also given rise to the problem of analyzing longitudinal data. Thus, the appropriate handling of longitudinal data remains an important statistical problem. The primary focus of this session is to bring together established researchers in the field who will share their novel statistical methods for longitudinal data in the biomedical sciences. The first speaker, Dr. Michael J. Daniels, will present a general Bayesian nonparametric approach for longitudinal outcome data that addresses both monotone and non-monotone missingness. The second speaker, Dr. Debajyoti Sinha, will present his recent work on modelling longitudinal skewed responses in the presence of informative cluster sizes. The third speaker, Dr. Jason A. Roy, will present his recent work on using multiple source data from electronic medical records to recover longitudinal outcomes. Dr. Roy will introduce a Bayesian nonparametric approach, which allows for flexible modeling of both the within-subject trajectories over time and of the outcome distribution itself. The final speaker, Dr. Dimitris Rizopoulos, will present his recent work on the modern use of shared parameter models for dropout using Bayesian methods in two settings: first, when focus is on a survival outcome and, second, when focus is on the longitudinal outcome, and he will illustrate how these can be used to perform sensitivity analysis in non-random dropout settings.

TCS023.1 A general Bayesian nonparametric approach for missing outcome data
Linero A. 1, Daniels M. 2
1 Florida State University, Tallahassee, United States, 2 University of Texas at Austin, Statistics & Data Sciences, Austin, United States
Missing data are almost always present in real data, and introduce several statistical issues. One fundamental issue is that, in the absence of strong assumptions, effects of interest are typically not non-parametrically identified.
In this article, we review the generic approach of identifying restrictions from a likelihood-based perspective, and provide points of contact for several recently proposed methods. An emphasis of this review is on restrictions for non-monotone missingness, a subject that has been treated sparingly in the literature. We also present a general, fully Bayesian approach which is widely applicable and capable of handling a variety of identifying restrictions in a uniform manner.

TCS023.2 Bayesian analysis of latent class models for recurrent events with informative observation window
Sinha D.
Florida State University, Tallahassee, United States
For longitudinally observed recurrent events with dependent termination, existing Bayesian and likelihood-based methods are primarily based on models with patient-specific continuous random frailty effects. These models lack simple physical interpretations of the covariate effects on the mean, rate functions and risk of future events given only the observed history. We present a new latent class based semiparametric model that, unlike previous frailty models, has similar interpretations of the covariate effects on the intensity, mean and rate functions. This property of our model yields a full understanding of the covariate effects on the recurrent events process and the termination event. We derive the iterative MCMC tools of the fully Bayesian method to offer practical semiparametric inference when there is very limited prior information about the baseline intensity. The asymptotic properties of these estimators extend the estimation procedure to large-sample based inference. The convenient computation and implementation of our methods are illustrated via the analysis of a heart transplant study. We also extend this latent class model to deal with longitudinal data with an informative observation window.

TCS023.3 Using multiple source data from electronic medical records to recover longitudinal outcomes - a Bayesian nonparametric approach
Roy J.
University of Pennsylvania, Biostatistics and Epidemiology, Philadelphia, United States
Outcomes from electronic medical records can often be defined from several sources, such as diagnostic codes, medication use, or elevated laboratory results. Some of these sources might be considered more reliable than others. For example, if the outcome is diabetes, a diagnostic code might be considered stronger evidence (less misclassification) than a single elevated fasting blood glucose value. However, different patients will vary in terms of how much information there is about these different variables. In this paper we model the timing of the true outcome using these various sources. Because laboratory measurements are often recorded repeatedly over time, flexible models are necessary. We take a Bayesian nonparametric approach, which allows for flexible modeling of both the within-subject trajectories over time and of the outcome distribution itself. The method is illustrated using data from the Mini-Sentinel distributed database.

TCS023.4 Modern use of shared parameter models for dropout
Rizopoulos D.
Erasmus MC, Biostatistics, Rotterdam, Netherlands
In follow-up studies different types of outcomes are typically collected for each subject. These include longitudinally measured responses (e.g., biomarkers) and the time until an event of interest occurs (e.g., death, dropout). Often these outcomes are separately analyzed, but in many occasions it is of scientific interest to study their association.
TCS023.4 Modern use of shared parameter models for dropout
Rizopoulos D.
Erasmus MC, Biostatistics, Rotterdam, Netherlands
In follow-up studies different types of outcomes are typically collected for each subject. These include longitudinally measured responses (e.g., biomarkers) and the time until an event of interest occurs (e.g., death, dropout). Often these outcomes are separately analyzed, but on many occasions it is of scientific interest to study their association. This type of research question has given rise to the class of joint models for longitudinal and time-to-event data. These models constitute an attractive paradigm for the analysis of follow-up data that is mainly applicable in two settings: first, when focus is on a survival outcome and we wish to account for the effect of endogenous time-dependent covariates measured with error, and second, when focus is on the longitudinal outcome and we wish to correct for non-random dropout. This talk focuses primarily on the latter use of joint models and illustrates how these can be used to perform sensitivity analysis in non-random dropout settings.

TCS024 A Critical Look at Propensity Score Methods - and Alternatives
Tue :30-3:00 Lecture Hall HS 2
Chair(s): Vanessa Didelez
The propensity score (PS) is the probability of treatment given a sufficient set of confounders. It is a basic ingredient for a number of methods used in the context of causal inference and confounder control with observational data. The methods are: matching on, or stratifying by, the PS; regression adjustment; or inverse probability weighting with the PS. The use of the PS is relatively simple and has become extremely popular and widespread, not least because editors and referees routinely request it. In this session we want to discuss problems with common default usages of PS methods, typically based on simple heuristics: main-effects logistic regression, selecting strong predictors of treatment, and ad hoc methods for high-dimensional settings. One of the main purposes of using the PS is to balance the observed confounders between treatment groups; in fact, this is often not actually achieved by using simplistic models for the PS, possibly obtained from a data-driven selection process, or by using ill-defined criteria for adequate balance. Another danger is that, by using a model selection procedure, strong predictors of the treatment are selected into the PS model. However, predicting treatment is not the role of the PS; its role is to balance those covariates that are confounders, some of which may be weak or moderate predictors of treatment. Other strong predictors of treatment may not be confounders, and their inclusion in the PS model can reintroduce bias into the analysis when there is residual unobserved confounding. There is a growing literature showing that (and how) PS model selection therefore needs to take the relation of covariates with the outcome into account, especially in high-dimensional settings. More generally, it needs to formally account for the fact that the aim is not to predict but to estimate a causal parameter, and selection criteria should be chosen accordingly. At the same time, issues of post-selection inference need to be addressed. In this session the speakers will discuss these problems and present recent, careful and principled approaches to PS-based confounder selection and control, and some alternatives.
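As a minimal illustration of the inverse probability weighting idea discussed in this session, the following sketch (not taken from any of the talks) fits a main-effects logistic propensity model to simulated data, forms a normalised IPW estimate of the average treatment effect, and checks covariate balance with weighted standardised mean differences. All data, variable names and effect sizes are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))                                   # three baseline confounders
p_treat = 1 / (1 + np.exp(-(0.4 * X[:, 0] - 0.6 * X[:, 1])))  # true assignment mechanism
A = rng.binomial(1, p_treat)
Y = 1.0 * A + X @ np.array([0.5, 1.0, -0.5]) + rng.normal(size=n)  # true effect = 1

# Main-effects logistic regression for the propensity score
design = sm.add_constant(X)
ps = sm.Logit(A, design).fit(disp=0).predict(design)

# Normalised (Hajek) inverse probability weighting estimate of the average treatment effect
w1, w0 = A / ps, (1 - A) / (1 - ps)
ate_ipw = np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

# Simple balance diagnostic: weighted standardised mean differences of the covariates
smd = [(np.average(X[:, j], weights=w1) - np.average(X[:, j], weights=w0)) / X[:, j].std()
       for j in range(X.shape[1])]
print(round(ate_ipw, 3), np.round(smd, 3))
```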

TCS024.1 Covariate selection algorithms for estimating average causal effects
Waernbaum I., Dahlberg M. 2, Mörk E. 2, Vikman U. 3
1 Umeå University, Umeå, Sweden, 2 Uppsala Universitet, Uppsala, Sweden, 3 Institute for Evaluation of Labor Market and Education Policy, IFAU, Uppsala, Sweden
Average causal effects can be identified with observational data under an assumption of no unmeasured confounding. From a rich pool of covariates there can be multiple sets of covariates that are sufficient to adjust for, i.e., providing an unbiased estimate of an average causal effect. Variable selection in applied research is commonly implemented by selecting predictors for regression models. When estimating causal effects, matching/stratification and weighting estimators use models for the propensity score (PS). For practical reasons, researchers then select covariates as part of the model selection for the PS. However, for the purpose of estimating causal effects, there are drawbacks to selecting the strongest predictors of treatment. In the papers presented here, we describe and implement covariate selection procedures providing sufficient subsets of covariates, freestanding from the PS model selection. The selected subsets are then used in PS-based estimators of the average causal effect. We demonstrate general algorithms for i) searching for sufficient subsets of covariates and ii) evaluating sufficiency for a pre-specified subset. The performance of the algorithms is investigated with two data examples and in simulations for propensity score matching and IPW estimators.

TCS024.2 Demystifying improved double robustness
Daniel R., Vansteelandt S. 1,2
1 London School of Hygiene & Tropical Medicine, Medical Statistics, London, United Kingdom, 2 Ghent University, Ghent, Belgium
When making inferences about the causal effect of a point binary exposure on an outcome, inverse weighting by the propensity score (IPW) is perceived to be relatively robust to parametric model misspecification, since it does not rely on a model for the outcome given exposure and confounders. Robins and Rotnitzky (2001) showed that simple IPW can be improved upon. In particular, they showed that by specifying both a model for the exposure given confounders and a model for the outcome given exposure and confounders, an estimator can be constructed (henceforth the standard doubly robust estimator, SDR) that is consistent if at least one of the two models is correct and is locally asymptotically efficient in the sense that, when both models are correctly specified, there is no estimator relying on the correct specification of the first model that has a smaller asymptotic variance than the SDR estimator. The SDR estimator also has appealing properties in high-dimensional settings when data-adaptive estimation is desirable: the fact that the estimator is orthogonal to the scores corresponding to the nuisance functionals means that convergence rates are better than for other estimators, and standard inference is possible. Following a critical paper by Kang and Schafer (2007), "Demystifying double robustness", which showed that, at least in some situations, the non-local finite-sample performance of the SDR can be poor, a number of alternative DR estimators have been proposed (including Cao et al., 2009; Rotnitzky et al., 2012; Gruber and van der Laan, 2010; Vermeulen and Vansteelandt, 2014). These alternative DR estimators share the desirable asymptotic, local, convergence and inferential properties of the SDR but promise better performance in finite samples and/or under misspecification of one or both of the nuisance models. In this talk, we review these alternative DR estimators and compare their performance in a range of high-dimensional settings.
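The following sketch illustrates, on simulated data, the standard doubly robust (AIPW) construction referred to above as the SDR estimator: an exposure model and an outcome model are combined so that the estimate is consistent if at least one of them is correctly specified. It is a generic textbook-style illustration, not one of the alternative DR estimators compared in the talk; all numbers are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
A = rng.binomial(1, p_treat)
Y = 2.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)    # true ATE = 2

Xc = sm.add_constant(X)
ps = sm.Logit(A, Xc).fit(disp=0).predict(Xc)                  # exposure (propensity) model

# Outcome regressions fitted separately in each exposure group
m1 = sm.OLS(Y[A == 1], Xc[A == 1]).fit().predict(Xc)          # E[Y | A=1, X]
m0 = sm.OLS(Y[A == 0], Xc[A == 0]).fit().predict(Xc)          # E[Y | A=0, X]

# Standard doubly robust (AIPW) estimator: consistent if either model is right
mu1 = np.mean(A * (Y - m1) / ps + m1)
mu0 = np.mean((1 - A) * (Y - m0) / (1 - ps) + m0)
print("AIPW ATE estimate:", round(mu1 - mu0, 3))
```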
TCS025 New Developments in Adaptive Dose-Ranging Studies
Tue :30-6:00 Lecture Hall HS 5
Chair(s): Vladimir Dragalin
Discussant(s): Alun Bedding
Dose-ranging studies remain the sweet spot for adaptive designs. The use of adaptive designs in dose-ranging studies increases the efficiency of drug development by improving our ability to efficiently learn about the dose response and better determine whether to take a drug forward into confirmatory phase testing, and at what dose. Adaptive designs explicitly address multiple trial goals, adaptively allocate subjects according to ongoing information needs, and allow termination for both early success and futility. This approach can maximize the ability to test a larger number of doses in a single trial, while simultaneously increasing the efficiency of the trial in terms of making better Go/NoGo decisions about continuing the trial and/or the development of the drug for a specific indication. There has been growing interest in the utilization of adaptive approaches for the design and analysis of Phase 2 trials. In this session some of the successful case studies will be presented, and some new developments in the methodology will be highlighted, including two-stage designs and combination dose-response, supplementing dose-response with exposure-response to improve decision making, and considering dose-safety together with dose-efficacy responses in the design and analysis. The implementation of some of these new methods in simulation tools will be presented as well.

TCS025.1 Designing and simulating adaptive MCPMod dose-finding studies
Mielke T.
ICON Clinical Research, Cologne, Germany
Dose finding is frequently considered the sweet spot for the application of adaptive study designs. Adaptive approaches are of particular interest for the combination of different study targets within one study. A prominent example is designs combining proof of concept and dose finding under one umbrella through a stage-wise approach. Benefits of these designs include a reduced development duration, by removing white space between PoC and dose finding, as well as more informative study designs, as data gained in the PoC stage will be utilized for the dose-finding part. The MCPMod approach is a model-based trend test combined with dose-response modelling. MCPMod is an efficient tool for testing the existence of drug-related effects and for the estimation of the size of effects. The application of MCPMod is of particular interest for seamless PoC-dose-finding studies with a multi-armed PoC stage. Although less powerful than an optimal PoC design with top dose vs. placebo only, the inclusion of an intermediate dose in the PoC stage allows for valuable interim information, improving decision making on patient allocation for the remainder of the study. The adaptive MCPMod approach will be presented using the example of a PoC-dose-finding study, including considerations on the operational feasibility of the study design. Different model-based designs will be compared for a range of operating characteristics under different simulation scenarios to discuss benefits, limitations and potential extensions of the adaptive MCPMod approach.

TCS025.2 A dose-ranging case study utilizing MCP-Mod methodology
Tymofyeyev Y.
Janssen R&D of J&J, Titusville, United States
We consider a Phase 2A/2B clinical trial in the mood disorder therapeutic area as a case study utilizing MCP-Mod methodology. In the study, an interim analysis (IA) pursues multiple objectives: confirmation of proof-of-concept, a switch of the initial allocation ratio, with the possibility to add or drop treatment arms, and potential acceleration to a Phase 3 program. MCP-Mod techniques are used to classify the observed dose-response information into several profiles, each implying corresponding decision making. Dose-exposure analysis is also of interest, to confirm or alter the dose-response inference.

TCS025.3 MCP-Mod without guesstimates
Bornkamp B.
Novartis, Basel, Switzerland
The standard implementation of MCP-Mod uses a multiple contrast test to assess a dose-response trend (MCP step). The contrasts used in this approach depend on the assumed dose-response model classes and also on parameter guesstimates within the model class. In this presentation I will review proposals for alternative ways of performing the MCP step of MCP-Mod without the need to specify guesstimates. That means that the test procedure will only depend on the assumed model classes, no longer on assumed parameter guesstimates. This implies a greater robustness of the procedure, but also less possibility to focus the procedure towards specific shapes. We will specifically discuss two recent proposals based on maximum likelihood ratio statistics (see Dette et al. (2015) and Gutjahr and Bornkamp (2017)).
References:
Dette, H., Titoff, S., Volgushev, S. and Bretz, F. (2015), Dose response signal detection under model uncertainty. Biometrics, 71, doi:10.1111/biom.12357.
Gutjahr, G. and Bornkamp, B. (2017), Likelihood ratio tests for a dose-response effect using multiple nonlinear regression models. Biometrics, 73, doi:10.1111/biom.12563.
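For orientation, the sketch below shows the standard, guesstimate-based MCP step that the talk proposes to replace: optimal contrasts are derived from candidate standardised shapes (here an Emax shape with a guessed ED50 and a linear shape), contrast test statistics are formed from group summaries, and the maximum statistic is compared with a multiplicity-adjusted critical value. Equal allocation, a common variance and a normal approximation are assumed; doses, shapes and summaries are hypothetical.

```python
import numpy as np

# Candidate standardised dose-response shapes on the trial doses; within-class
# guesstimates are needed only for the shape (here the ED50 of the Emax model)
doses = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
shapes = {
    "emax (ED50=1)": doses / (1.0 + doses),
    "linear": doses / doses.max(),
}

def optimal_contrast(mu0):
    # Under equal allocation and a common variance, the optimal contrast is
    # proportional to the centred shape vector; scale it to unit length
    c = mu0 - mu0.mean()
    return c / np.linalg.norm(c)

contrasts = {name: optimal_contrast(mu0) for name, mu0 in shapes.items()}

# Toy group summaries: arm means, pooled SD and per-arm sample size
ybar = np.array([0.1, 0.4, 0.7, 0.9, 1.0])
s, n = 1.2, 30

tstats = {name: c @ ybar / (s * np.sqrt(np.sum(c**2) / n))
          for name, c in contrasts.items()}
tmax = max(tstats.values())

# Multiplicity-adjusted critical value (one-sided 2.5%) via simulation from the
# joint null distribution; correlations are the contrast inner products
C = np.array(list(contrasts.values()))
R = C @ C.T
rng = np.random.default_rng(3)
null_max = rng.multivariate_normal(np.zeros(len(C)), R, size=100_000).max(axis=1)
crit = np.quantile(null_max, 0.975)
print(tstats, "max statistic:", round(tmax, 2), "critical value:", round(crit, 2))
```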
TCS025.4 Incorporating safety signals into efficacy-based dose finding
Endriss V., Bossert S., Krahnke T. 2, Strelkowa N.
1 Boehringer Ingelheim Pharma GmbH & Co. KG, Biostatistics and Data Sciences, Biberach an der Riss, Germany, 2 Cogitars GmbH, Heidelberg, Germany
Recommended doses for drugs should be based on both efficacy and safety evaluations; however, dose-ranging studies are usually powered only for changes in efficacy. This is because safety signals are difficult to plan for due to their diversity. Nevertheless, some safety risks can be anticipated at the preparation step of the trial based on the mode of action or results from non-clinical studies, allowing for corresponding adjustment of the trial design. In particular, data-driven adaptations of the trial at interim analyses can be applied in order to better take into account pre-planned and arising safety signals. We start by using MCPMod to model and power the trial based on dose-dependent changes in efficacy. In addition to the test for a non-flat dose response, we estimate success probabilities for observing a clinically meaningful effect. The safety signals can then be incorporated via the following approaches:
- independent modelling of safety and efficacy followed by calculation of a utility score
- combined safety and efficacy longitudinal modelling
- a bivariate test for efficacy and safety
The three approaches are evaluated according to a metric which includes the implied model assumptions, the type of conclusions that can be drawn at interim analyses, and extrapolation properties for further clinical development.

TCS027 Industry Leadership Panel Discussion
Thu :30-8:00 Lecture Hall HS 3
Chair(s): Amit Bhattacharyya, Frank Bretz
Panelist(s): Ivan S.F. Chan, Nigel Dallow, Eric Gibson, Pandu Kulkarni, Lisa LaVange, Bob Rodriguez, Jerry Schindler, Nevine Zariffa
Drug development is an expensive and time-consuming process that has been going through many changes due to political, socioeconomic and pricing pressure around the globe. Large pharmaceutical companies often manage a portfolio of several hundred projects on different experimental medicines, resulting in a large number of clinical trials conducted simultaneously across all phases of development. The earlier phases of drug development are learning phases that tend to be more exploratory in nature compared to the confirmatory phase, which is focused on larger trials designed to demonstrate evidence in view of supporting regulatory approval and market access. The opportunities for statistical leadership across the many phases of drug development are considerable. It requires strong leadership skills to guide multidisciplinary teams in the design and planning of scientific research and making decisions based on data. It requires more effective communication to non-statisticians of the value of statistics in using data to answer questions, predict outcomes, and make decisions in the face of uncertainty. Finally, it also requires a greater appreciation of the unique capabilities of alternative quantitative disciplines such as machine learning, data science, and bioinformatics, which represent an opportunity for statisticians to achieve greater impact through partnership. In this panel discussion, statistical leaders in the pharmaceutical domain will offer their views on the strategic directions that statisticians can take to help shape and advance the industry, in both leadership and innovation, in the following directions:
- Key attributes of statistical leadership: discuss the role of competencies such as listening, networking, and communication
- What it takes for statisticians in leadership roles for the business: building statistical leadership and communication
- Examples of statistical leadership in drug development: illustrate individual leadership within a project team, enterprise leadership across an entire organization, or policy leadership spanning multiple organizations or an entire research community
- Bringing leadership and innovation together: how to foster an environment that boosts creative thinking to provide the best quantitative sciences support for internal decision-making by integrating multiple domain skills
- Creating a working environment with a strong scientific focus and operational excellence to attract and retain top talent within the pharmaceutical industry

TCS028 Adaptive Study Designs for Clinical Trials and Longitudinal Data
Wed :30-6:00 Lecture Hall KR 8
Organizer(s): Matthew Shotwell
Chair(s): Tom Parke
In this session, we will discuss adaptive study designs in clinical trials and in longitudinal data settings. When designing a study there is often substantial uncertainty in key areas (e.g., features of the target population, size of treatment benefit) that induces uncertainty in the optimal study design (e.g., timing of samples, sample size). Such uncertainties may increase the risk of a negative or failed study. Adaptive designs can be advantageous over standard fixed study designs because they allow researchers to use accumulating information to reduce uncertainty regarding the optimal design. Adaptations can be implemented that improve estimation efficiency and power for inferential targets (e.g., treatment effects), and increase the probability of successfully addressing the research question. Two of the talks will address clinical trial designs: Dr. Saville will discuss Bayesian adaptive platform clinical trial designs, which feature an adaptive number of treatments and biomarker-driven response-adaptive randomization under a master protocol. Dr. Wong will present a multi-stage extension of Simon's two-stage designs for phase II clinical trials. The other two talks will address longitudinal data studies: Dr. Shotwell will discuss adaptive opportunistic PK/PD studies wherein samples are timed indirectly by perturbing the environment, and Dr. Schildcrout will address adapting stratum-specific sampling probabilities for retrospective studies.

TCS028.1 Individualized sample timing for pharmacokinetic studies using accessible pharmacodynamic data
Shotwell M., Wanderer J. 2
1 Vanderbilt University Medical Center, Biostatistics, Nashville, United States, 2 Vanderbilt University Medical Center, Anesthesiology, Nashville, United States
Estimators of pharmacokinetic parameters are not pivotal; their variability depends upon the parameter value. Thus, the optimal timing of blood samples in pharmacokinetic studies is uncertain. However, easily measured pharmacodynamic data may reduce this uncertainty. For example, the pharmacodynamics of neuromuscular blockers (NMB) such as rocuronium are readily observed by electrical stimulation of the ulnar nerve at the wrist, which causes the thumb (and other digits) to twitch. The degree of neuromuscular blockade is thus measured by the force produced in the adductor pollicis (e.g., by attaching an accelerometer to the thumb) following a sequence of one or more rapid stimuli. Using Bayesian compartmental methods and prior information from the anesthesia literature, these pharmacodynamic data can be used to approximate individual pharmacokinetics, and thus to optimize the timing of blood samples in that individual. We hypothesized that this approach is more efficient than an optimal one-size-fits-all blood sampling schedule. Monte Carlo methods were used to evaluate this hypothesis under a variety of assumptions about model misspecification and error in the measurement of neuromuscular blockade. Preliminary results suggest that optimal individualized timing can substantially improve estimator D-efficiency relative to optimal one-size-fits-all sampling when there is no model misspecification and little measurement error.
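A rough sketch of the D-optimality idea behind individualised sample timing is given below: for an assumed individual parameter vector of a hypothetical one-compartment PK model, candidate blood-sampling schedules are ranked by the log-determinant of the Fisher information. This shows only the design criterion in its simplest form, not the Bayesian compartmental approach with pharmacodynamic data described in the abstract; model, dose and parameter values are placeholders.

```python
import numpy as np
from itertools import combinations

def conc(t, ka, ke, V, dose=100.0):
    # One-compartment model with first-order absorption (illustrative only)
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def fim(times, theta, sigma=0.5, eps=1e-5):
    # Fisher information for additive normal error, via numerical sensitivities
    J = np.zeros((len(times), len(theta)))
    for j in range(len(theta)):
        up, lo = np.array(theta, float), np.array(theta, float)
        up[j] += eps
        lo[j] -= eps
        J[:, j] = (conc(times, *up) - conc(times, *lo)) / (2 * eps)
    return J.T @ J / sigma**2

theta = (1.2, 0.15, 20.0)                 # assumed individual (ka, ke, V)
grid = np.arange(0.5, 12.5, 0.5)          # candidate sampling times in hours

# D-optimality: pick the 4-point schedule with the largest log-determinant
best = max(combinations(grid, 4),
           key=lambda ts: np.linalg.slogdet(fim(np.array(ts), theta))[1])
print("approximately D-optimal sampling times (h):", best)
```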
TCS028.2 Innovations in multi-arm platform adaptive clinical trials
Saville B.
Berry Consultants, Austin, United States
A "platform trial" is a clinical trial with a single master protocol in which multiple treatments are evaluated simultaneously. Bayesian adaptive platform designs offer flexible features such as biomarker-driven response-adaptive randomization, the ability to drop treatments for futility and add new treatments, and the ability to explore combination treatments with greater efficiency. We discuss recent statistical innovations in several high-profile platform clinical trials, including brain and pancreatic cancer, pandemic influenza, Alzheimer's and Ebola.

TCS028.3 Adaptive outcome dependent sampling designs for longitudinal data
Mercaldo N., Schildcrout J.
Vanderbilt University Medical Center, Nashville, United States
Retrospective, outcome dependent sampling (ODS) designs are efficient compared to standard designs because sampling is targeted towards those who are particularly informative for the estimation target. In the longitudinal data setting, one may exploit outcome vector, and possibly covariate, data from an existing cohort study or electronic medical record to identify those whose expensive-to-ascertain and sample-size-limiting biomarker/exposure should be collected. Who is most informative is reasonably predictable and will depend upon the target of inference. In this talk we will discuss improvements in ODS designs for longitudinal data that allow the design (i.e., subject-specific sampling probabilities) to be adapted based on data that have been collected. These multi-wave ODS sampling designs use data collected from earlier waves in order to alter the design at the present wave or future waves according to study goals (e.g., obtaining a precision threshold). We will describe the class of designs, examine finite sampling operating characteristics, and apply the designs to an exemplar longitudinal cohort study, the Lung Health Study.
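The following sketch illustrates the basic outcome-dependent sampling step on simulated cohort data: subject-level least-squares slopes summarise the available outcome vectors, and extreme-slope strata are oversampled for the expensive biomarker measurement. The stratum cut-offs and sampling probabilities are arbitrary, the adaptive multi-wave updating described in the talk is not shown, and a real analysis would also have to account for the sampling design (e.g., via weighted or conditional likelihood).

```python
import numpy as np

rng = np.random.default_rng(4)
n_subj, n_obs = 1000, 5
time = np.arange(n_obs)

# Hypothetical cohort: outcome vectors already available for every subject
b0 = rng.normal(0, 1, n_subj)
b1 = rng.normal(0, 0.5, n_subj)
Y = b0[:, None] + b1[:, None] * time + rng.normal(0, 1, (n_subj, n_obs))

# Subject-level least-squares slopes summarise who is informative about
# time-varying effects
slopes = np.array([np.polyfit(time, y, 1)[0] for y in Y])

# Outcome-dependent sampling: oversample the extreme-slope strata for the
# expensive biomarker ascertainment
cut = np.quantile(slopes, [0.1, 0.9])
stratum = np.digitize(slopes, cut)            # 0 = low, 1 = middle, 2 = high
probs = np.array([0.9, 0.1, 0.9])[stratum]    # stratum-specific sampling probabilities
sampled = rng.random(n_subj) < probs
print("subjects selected for biomarker measurement:", sampled.sum())
```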

TCS028.4 Extended two-stage adaptive designs with three target responses for phase II clinical trials
Kim S., Wong W.K. 2
1 Wayne State University, School of Medicine, Detroit, United States, 2 UCLA, Department of Biostatistics, Los Angeles, United States
We develop a nature-inspired stochastic population-based algorithm, called discrete particle swarm optimization (DPSO), to find extended two-stage adaptive optimal designs that allow 3 target response rates for the drug in a phase II trial. Our proposed designs include the celebrated Simon's two-stage design and its extension that allows 2 target response rates to be specified for the drug. We show that DPSO not only frequently outperforms greedy algorithms, which are currently used to find such designs when there are only a few parameters; it is also capable of solving design problems posed here with more parameters that greedy algorithms cannot solve. In stage 1 of our proposed designs, futility is quickly assessed and, if there are sufficient responders to move on to stage 2, one tests one of the three target response rates of the drug, subject to various user-specified testing error rates. Our designs are therefore more flexible and, interestingly, do not necessarily have larger expected sample size requirements than two-stage adaptive designs. Using a real adaptive trial for melanoma patients, we show that our proposed design requires one half fewer subjects than the design implemented in the study.
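For reference, the sketch below computes the operating characteristics of the classical Simon two-stage design that the proposed DPSO designs extend: the probability of early termination, the rejection probability and the expected sample size follow from simple binomial calculations. The design parameters shown are the commonly tabulated optimal values for p0 = 0.1 versus p1 = 0.3 and should be verified against the original tables before use; the DPSO extension itself is not reproduced here.

```python
import numpy as np
from scipy.stats import binom

def simon_oc(n1, r1, n, r, p):
    """Operating characteristics of a Simon-type two-stage design.
    Stop for futility after stage 1 if responses <= r1; otherwise enrol to n
    patients in total and declare activity if total responses > r."""
    x1 = np.arange(r1 + 1, n1 + 1)
    pet = binom.cdf(r1, n1, p)                        # probability of early termination
    p_reject = np.sum(binom.pmf(x1, n1, p) *
                      binom.sf(r - x1, n - n1, p))    # P(stage-2 responses > r - x1)
    en = n1 + (1 - pet) * (n - n1)                    # expected sample size
    return pet, p_reject, en

# Tabulated optimal design for p0 = 0.1 vs p1 = 0.3 (alpha ~ 0.05, power ~ 0.8):
# n1 = 10, r1 = 1, n = 29, r = 5
for p in (0.1, 0.3):
    print(p, np.round(simon_oc(10, 1, 29, 5, p), 3))
```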
TCS029 Statistical Approaches for Model-Based Clinical Trial Designs
Tue :30-3:00 Lecture Hall HS 5
Chair(s): Yuan Ji, Peter Mueller
Speakers in this session will introduce novel ideas in model-based clinical trial design. These include the use of PK/PD models for mechanism-based phase I dose/response curves (Y. Li); the separation of a probability model and parameters for statistical inference versus an action set and utility function for a subgroup analysis in an enrichment design (S. Morita); the use of predictive evidence threshold scaling to make use of non-confirmatory data (B. Neuenschwander); and the use of nonparametric Bayesian inference for an improved design with a time-to-event endpoint (P. Mueller).

TCS029.1 Predictive evidence threshold scaling (PETS): does the evidence meet a confirmatory standard?
Neuenschwander B.
Novartis Pharma AG, Basel, Switzerland
Making better use of evidence is one of the tenets of modern drug development. This calls for an understanding of the evidential strength of non-confirmatory data relative to a confirmatory standard. Predictive evidence threshold scaling (PETS) provides a framework for such a comparison. Under PETS, the evidence meets a confirmatory standard if the predictive probability of a positive effect reaches the predictive evidence threshold from hypothetical confirmatory data. Obtaining these probabilities requires hierarchical models with plausible heterogeneity and bias assumptions. After introducing the methodology, I will illustrate PETS for a recent breakthrough designation of Crizotinib for non-small-cell lung cancer (NSCLC). The example shows that the evidential strength of non-confirmatory data can meet a confirmatory standard. This finding is encouraging for modern drug development, which aims to use various types of evidence to inform licensing decisions.

TCS029.2 Bayesian population finding with biomarkers in a randomized clinical trial
Morita S.
Kyoto University Graduate School of Medicine, Department of Biomedical Statistics and Bioinformatics, Kyoto, Japan
The identification of good predictive biomarkers allows investigators to optimize the target population for a new treatment. We propose a novel utility-based Bayesian population finding (BaPoFi) method to analyze data from a randomized clinical trial with the aim of finding a sensitive patient population. Our approach is based on casting the population finding process as a formal decision problem, together with a flexible probability model, Bayesian additive regression trees (BART), to summarize the observed data. The proposed method evaluates enhanced treatment effects in patient subpopulations based on counterfactual modeling of responses to the new treatment and control for each patient. In extensive simulation studies, we examine the operating characteristics of the proposed method. We compare with a Bayesian regression-based method that implements shrinkage estimates of subgroup-specific treatment effects. For illustration, we apply the proposed method to data from a randomized clinical trial.

TCS029.3 Novel Bayesian dose-finding designs in oncology accounting for the effects of schedule and method of administration by using pharmacokinetic/pharmacodynamic modeling
Su X., Li Y.
University of Texas M.D. Anderson Cancer Center, Department of Biostatistics, Houston, United States
We propose novel Bayesian dose-finding designs to identify the maximum tolerated dose (MTD) for a single schedule or each of multiple schedules. Dose refers to the total dose administered during a treatment cycle. Schedule refers to the timing of administration, along with the corresponding dosages during a treatment cycle given the total dose. Method of administration refers to the drug formulation and route of administration. Aiming at improving the efficiency of existing dose-finding designs, we propose to model the dose/schedule-concentration-effect-clinical outcome (DS-C-E-C) relationship, making use of both the PK data collected in the clinical trials and the expert knowledge often available on the mechanisms of the drug effects, in the form of dynamic pharmacokinetic (PK) and pharmacodynamic (PD) profiles. Specifically, we use PK models, PD models, and nonlinear functional models for the relationships between the dose/schedule and drug concentration over time, between the instantaneous drug concentration and effect over time, and between the drug effect over time and a clinical outcome, respectively. We consider scenarios where analytic solutions are available for the systems of differential equations that characterize the PK and PD profiles. Posterior inference for the joint models is made using slice sampling. We compare the performance of the proposed designs with the continual reassessment method (CRM), the Bayesian model averaging continual reassessment method (BMA-CRM), the modified toxicity probability interval method (mTPI), and the nonparametric benchmark design. Our simulation studies show that the performance of the proposed designs is superior to the existing designs, and is close to the nonparametric benchmark design in the single-schedule case, yet it outperforms the nonparametric benchmark design when applied independently for each schedule, in the presence of multiple schedules. The proposed modeling framework appears to hold strong promise in switching the paradigm of current early phase oncology trial designs.

TCS029.4 A Bayesian nonparametric utility-based design for comparing treatments to resolve air leaks after lung surgery
Mueller P., Xu Y. 2, Thall P. 3, Mehran R.J. 3
1 UT Austin, Austin, United States, 2 JHU, Baltimore, United States, 3 MD Anderson Cancer Center, Houston, United States
We propose a Bayesian nonparametric utility-based group sequential design for a randomized clinical trial to compare a gel sealant to standard care for resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the days soon after surgery is highly important, since longer resolution time produces undesirable complications that require extended hospitalization. The problem of comparing treatments is complicated by the fact that the resolution time distributions are skewed and multi-modal, so using means is misleading. We address these challenges by assuming Bayesian nonparametric probability models for the resolution time distributions and basing the comparative test on weighted means. The weights are elicited as clinical utilities of the resolution times. The proposed design uses posterior expected utilities as group sequential test criteria. The procedure's frequentist properties are studied by computer simulation. If time permits, we will also briefly discuss another application of BNP to comparing treatments in the presence of semi-competing risks.
Reference: Xu et al. (2016), Bayesian Analysis.
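As a loose frequentist analogue of the utility-weighted comparison described above, the sketch below scores hypothetical resolution-time data for two arms with a made-up elicited utility function and bootstraps the difference in mean utility. It is not the Bayesian nonparametric group sequential design of the talk; utilities, data and the 10-day grid are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical elicited utilities for air-leak resolution on days 1..10+
# (higher is better; values are placeholders, not the trial's elicited utilities)
days = np.arange(1, 11)
utility = np.linspace(1.0, 0.0, 10)
u = dict(zip(days, utility))

def mean_utility(resolution_days):
    return np.mean([u[min(d, 10)] for d in resolution_days])

# Toy skewed resolution-time data for two arms
arm_trt = rng.choice(days, size=80, p=[.25, .2, .15, .1, .08, .07, .05, .04, .03, .03])
arm_ctl = rng.choice(days, size=80, p=[.15, .15, .12, .1, .1, .1, .08, .08, .06, .06])

obs_diff = mean_utility(arm_trt) - mean_utility(arm_ctl)

# Nonparametric bootstrap for the difference in utility-weighted means
boots = [mean_utility(rng.choice(arm_trt, len(arm_trt))) -
         mean_utility(rng.choice(arm_ctl, len(arm_ctl))) for _ in range(2000)]
lo, hi = np.quantile(boots, [0.025, 0.975])
print(round(obs_diff, 3), (round(lo, 3), round(hi, 3)))
```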
TCS031 The Future World of Clinical Research: Are We Ready for Big Data
Wed :30-6:00 Lecture Hall KR 7
Organizer(s): Hans Ulrich Burger
Chair(s): Dominik Heinzmann
Discussant(s): Tim Friede
Recent advances in leveraging the totality of data by merging datasets containing digitalized health records, clinical patient-level data shared by pharmaceutical companies, data collected in real-world treatment settings and other sources are aimed at improving patient care and management. These will necessarily lead to a substantial change in the environment in which new medicines are developed and marketed. In essence, this implies that significantly larger and more heterogeneous datasets will be available to more researchers, health care providers and even to patients. The medical community needs to be able to analyze such available data in a timely manner and to draw adequate conclusions from it. On the one hand, techniques in the area of advanced analytics may be needed to adequately derive relevant information from such large and heterogeneous datasets. On the other hand, the many players working on the same or similar data will also provide additional and sometimes diverse insights into what is known about a given treatment, which may lead to better therapy decisions in optimal cases or add confusion in suboptimal ones. This session will provide a forum to discuss and obtain a high-level overview of strategies and tools which might be used in this evolving environment, and will discuss the challenges statisticians might face.

TCS031.1 Aiming for total evidence synthesis using real world evidence for high-precision benefit-risk assessment of treatment regimens
Herath A.
Novartis Pharmaceuticals Corporation, CoE for Real World Evidence, Frimley, United Kingdom
With the widening of anonymized access to public healthcare data, the systematic study of the total population is within our remit. This presentation will focus on the concept of statistical landscaping of the actionable clinical outcomes of patients with chronic diseases, with a view to building a coordinate system of outcomes for specific diseases. Statistical landscaping uses the gold standard of outcome data originating from well-characterized patient cohorts, randomized clinical trials, observational studies and real world evidence from public healthcare data. The information returned from the activity results in an ensemble outcome model, leading to the concept of total evidence synthesis within the disease area. The activity also paves the way for the integration of patient-level data, including genomic and other molecular expression data and other big data, for wider disease and biological inference for therapeutic development, outcome interpretation, and more precise benefit-risk assessment of the therapeutic interventions.

TCS031.2 Big clinical data: what is there for biostatisticians?
Burger H.U.
Hoffmann-La Roche, Biostatistics, Basel, Switzerland
The talk will introduce the various changes in the landscape of clinical data which are either already ongoing or will start in the future. Enhanced data sharing and the digitalization of health care records will both substantially increase the amount of clinical data available to researchers, inside and outside of pharmaceutical companies. In addition, besides academic institutions and pharmaceutical companies, data-driven companies such as Google and IBM are entering the clinical field, using different methods for analyzing clinical data and claiming that they can change the landscape of drug development fundamentally. Along with this, "new" tools such as machine learning are becoming more important. This can all be seen both as a threat and as an opportunity for statisticians working in industry. The talk will introduce potential implications.

TCS032 Safety Evaluation, Data Sources and Statistical Strategies: Transforming Data into Evidence
Fri :45-4:5 Lecture Hall HS 3
Organizer(s): William Wang
Chair(s): Michael O'Kelly
Panelist(s): Franz König, Jürgen Kübler
This session proposal is based on the work of the ASA Biopharmaceutical Section Safety Working Group, specifically on the contribution of three subgroups: Safety Strategies and Analysis, Regulatory Guidance Review and Industry Survey, and Statistical Methods on Safety Monitoring. Safety evaluation is a continuous process taking place throughout the life cycle of a medical product (drug/biologic/device). The importance of and need for quantitative safety evaluation during drug development and post-approval have been recognized by industry and regulatory agencies. The regulatory landscape for safety monitoring has been evolving, with a number of regulatory documents released recently. The pharmaceutical industry has been investing in data platforms to collect and analyze safety information more thoroughly. The cost of drug development continues to increase, but the lack of necessary safety data can compromise the regulatory acceptance of a program or a decision on drug reimbursement. How can one gather the necessary safety information without adding too much to drug development cost and without delaying the process of taking an effective medicine to the patient? What data do we need to generate evidence at every stage of the drug life cycle to address safety? These and other questions on safety evaluation, data sources and statistical strategies will be discussed by the session participants. This session will have two presentations and a panel discussion. The 1st presentation will provide an overview of the global regulatory landscape on safety, focusing on aggregate safety monitoring and evaluation (e.g., the FDA IND final rule) as well as the statistical methodologies to enable that. The 2nd presentation will review different sources of safety data, present statistical strategies for looking at the pre-marketing stage of drug/biologic development, post-marketing safety surveillance, and real world data, and discuss how to use the totality of information to generate evidence. The panel discussion will invite thought leaders from academia and industry, from the EU and US regions, to discuss a variety of issues during pre- and post-marketing safety evaluation: 1) blinded and unblinded analysis for aggregated safety monitoring, 2) the role of the Data Monitoring Committee (DMC) vs the Safety Assessment Committee (SAC), 3) integration of pre-marketing and post-marketing safety in regulatory decision making, 4) generation of evidence for payers, providers, and patients.

TCS032.1 Transforming safety data into evidence
Marchenko O., Jiang Q. 2, Russek-Cohen E. 3, Levenson M. 3, Sanchez-Kam M. 4, Zink R. 5, Ma H. 2, Izem R. 3, Krukas M.
1 QuintilesIMS, Durham, United States, 2 Amgen, Thousand Oaks, United States, 3 FDA, Silver Spring, United States, 4 SanchezKam, LLC, Washington, United States, 5 JMP Life Sciences, Cary, United States
In this presentation, different sources of safety data, including clinical trials, pragmatic trials, registries, electronic health records and claims, will be briefly reviewed, and selected statistical strategies and methods for analyzing safety data from the pre-marketing stage of drug/biologic development, post-marketing safety surveillance, and the real world will be presented. Additionally, strategies on how to use the totality of information to generate evidence will be discussed. Case studies that used real world data successfully to generate evidence for regulators will be reviewed.
TCS032.2 On quantitative methodologies and cross-disciplinary best practices for safety monitoring
Wang W.
Merck & Co., North Wales, United States
Safety monitoring is a continuous process taking place throughout the life cycle of a medical product (drug/biologic/device). The regulatory landscape for safety monitoring has evolved significantly in recent years, with direct impact from the CIOMS safety reports (i.e., CIOMS VI and X), ICH guidance (e.g., the E2 series and M4E) and other regional regulatory guidance on safety. This presentation will first provide an overview of the global regulatory landscape on safety, focusing on aggregate safety monitoring and evaluation (e.g., the FDA IND final rule). We will highlight some key statistical methodologies that can enable those evaluations, including methodologies for blinded and unblinded analyses for aggregated safety monitoring. We will further discuss cross-disciplinary best practices in their practical implementation.

TCS033 Recent Approaches to Foster Statistical Innovation in the Pharmaceutical Industry
Tue :30-6:00 Lecture Hall HS 4
Organizer(s): Hermann Kulmann
Chair(s): Alex Dmitrienko
The Statistics function at Boehringer Ingelheim has implemented a small, globally acting team of applied methodology statisticians to promote the transfer of innovative statistical methods to trials and projects. At AstraZeneca, the Statistical Innovation group in the Advanced Analytics Centre is a global team which provides advanced methodological and analytical support to all therapeutic area teams. It also drives change within the organization, enabling the adoption of innovative statistical approaches across the organization. At Bayer, the Biostatistics Innovation Center (BIC) is a novel approach to staying abreast of innovative statistics in R&D projects. It introduces an alternative organizational approach compared to a dedicated statistical research group. The session will demonstrate the different working models and explain how they stand up to current challenges in modern pharmaceutical statistics. Each working model is a result of its company's environment, structure and needs, and shows efficient and successful ways of fostering innovation in pharmaceutical statistics. There will be three presentations, including brief discussions, of at most 30 minutes each. The BIC presentation from Bayer will be shared by two speakers. Extensive discussion can be deferred to the planned subsequent panel discussion organized by Frank Bretz with heads of several statistics functions in the pharmaceutical industry.

TCS033.1 Statistical innovation & therapeutic area methodology - strengthening the applied statistical science in clinical drug development
Lomp H.-J.
Boehringer Ingelheim Pharma GmbH & Co KG, Biberach, Germany
At Boehringer Ingelheim (BI), the starting point for a specialized statistical methodology team was the introduction of Therapeutic Area Statisticians in the early 2000s to harmonize the statistical methodology in clinical development and to ensure statistical input into strategic project portfolio decisions. Recently BI has added to this a dedicated small team of Methodology Statisticians (M-STATs) to form a Statistical Innovation & TA Methodology (SITAM) team with a global mandate. The M-STATs are supposed to act as innovation drivers for statistical project/trial work. Their initial focus is on the broad, company-wide implementation of an internal statistical Go/NoGo decision framework, Bayesian methods for early clinical development, missing data handling and adaptive designs. M-STATs evaluate new approaches and lead their implementation into daily practice via statistical position papers, prototype software production, and training and consulting. In addition, M-STATs also coordinate the collaboration with external academic experts and software companies. This presentation will show the contributions and challenges of implementing the BI approach and its impact on the role of clinical statistics within a globally operating pharma research company.

TCS033.2 Statistical Innovation - what does it mean and how is it implemented?
Wright D.
AstraZeneca, Head, Statistical Innovation, Advanced Analytics Centre, Cambridge, United Kingdom
At AstraZeneca, the Statistical Innovation group in the AAC is a global team which provides advanced methodological and analytical support to all therapeutic area teams. It also drives change within the organisation, enabling the adoption of innovative statistical approaches across the organisation. In the AAC, Statistical Innovation is considered to be the development of practical methodological solutions to problems faced by the organisation. Hence the group has to balance investigating new statistical methods with reacting quickly to real problems faced in drug development programmes. Once solutions have been found, it is important to consider whether these solutions are likely to be useful to other teams who face similar or related problems. This evaluation guides the communication strategy required to help the rapid implementation of the innovative approaches as widely as appropriate across the organisation. This presentation will give some examples of solutions that are relevant to more than one therapeutic area and will also highlight areas where more methodological work is required.

TCS033.3 Biostatistics Innovation Center (BIC) - an all-inclusive approach
Kulmann H., Muysers C.
Bayer Pharmaceuticals Statistics, Berlin, Germany
The Biostatistics Innovation Center allows all statisticians at Bayer to participate in the implementation of innovations in drug development. There is a clear distinction between an invention and an innovation: BIC focusses on the introduction and application of innovative new statistical methods into drug development rather than the pure invention of statistical methods. This is often an aspiration but unfortunately rarely realized in the statistical functions of the pharmaceutical industry. BIC was established a few years ago at Bayer as a virtual framework embracing individual innovation groups. Every statistician has the freedom to establish a BIC innovation group, and other statisticians have the opportunity to join such a group. New innovation groups present their topics to an established steering committee and report on their status, in particular when the work is completed. They consequently share it transparently with all statisticians in the company. The presentation will show the implementation of BIC to date and will provide examples of achievements of BIC. These examples describe the different innovation approaches, such as the types of statistical areas, the constitution of such working groups, and the various objectives.
TCS034 Historical Data for Confirmatory Prospective Clinical Trials - a Contradiction? (Part 2)
Wed :30-3:00 Lecture Hall HS 3
Chair(s): Hans Ulrich Burger, Marcel Wolbers
Panelist(s): David Dejardin, Michael Grayling, Franz König, Dominic Magirr, Stephen Senn, Simon Wandel
Discussant(s): David Wright
It has taken several decades to establish the widespread acceptance and use of randomized clinical trials as the gold standard of medical research. However, today single-arm trials are again gaining popularity, especially for phase 2 trials. Reasons for this popularity of single-arm trials may include the wider availability of large real-world and historical control data sets and an increasing lack of acceptability of randomization among patients. A large body of literature on the benefits and risks of such non-randomized trials exists, and the frequent recommendation is the use of randomization as a key principle to control bias in all but a few specific situations. This session will have a strong educational focus, providing on the one hand high-level information on when a single-arm trial may make sense, and on the other hand comparing single-arm versus randomized trials in the context of a whole development program. The session will end with a panel discussion.

TCS034.1 Beyond RCTs - using historical data in pivotal clinical trials
König F., Bretz F. 2, Posch M.
1 Medical University of Vienna, Center for Medical Statistics, Informatics, and Intelligent Systems, Vienna, Austria, 2 Novartis Pharma AG, Basel, Switzerland
Randomized controlled trials (RCTs) have become the standard for establishing the efficacy of a new treatment. But how can one generate sufficient evidence for drug approval if fully powered RCTs are not feasible, for example when there is a high unmet medical need, in rare diseases, or in the development of personalized therapies? Recently, data transparency initiatives have reshaped the landscape of medical research by granting access to raw data (at the individual patient level). This opens new opportunities and raises the question of whether to incorporate historical data more prominently in drug development when determining efficacy for new medicines in difficult situations. We propose a new framework for evidence generation, which we call "threshold-crossing." This framework leverages the wealth of information that is becoming available from completed RCTs and from real world data sources. Relying on formalized procedures, information gleaned from these data is used to enable efficacy assessment of new drugs for carefully selected situations. We will discuss the benefits and caveats of "threshold-crossing" compared to more traditional approaches in terms of type I error rate, power and sample sizes.

TCS034.2 Do single-arm trials have a role in drug development plans incorporating randomized trials?
Grayling M., Mander A.
MRC Biostatistics Unit, Cambridge, United Kingdom
Often, single-arm trials are used in phase II to gather the first evidence of an oncological drug's efficacy, with drug activity determined through tumour response using the RECIST criteria. Provided the null hypothesis of 'insufficient drug activity' is rejected, the next step could be a randomised two-arm trial. However, single-arm trials may provide a biased treatment effect because of patient selection, and thus this development plan may not be an efficient use of resources. Therefore, we compare the performance of development plans consisting of single-arm trials followed by randomised two-arm trials with stand-alone single-stage or group sequential randomised two-arm trials. Through this, we are able to investigate the utility of single-arm trials and determine the most efficient drug development plans, setting our work in the context of a published single-arm non-small-cell lung cancer trial. Reference priors, reflecting the opinions of 'sceptical' and 'enthusiastic' investigators, are used to quantify and guide the suitability of single-arm trials in this setting. We observe that the explored development plans incorporating single-arm trials are often non-optimal. Moreover, even the most pessimistic reference priors place considerable probability in favour of alternative plans. Analysis suggests expected sample size savings of up to 25% could have been made, and the issues associated with single-arm trials avoided, for the non-small-cell lung cancer treatment through direct progression to a group sequential randomised two-arm trial. Careful consideration should thus be given to the use of single-arm trials in oncological drug development when a randomised trial will follow.

TCS035 Challenges in Planning Time to Event Studies
Tue :30-6:00 Lecture Hall KR 7
Chair(s): Mike Branson, Hans Ulrich Burger
There are still many challenges today in planning time-to-event studies, despite the fact that time-to-event designs are the backbone of many therapeutic areas. Such challenges often cut across therapeutic areas and deserve to be discussed. They include non-proportional hazards, competing risk situations, post-study therapies, dropout, and mid-trial reassessment of planning assumptions. The session will discuss some of these issues in detail.
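One of the planning calculations whose assumptions this session revisits is the required number of events for a log-rank comparison; a minimal sketch using Schoenfeld's approximation is given below, with the target hazard ratio, error rates and overall event probability all chosen purely for illustration.

```python
import math
from scipy.stats import norm

def required_events(hr, alpha=0.05, power=0.8, allocation=0.5):
    """Schoenfeld's approximate number of events for a two-sided log-rank test,
    assuming proportional hazards and, by default, 1:1 allocation."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (za + zb) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)

d = required_events(hr=0.75)     # hypothetical target hazard ratio
n = d / 0.6                      # assumed overall event probability of 60%
print(round(d), "events, roughly", round(n), "patients")
```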
TCS035.1 Sample size re-estimation for long term time to event trials: a case study and practical considerations
Hughes G., Chakravartty A. 2, Li X. 3, Mondal S. 3, Mukhopadhyay P. 3
1 Novartis Pharma AG, Basel, Switzerland, 2 Novartis Healthcare Private Limited, Hyderabad, India, 3 Novartis Pharmaceuticals Corporation, East Hanover, United States
Sample size estimation is an important step in the design of a clinical trial. It is usually determined by statistical power requirements, a pre-specified treatment difference under the alternative hypothesis, and any other nuisance parameters. However, in some situations these pre-specified assumptions may be inadequate and may need to be revised based on emerging data from the ongoing trial and/or additional external information. Adaptive methods such as sample size re-estimation can be useful to re-assess the sample size of a trial in order to ensure adequate power in such scenarios. In this talk we present a case study with a long-term time-to-event endpoint in which an unblinded sample size re-estimation is incorporated into the trial design. Using this example, the performance of the Cui, Hung and Wang method and the Promising Zone method is assessed under different assumptions about the hazard function and different adaptation strategies with regard to the timing and size of the adaptation. In addition, operational perspectives such as the enrollment of new patients, duration of follow-up and trial integrity implications will also be discussed.

TCS035.2 On some survival analysis methods that can be complementary, and potentially alternatives, to the gold-standard Cox model and log-rank test statistic
Leon L.
Genentech, Biostatistics, San Carlos, United States
While the Cox proportional hazards (PH) model and the log-rank test statistic are the gold-standard (and for good reason) approaches for primary analyses, there has been a plethora of recent methods that can complement, if not serve as alternatives to, these standards. In particular, when treatment effects evidently vary across time, it is challenging to summarize their impact by a single summary statistic. Depending on the implications of the time dependence, it is not clear whether the log-rank statistic (which does not assume, but is most powerful under, PH) or the Cox model estimate is ideal for inference. Alternative time-dependent weighting schemes have been extensively studied for the log-rank test and the Cox model. For example, later survival differences can be weighted more heavily as in the Gehan-type weighted log-rank test. Weighted log-rank tests are well known; however, their weighted Cox model analogs seem less applied. In addition, accelerated failure time (AFT) models are available, where semi-parametric inference procedures have been considerably advanced. The AFT model, which models the survival time directly as opposed to the hazard function, can be a useful complement (perhaps alternative) to the Cox model for its more intuitive interpretation. In this talk we provide an overview of the following: weighted Cox models and log-rank tests; weighted log-rank AFT models; restricted mean survival time comparisons; and quantile regression models. We evaluate their performance in detecting treatment effects in settings where therapies may have delayed effects, with focus on patterns exhibited in recent immunotherapy cancer trials.
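As a small illustration of one of the alternatives listed above, the sketch below computes a restricted mean survival time (RMST) difference from scratch via the Kaplan-Meier estimator, using simulated data in which the treatment effect is not proportional on the hazard scale. The truncation time tau and all distributional choices are arbitrary.

```python
import numpy as np

def km(time, event):
    # Kaplan-Meier estimate: returns distinct event times and survival probabilities
    t, e = np.asarray(time), np.asarray(event)
    uniq = np.unique(t[e == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(t >= u)
        deaths = np.sum((t == u) & (e == 1))
        s *= 1 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def rmst(time, event, tau):
    # Area under the Kaplan-Meier step function up to the truncation time tau
    t, s = km(time, event)
    grid = np.concatenate(([0.0], t[t < tau], [tau]))
    step = np.concatenate(([1.0], s[t < tau]))
    return np.sum(step * np.diff(grid))

rng = np.random.default_rng(6)
n = 300
t_ctl = rng.exponential(12, n)
# Treatment arm: mixture of responders and non-responders (non-proportional hazards)
t_trt = np.where(rng.random(n) < 0.5, rng.exponential(12, n), rng.exponential(30, n))
c1, c0 = rng.uniform(0, 36, n), rng.uniform(0, 36, n)       # censoring
y1, d1 = np.minimum(t_trt, c1), (t_trt <= c1).astype(int)
y0, d0 = np.minimum(t_ctl, c0), (t_ctl <= c0).astype(int)

tau = 24.0
print("RMST difference up to", tau, "months:",
      round(rmst(y1, d1, tau) - rmst(y0, d0, tau), 2))
```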

TCS035.3 MIRROS: planning a phase 3 trial with a time-to-event endpoint, a cure proportion and a futility interim analysis using response
Monnet A., Rufibach K. 2
1 F. Hoffmann-La Roche Ltd, Biostatistics, Basel, Switzerland, 2 F. Hoffmann-La Roche Ltd, PDBB, Basel, Switzerland
With a median overall survival (OS) of about six months and no approved drug for more than forty years, the unmet medical need in acute myeloid leukemia (AML) is dramatic. Idasanutlin is an MDM2 antagonist that can effectively displace p53 from MDM2 to restore p53 function, leading to cell cycle arrest and apoptosis of cancer cells. Planning the Phase 3 trial MIRROS, comparing Idasanutlin plus standard of care against the standard of care, presented the following challenges: (1) To survive AML, a patient needs to become eligible for a bone marrow transplant, through achieving a complete response (CR) after induction therapy. Planning the trial thus needs to account for a cure proportion in both the treatment and the control arm. (2) MIRROS was planned based on Phase 1 data only. To mitigate the risk of moving directly to Phase 3, a futility interim analysis was built into the design, using gates on the odds ratio for CR and on event-free survival. The interim analysis was designed using a mechanistic simulation model, making assumptions on response proportions, the proportion of transplant survivors, and OS in these various groups. The talk describes the design in detail and discusses sample size planning and the exploration of the operating characteristics of the futility interim analysis. We conclude by sharing feedback from US and European health authorities on the design.

TCS037 Adverse Event Analyses: Is It Time for New Standards?
Wed :30-8:00 Lecture Hall HS 3
Chair(s): Claudia Schmoor
Panelist(s): Ralf Bender, Tim Friede, Frank Pétavy
There is a strange methodological gap between the analysis of efficacy outcomes in clinical trials with a time-to-event outcome and safety analyses. For the former, the use of survival analysis is firmly established, while the latter often falls back on rather simplistic and inappropriate statistics. The issue is further complicated by adverse events always being subject to competing risks and often being recurrent. Competing risks render Kaplan-Meier curves inappropriate; recurrent events go beyond standard time-to-first-event analyses. The aim of this session is to bring together colleagues from academia, industry and regulatory agencies to discuss statistical methods and case studies.
TCS037.1 Survival analysis is neither about efficacy nor about survival functions only
Beyersmann J.
Ulm University, Institute of Statistics, Ulm, Germany
The use of survival analysis is mandatory when data are censored. The incidence proportion, i.e., the number of patients with an observed adverse event (AE) divided by the group size, is commonly used in safety analyses, but underestimates a patient's absolute AE risk as a consequence of censoring. Another common workhorse is the incidence rate, which accounts for varying follow-up times by dividing by the person-time at risk. However, if naively translated into a probability statement, it will overestimate a patient's absolute AE risk, as will Kaplan-Meier curves for the AE outcome. We will review common methods to analyze safety in clinical studies with varying follow-up times, demonstrate why such analyses must account for competing risks, and show how the methods of choice extend to recurrent AEs. This talk will not be very mathematical, emphasizing concepts and relying on rather simple calculations. Key issues will be demonstrated in a real data example.

TCS037.2 Adverse events in time-to-event trials: challenges beyond submission from an industry perspective
Voß F.
Boehringer Ingelheim Pharma GmbH & Co. KG, Statistics EU, Ingelheim am Rhein, Germany
Adverse events are routinely collected in clinical trials to assess the safety of investigational drugs. The data are used for different purposes, e.g. to show that the risk-benefit ratio is positive in order to support approval and, increasingly, to assess the added benefit within reimbursement dossiers. Issues arise, for example, in oncology, where clinical trials are often designed to show differences in progression-free survival (PFS), and improved PFS results in differential follow-up times and censored observations for adverse events. The differential follow-up complicates the analysis of further endpoints, including the comparison of adverse events between treatment arms. In this talk, we will discuss these challenges and different methods for the analysis of adverse events, using real examples from oncology with a focus on the early benefit assessment.

TCS037.3 Exposure adjusted incidence rates when hazards vary over time
Dunger-Baldauf C.
Novartis Pharma AG, Biostatistics and Pharmacometrics, Basel, Switzerland
In a clinical study or a meta-analysis, the safety of therapeutic interventions is often compared by evaluating proportions of patients with specific safety events. Bias can arise if censoring or exposure patterns differ between the therapeutic interventions (e.g., due to a higher rate of discontinuation in one group). Exposure adjusted incidence rates are commonly used as measures intended to adjust for these differences. Assuming random censoring, a constant hazard over time and homogeneity across patients per therapeutic intervention, these can be interpreted as maximum likelihood estimates of the hazards in exponential models. However, the model assumptions might not apply in clinical practice. We discuss the impact of non-constant and possibly heterogeneous hazards on comparisons based on exposure adjusted incidence rates, and investigate the extent of model deviations which could lead to controversial conclusions. We consider models which take important patient characteristics and non-constant hazards over time into account, illustrated by clinical data. We finally propose a generalisation of the exposure adjusted incidence rate to hazards which may vary over time.
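The points made in the two abstracts above can be illustrated with a short simulation, sketched below under constant hazards purely for convenience: the incidence proportion underestimates the absolute AE risk because of censoring, the naive probability translation of the exposure-adjusted incidence rate overestimates it because the competing event is ignored, and the nonparametric (Aalen-Johansen) cumulative incidence targets the correct quantity. All hazards, follow-up times and sample sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n, tmax = 2000, 2.0                      # follow-up horizon (years)
h_ae, h_cr = 0.30, 0.50                  # constant hazards: AE and competing event

t_ae = rng.exponential(1 / h_ae, n)
t_cr = rng.exponential(1 / h_cr, n)      # e.g. death or treatment discontinuation
cens = rng.uniform(0.5, tmax, n)         # staggered censoring

t_first = np.minimum(t_ae, t_cr)
obs = np.minimum(t_first, cens)
status = np.where(t_first > cens, 0, np.where(t_ae <= t_cr, 1, 2))  # 0 cens, 1 AE, 2 competing

# (a) incidence proportion: ignores censoring, underestimates the AE risk
inc_prop = np.mean(status == 1)

# (b) exposure-adjusted incidence rate and its naive probability translation:
#     assumes a constant hazard and no competing risk, overestimates the AE risk
eair = np.sum(status == 1) / np.sum(obs)
naive_prob = 1 - np.exp(-eair * tmax)

# (c) Aalen-Johansen estimate of the cumulative AE incidence at tmax
times = np.sort(np.unique(obs[status > 0]))
surv, cif = 1.0, 0.0
for u in times:
    at_risk = np.sum(obs >= u)
    cif += surv * np.sum((obs == u) & (status == 1)) / at_risk     # uses S(u-)
    surv *= 1 - np.sum((obs == u) & (status > 0)) / at_risk

true_cif = h_ae / (h_ae + h_cr) * (1 - np.exp(-(h_ae + h_cr) * tmax))
print(round(inc_prop, 3), round(naive_prob, 3), round(cif, 3), round(true_cif, 3))
```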

TCS038 The Assessment of Probability of Success for Clinical Trials
Thu :30-6:00, Lecture Hall KR 9
Organizer(s): Hans Ulrich Burger, Kaspar Rufibach, Marcel Wolbers
Chair(s): Hans Ulrich Burger, Marcel Wolbers
Discussant(s): Ben Saville
In modern pharmaceutical drug development and, increasingly, in trials initiated by collaborative academic groups, an assessment of the probability that a certain target effect will be achieved in a clinical trial, taking into account available information on the experimental therapy and the effect of interest, is of critical importance. Generally, these probabilities are calculated using the concept of Bayesian predictive power, or assurance. The speakers will critically discuss aspects of Bayesian predictive power for various scenarios, such as clinical development programs with a biomarker component, prior elicitation for assurance computations, and how to implement and update assurance throughout the lifecycle of a large clinical development program.

TCS038.1 Leveraging expert knowledge and assurance methods to better estimate probability of success in drug development
Dallow N.; GlaxoSmithKline, Clinical Statistics, Middlesex, United Kingdom
Since 2014, GlaxoSmithKline (GSK) has been using formal prior elicitation methods to support internal decision making and analysis in drug development. Prior elicitation is used to enable quantification of existing knowledge about an asset in the absence of directly relevant data. In this talk, I will provide a brief introduction to the methods that GSK has been using for prior elicitation, and discuss some of the benefits and challenges of embedding this process within a large pharmaceutical company. I will also give some examples of how the elicited priors have been used at GSK, e.g. to quantitatively choose between competing clinical trial designs for the next stage of drug development, to explore staged development activities and to determine the merits of interim/futility assessments.

TCS038.2 Probability of success after exploratory biomarker analysis
Götte H. 1, Kirchner M. 2, Kieser M. 2; 1 Merck KGaA, Darmstadt, Germany; 2 Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
During the development of targeted therapies it is common practice to look for predictive biomarkers. The consequence is that the trial population for phase III is often selected based on the most extreme result from phase II biomarker subgroup analyses. In such a case there is a tendency to overestimate the treatment effect. We investigate whether the overestimation of the treatment effect estimate from phase II is transformed into a positive bias for the probability of success for phase III (Bayesian predictive power). We simulate a phase II/III development program for targeted therapies and observe that selection increases the estimated probability of success. The estimated probability of success after subgroup selection can be expected to be greater than 80% or 90%, while the true probability of success is rather around 60% on average. Thus, this overestimation can lead to overoptimistic assumptions when conducting a phase III trial in the selected subpopulation. Consequently, for practical application a correction is needed. We show how Approximate Bayesian Computation (ABC) techniques can be used to derive a simulation-based bias adjustment method in this situation.
This adjustment method is applicable for any transformation or summary of the posterior distribution of the treatment effect, such as the probability of success, posterior probabilities or treatment effect estimates.

TCS038.3 Evolvement of assurance over the lifecycle of several Phase 3 trials in a large oncology clinical development program
Rufibach K.; Methods, Collaboration, and Outreach Group, Department of Biostatistics, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Obinutuzumab is a second-generation anti-CD20 antibody targeted to improve outcome in three major lymphoma indications. In 2011, Roche launched a full development program for obinutuzumab with four randomized Phase 3 trials, all with progression-free survival (PFS) as primary endpoint. Each of these trials compared obinutuzumab plus an indication-specific chemotherapy against the standard of care, rituximab plus the same chemotherapy. At the time, only a preliminary analysis of a randomized Phase 2 study in one of the indications, comparing monotherapy only and assessing response instead of PFS, was available. We discuss how an initial assessment of Bayesian Predictive Power (BPP) was put together in this setup, by synthesizing information on the response-PFS association and prior assumptions. We further illustrate how this initial BPP was then updated (or not) at key study or program milestones, such as the readout of one of the trials in another indication or not stopping a trial at a futility or efficacy interim analysis. BPP is generally used to inform the probability of success, a number with wide-ranging impact in our company, such as valuation of the pipeline, gating of decisions by senior management, planning manufacturing capacity, or allocation of resources. We conclude by sharing experiences and recommendations on how to manage and communicate PoS updating throughout an entire development program.

TCS042 Statistical Strategies and Methods in Vaccine Development
Wed :30-:00, Lecture Hall KR 9
Organizer(s): An Vandebosch
Chair(s): Jose Pinheiro
Discussant(s): Fabian Tibaldi
In vaccine development, identifying a vaccine-induced immune response that predicts protection from infection can facilitate and guide further decision-making. If such an immune marker were measurable and validated, it could guide differentiation among candidate vaccine regimens and the planning and design of future trials of potential vaccine candidates. Pre-clinical challenge studies, as well as clinical efficacy studies where both infection and immune responses are measured, allow for exploration to identify such markers. The primary aim for a novel vaccine candidate is to predict potential vaccine protection. Immune responses that correlate with protection could support that evaluation, for example in the absence of the ability to evaluate vaccine protection. The talks proposed for this session cover some important statistical challenges in vaccine development. The session aims to illustrate common features across vaccine development programs, including the development of Go/No Go decision criteria to evaluate the potential for sufficient vaccine efficacy, strategies to model immunological data, and novel approaches for data analyses. Case studies of real vaccine trials from Janssen and GSK will be used to illustrate the different issues and methods discussed in the session. They provide examples of situations in which different statistical rules were used to guide the study team and senior management in study planning and decision making. The lessons and methods from the case studies can easily be extended to any other vaccine trial, independent of indication.

TCS042.1 Partial bridging of vaccine efficacy to new populations
Luedtke A., Gilbert P.; Fred Hutchinson Cancer Research Center, Seattle, United States
Suppose one has data from one or more completed vaccine efficacy trials and wishes to estimate the efficacy in a new setting. Often logistical or ethical considerations make running another efficacy trial impossible. Fortunately, if there is a biomarker that is the primary modifier of efficacy, then the biomarker-conditional efficacy may be identical in the completed trials and the new setting, or at least informative enough to meaningfully lower bound this quantity. Given a sample of this biomarker from the new population, we wish to bridge the results of the completed trials to estimate the vaccine efficacy in this new population. Unfortunately, even knowing the new population's true conditional efficacy fails to identify the marginal efficacy due to the unknown baseline risk, i.e. the biomarker-conditional probability of infection or disease incidence among unvaccinated individuals. We define a curve that partially identifies (lower bounds) the marginal efficacy in the new population as a function of the population's marginal unvaccinated risk, under the assumption that one can identify bounds on the new population's baseline risk. Interpreting the curve only requires identifying plausible regions of the new population's marginal unvaccinated risk. We present a nonparametric estimator of this curve and develop valid lower confidence bounds. As a validation exercise, we use data from three completed dengue vaccine efficacy trials to bridge efficacy from each pair of trials to the third trial, pretending that the clinical endpoint was not measured in the third trial. The method effectively provides robust lower bounds on vaccine efficacy.

TCS042.2 Statistical modelling to assess vaccine response: case study GSK
Callegaro A., Fabian T.; GSK Vaccines, Rixensart, Belgium
Statistical models are often used in vaccine development to evaluate the protection against a specific disease or pathogen. In particular, the so-called correlates of protection can be useful for this kind of evaluation. In this talk we present a case study and propose a method for the evaluation of correlates of protection in the case of large vaccine efficacy. A mixed-effects model has been proposed for the meta-analytic analysis of surrogate markers (Buyse et al. 2000). However, in certain situations the model does pose computational problems. Furthermore, the model can be particularly complex when outcomes and biomarkers are of a different nature, such as in the case of the zoster vaccine, where the outcome is time-to-event and there is a continuous biomarker.
To overcome this type of problem, some simplifications have been proposed, such as the bivariate meta-analysis of estimated trial-specific treatment effects (Tibaldi et al. 2003). In this work we consider this simplified approach in the particular case where the vaccine efficacy is close to 100% (this has been observed in one of the GSK zoster vaccines). In this case it can be problematic to estimate the trial-specific treatment effects (and their variability) because of the lack of information in the treatment group. As a solution we consider the penalized likelihood approach (Firth, 1993) and we show its performance by simulations.
References: Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H (2000). The validation of surrogate endpoints in meta-analysis of randomized experiments. Biostatistics, 1. Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80. Tibaldi F, Abrahantes JC, Molenberghs G, Renard D, Burzykowski T, Buyse M, Parmar M, et al. (2003). Simplified hierarchical linear models for the evaluation of surrogate endpoints. Journal of Statistical Computation and Simulation, 73.

TCS042.3 Developing Go/No Go decision criteria to evaluate the potential for sufficient vaccine efficacy for a novel vaccine candidate
Vandebosch A. 1, Nijs S. 1, Tolboom J. 2, Wegmann F. 2, Hendriks J. 2; 1 Janssen Research & Development, Beerse, Belgium; 2 Janssen Vaccines & Prevention, Leiden, Netherlands
Prior to embarking on a phase 2b efficacy study, Go/No Go decision criteria were developed to evaluate whether the candidate vaccine regimen has the potential to demonstrate sufficient vaccine efficacy. Results from a preclinical efficacy study were used as the basis for these criteria. The ultimate aim was to select a candidate from the regimens under evaluation for safety and immunogenicity in the phase 2a study and, at the same time, to decide whether or not the potential for vaccine protection was sufficiently high for the selected regimen. The team was faced with several statistical challenges: 1) the available data from a preclinical challenge experiment were scrutinized to explore and identify the relevant immune markers correlating with protection; 2) in turn, the effect of the vaccine had to be predicted sufficiently accurately; 3) statistical decision criteria had to be developed to assess the magnitude of values required to achieve sufficient potential for an efficacious regimen; and 4) the results from the animal experiments had to be bridged to humans under biological and statistical assumptions in order to select a regimen in the phase 2a study. Establishing decision criteria was critical for the funders in deciding whether or not to proceed with the efficacy design. In this presentation we aim to illustrate how the team tackled the scientific questions and statistical challenges. Furthermore, we will show the evaluation of the decision criteria through simulation of their operating characteristics for various scenarios and, finally, their application to the data. Cross-functional internal and external collaboration with various stakeholders was crucial for the success.
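As a concrete illustration of evaluating Go/No-Go operating characteristics by simulation, in the spirit of TCS042.3 but not taken from it, the following Python sketch estimates the probabilities of Go, Consider and No-Go decisions for a simple two-arm binary-endpoint study under a range of assumed true effects. The sample size, the dual-criterion rule and the thresholds LRV and TV are arbitrary illustrative assumptions, not the criteria used by the study team.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Assumed design: binary "infection" endpoint, n per arm, and a toy dual-criterion
# rule on the risk difference (control minus vaccine): Go if the lower 80% CI bound
# exceeds the minimum acceptable effect LRV; No-Go if the upper bound is below the
# target value TV; otherwise Consider.
n, LRV, TV, alpha = 150, 0.05, 0.15, 0.20
z = stats.norm.ppf(1 - alpha / 2)

def decide(x_vac, x_ctl):
    p_v, p_c = x_vac / n, x_ctl / n
    diff = p_c - p_v
    se = np.sqrt(p_v * (1 - p_v) / n + p_c * (1 - p_c) / n)
    lo, hi = diff - z * se, diff + z * se
    if lo > LRV:
        return "Go"
    if hi < TV:
        return "No-Go"
    return "Consider"

def operating_characteristics(p_ctl, p_vac, n_sim=20_000):
    x_v = rng.binomial(n, p_vac, n_sim)
    x_c = rng.binomial(n, p_ctl, n_sim)
    labels = np.array([decide(a, b) for a, b in zip(x_v, x_c)])
    return {d: float(np.mean(labels == d)) for d in ("Go", "Consider", "No-Go")}

for true_diff in (0.00, 0.05, 0.10, 0.15, 0.20):
    oc = operating_characteristics(p_ctl=0.30, p_vac=0.30 - true_diff)
    print(f"true risk difference {true_diff:.2f}: {oc}")
```

Running such a loop over plausible scenarios is exactly the kind of operating-characteristics table that supports a Go/No-Go discussion: it shows how often the rule would (incorrectly) green-light an ineffective regimen and how often it would (incorrectly) stop a sufficiently effective one.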

TCS043 Statistical Challenges in the Development of Biosimilars
Thu :30-:00, Lecture Hall KR 9
Organizer(s): Dominik Heinzmann, Dejun Tang
Chair(s): Stephan Lehr
Many innovative biologics that have revolutionized patient care have lost or will soon lose data exclusivity. Biosimilars are products which are highly similar to the originator biologics in terms of analytical, non-clinical and clinical characterizations, and which are expected to be an affordable alternative for patients. Tailored regulatory pathways have been put in place globally to allow for an abbreviated clinical development which differs from traditional new drug development. The differences between the development of biosimilars and of innovator biologics are discussed, with a focus on extrapolation, sensitive populations and endpoints, as well as interchangeability. Extrapolation is the projection of PK/PD, safety and efficacy data from one indication into other indications where no or only limited data have been collected. Extrapolation is possible if appropriate scientific justification is provided. This includes elements such as a similar mode of action across indications, the use of very sensitive populations and endpoints, and high-quality design and conduct of the comparative trials. Statistical approaches to determine sensitive populations and endpoints will be discussed and supported by real case examples. Interchangeability is the possibility of a biological product being substituted by a biosimilar without the intervention of the health care provider, such that the same efficacy and safety can be expected in each patient. Potential trial designs and associated statistical challenges to establish interchangeability will be discussed.

TCS043.1 Extrapolation and determination of sensitive populations and endpoints in the development of biosimilars
Liao S. 1, Hua S. 2, Jin B. 3; 1 Pfizer, Statistics, Shanghai, China; 2 Pfizer, Statistics, La Jolla, United States; 3 Pfizer, Statistics, Cambridge, United States
Biosimilars are biologics which are highly similar to an originator product in terms of analytical, non-clinical and clinical characterizations. There are tailored regulatory pathways in place for their development. A key concept is the possibility to extrapolate data and results from one indication to other approved indications of the reference product, if appropriately scientifically justified. Such a scientific justification includes elements such as a similar mode of action across the indications, use of highly sensitive populations and endpoints, and properly designed and conducted trials. The talk will focus on the determination of sensitive populations and endpoints for cancer therapies to mitigate the residual risk in extrapolation across indications.

TCS043.2 Challenges and opportunities of interchangeability in biosimilar development
Tang D.; Novartis-Sandoz, Holzkirchen, Germany
Biosimilar development provides patients with options to benefit from highly similar products at reasonable cost. Interchangeability is the possibility of a biological product being substituted by a biosimilar without the intervention of the health care provider, such that the same efficacy and safety can be expected in each patient. This talk will present the overall concept of interchangeability and its regulatory pathways. The challenges and opportunities of interchangeability will be discussed with some examples in this area.
TCS043.3 Extrapolation and determination of sensitive populations and endpoints in the development of biosimilars
Heinzmann D.; Roche, Biostatistics, Basel, Switzerland
Biosimilars are biologics which are highly similar to an originator product in terms of analytical, non-clinical and clinical characterizations. There are tailored regulatory pathways in place for their development. A key concept is the possibility to extrapolate data and results from one indication to other approved indications of the reference product, if appropriately scientifically justified. Such a scientific justification includes elements such as a similar mode of action across the indications, use of highly sensitive populations and endpoints, and properly designed and conducted trials. The talk will focus on the determination of sensitive populations and endpoints for cancer therapies to mitigate the residual risk in extrapolation across indications.

TCS043.4 A regulator's view on statistical issues in analytical biosimilarity assessment
Brandt A.; BfArM, Bonn, Germany
The demonstration of analytical similarity to the reference product with regard to quality attributes (QAs) is fundamental for the demonstration of biosimilarity, requiring a biosimilarity exercise including a quantitative comparison to the reference. Although a formal statistical comparison is not a mandatory regulatory requirement in the EU, several approaches based on various statistical concepts have been proposed to support analytical biosimilarity. Commonly used approaches are variations of methods that derive a reference range based on a sample of reference batches and conclude on similarity based on the coverage of the biosimilar batches by this reference range. Alternatively, equivalence testing aiming to show similarity of parameters characterizing the distributions (such as means) has been proposed. However, in the first place it needs to be decided what constitutes true analytical similarity for the QAs of interest, before well-grounded and appropriate statistical methods can be chosen to conclude on similarity based on the limited number of available reference and biosimilar batches. In order to stimulate the discussion on adequate statistical methods for similarity assessment, it was decided to prepare an EMA draft Reflection Paper on statistical methodology for the comparative assessment of quality attributes in drug development, summarizing the current regulators' thinking in the EU. Key issues of the reflection paper will be presented, and a regulatory view on the currently applied methods and the challenges in the assessment of criteria for the statistical comparison of analytical similarity will be given.
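To make the two families of approaches mentioned in TCS043.4 concrete, here is a minimal Python sketch (purely illustrative; the batch numbers, the ±k·SD range and the equivalence margin are assumptions for the example, not regulatory recommendations) that applies a reference-range check and a mean-equivalence two one-sided tests (TOST) procedure to simulated quality-attribute measurements from reference and biosimilar batches.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated quality-attribute values for reference and biosimilar batches
ref = rng.normal(loc=100.0, scale=3.0, size=10)     # 10 reference batches
bio = rng.normal(loc=101.0, scale=3.0, size=6)      # 6 biosimilar batches

# (a) Reference-range approach: all biosimilar batches within mean +/- k*SD of reference
k = 3.0                                              # assumed multiplier
lo, hi = ref.mean() - k * ref.std(ddof=1), ref.mean() + k * ref.std(ddof=1)
range_pass = bool(np.all((bio >= lo) & (bio <= hi)))

# (b) Equivalence of means via TOST with an assumed margin of +/- 1.5 * SD of reference
margin = 1.5 * ref.std(ddof=1)
diff = bio.mean() - ref.mean()
se = np.sqrt(bio.var(ddof=1) / len(bio) + ref.var(ddof=1) / len(ref))
df = len(bio) + len(ref) - 2                         # rough df; a Welch correction is also possible
t_lower = (diff + margin) / se                       # test of H0: diff <= -margin
t_upper = (diff - margin) / se                       # test of H0: diff >=  margin
p_tost = max(1 - stats.t.cdf(t_lower, df), stats.t.cdf(t_upper, df))
tost_pass = p_tost < 0.05

print(f"reference range [{lo:.1f}, {hi:.1f}] -> all biosimilar batches inside: {range_pass}")
print(f"TOST p-value {p_tost:.3f}           -> mean equivalence concluded:   {tost_pass}")
```

The sketch also hints at the point made in the abstract: the two approaches answer different questions (coverage of individual batches versus closeness of means), so the choice of method only makes sense once it is decided what "true analytical similarity" should mean for the attribute at hand.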

TCS044 Statistical Innovation Panel Discussion
Tue :30-8:00, Lecture Hall HS 1
Organizer(s): Frank Bretz
Chair(s): Byron Jones
Panelist(s): Nicky Best, Hans Ulrich Burger, Andrew Grieve, Joseph Heyse, Hermann Kulmann, Yi Liu, Sandeep Menon, Walt Offen, Jose Pinheiro, Stephen Ruberg, David Wright
Statistical innovation is critical to many areas of drug development. Meanwhile, with the rapid progress of modern science and technology, many new and challenging questions have arisen, creating new opportunities for further statistical innovation. In this panel discussion, statistical leaders in the pharmaceutical domain will offer their views on the strategic directions needed to advance good practices in drug development. Some directions covered by this panel discussion include emerging trends of statistical innovation in drug development, barriers to innovation and how to overcome them, how to bring leadership and innovation together in order to foster an environment that boosts creative thinking, and how to create a collaborative interface with a strong scientific focus across all main stakeholders in drug development (industry, regulators, and academia).

TCS045 Ranks and Pseudo-Ranks: Nonparametric Effect Measures in Biomedical Research
Fri :30-2:00, Lecture Hall HS 4
Organizer(s): Edgar Brunner
Chair(s): Arne Bathke
Rank methods have been established procedures in biostatistical research for many decades. Recent developments recommend the use of so-called pseudo-rank methods to avoid paradoxical results in several-sample or factorial designs in the case of unequal sample sizes. This problem is important in sub-group analyses in clinical trials or clinical epidemiology with ordinal endpoints. It is well known that rank procedures in repeated measures designs may lead to quite conservative decisions for a larger number of time points. The reason is that the estimator of the degrees of freedom of the limiting χ²-distribution is biased. This can be improved by using U-statistics based on quadratic and bilinear forms of the ranks (or pseudo-ranks). Optimal sample size planning for the WMW test has been considered recently under some restrictive assumptions, such as data without ties and symmetric distributions. As these assumptions are rarely met in practical biostatistical research, in particular for bounded outcome scores, this result should be improved. Moreover, it seems more natural to consider the dual problem of minimizing the total sample size by an optimal allocation of the sample sizes to the two samples. The asymptotic equivalence theorem enables a deeper insight into the problem and identifies the key points of the solution. All contributions are motivated by real data sets or trials in biomedical research and are applied to practical examples. Moreover, software for the required computations is provided.

TCS045.1 Sample size calculation for the Wilcoxon-Mann-Whitney test
Happ M. 1, Brunner E. 2, Bathke A. 1; 1 University of Salzburg, Department of Mathematics, Salzburg, Austria; 2 Universitätsmedizin Göttingen, Department of Medical Statistics, Göttingen, Germany
Assuming a given type-I error rate and power to detect a specified relative effect, we want to minimize the overall sample size needed for testing the null hypothesis H0: F1 = F2 of equal distributions in the two groups with the Wilcoxon-Mann-Whitney test.
In a recently published paper it was shown, for symmetric, continuous and stochastically ordered distributions, that the power is maximized by using equal sample sizes, and it was shown in simulations for general distributions that the optimal ratio of the sample sizes does not necessarily depend on the ratio of the variances of the underlying distributions. We prefer minimizing the sample size instead of maximizing the power, as the former arises more often in practice, but both approaches will lead to the same answer. In this talk we do not assume continuous distributions. To achieve this for arbitrary alternatives, prior information is needed about the distribution in the first group, e.g. from a previous study. The distribution of the second group can either be generated based on a given effect suggested by a researcher or also be given as prior information. We will give details about minimizing the overall sample size by using a normal approximation and choosing the optimal proportion of sample sizes under these assumptions. Furthermore, we show that the optimal design mainly depends on the ratio of the variances of the asymptotic normed placements and the chosen power.

TCS045.2 Power of the Wilcoxon-Mann-Whitney test for non-inferiority in the presence of death-censored observations
Schmidtmann I. 1, Konstantinides S. 2, Binder H. 1; 1 University Medical Center Johannes Gutenberg-University, Institute for Medical Biostatistics, Epidemiology and Informatics, Mainz, Germany; 2 University Medical Center Johannes Gutenberg-University, Center for Thrombosis and Hemostasis Mainz, Mainz, Germany
We consider the situation of a clinical trial with the goal to establish non-inferiority of a new treatment compared to a standard treatment. The primary endpoint is assumed to be quantitative, but the probability of a fatal outcome is non-negligible; thus censoring by death may occur if patients die before the quantitative outcome can be determined. Excluding censored patients is likely to introduce bias. Felker and Maisel [1] have suggested a global rank endpoint for the situation where superiority is to be demonstrated for a quantitative endpoint in the presence of censoring by death. Matsouaka and Betensky [2] provide a formal description and present power and sample size calculations. Without loss of generality, they assume that high values of the quantitative endpoint are favourable. Let N be the number of patients in the study, of whom m patients have died. Then the ranks 1 to m are assigned to those patients who have died, and the surviving patients have ranks m+1 to N according to their values of the quantitative endpoint. Using these rank scores, the Mann-Whitney U statistic is computed. We apply this idea to the non-inferiority situation as described in [3]. Using the fact that the Mann-Whitney U statistic follows an asymptotically normal distribution and applying the Matsouaka and Betensky formulas for the mean and variance of the U statistic, we derived a formula for the power of the Wilcoxon-Mann-Whitney test for non-inferiority in the presence of death-censored observations. We present an application to planning a study in pulmonary embolism and assess the precision of the formula in a simulation study.
[1] Felker GM, Maisel AS. Circ Heart Fail. 2010;3.
[2] Matsouaka RA, Betensky RA. Statist. Med. 2015.
[3] Wellek S. Testing Statistical Hypotheses of Equivalence and Noninferiority. 2010, CRC Press.

TCS045.3 Mann-Whitney type effects in factorial designs - ranks and pseudo-ranks
Brunner E.; University of Göttingen, Medical Statistics, Göttingen, Germany
If rank methods are used for d > 2 samples, then paradoxical results may be obtained in the case of unequal sample sizes. The reason is that the quantities on which these procedures are based depend on the sample sizes through the definition of the weighted mean distribution function H of the distributions in the experiment. For the same set of alternatives, either a highly significant or a completely non-significant p-value may be obtained just by altering the ratios of the sample sizes while keeping the total sample size N fixed. This undesirable property applies to the well-known Kruskal-Wallis (KW) test, the Hettmansperger-Norton (HN) test and the Jonckheere-Terpstra (JT) test for trend, as well as to the Akritas-Arnold-Brunner (AAB) procedures (1997) in factorial designs. While in the one-way layout such paradoxical results can be obtained by a so-called non-transitive set of distributions, it can be demonstrated in a two-way layout that, even for small sample sizes, similar paradoxical results can be obtained by applying the rank procedure to shifted normal distributions. In this case, interactions may appear or disappear just by changing the ratios of the sample sizes. In biomedical research, such designs appear in so-called sub-group analyses in clinical trials or in clinical epidemiology. Here the question is to examine whether an effect (or non-effect) detected in a large trial is also present (or absent) in a small sub-group. The problem of obtaining paradoxical results can be solved by using the unweighted mean distribution G of the distributions in the experiment. The resulting effects can be estimated by the so-called pseudo-ranks of the observations. Also, hypotheses based on these quantities can be reasonably formulated and confidence intervals can be computed. The R package rankFD, which performs the computations, can be downloaded from CRAN.

TCS045.4 Nonparametric ANOVA test for repeated-measures designs
Gao X. 1, Konietschke F. 2; 1 York University, Toronto, Canada; 2 University of Texas at Dallas, Department of Mathematical Sciences, Dallas, United States
In this talk, we consider the nonparametric inference problem for repeated-measures factorial designs. In Brunner et al. (2009), ANOVA-type statistics were proposed and approximate distributions of quadratic forms in repeated-measures designs were investigated. We propose to extend the method in Brunner et al. (2009) to factorial designs with repeated measurements and to estimate the degrees of freedom of the ANOVA test statistic using quadratic and bilinear forms based on rank statistics.

TCS047 Design and Analysis of Multi-Regional Clinical Trials
Tue :30-8:00, Lecture Hall HS 2
Chair(s): Jie Chen
Discussant(s): William Wang
Multi-regional clinical trials (MRCTs) are attractive because they not only bring potentially efficacious treatments early to patients worldwide, but also allow assessment of the drug effect among diversified populations. However, acceptance of using MRCT data in New Drug Applications (NDAs) varies from agency to agency or region to region.
In addition, some challenges still remain pertaining to the design, analysis and data interpretation of MRCTs, e.g., assessment of the treatment effect for a target ethnic (TE) population in a particular region or country with a small sample size in an MRCT. This session features presentations on recent developments in the design and analysis of multi-regional clinical trials.

TCS047.1 Robust estimates of regional treatment effects in MRCT via semi-parametric modeling
Tan M., Yuan A., Wang S., Zhou Y.; Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics, Washington, United States
Multiregional randomized clinical trials (MRCTs) are increasingly common. Despite randomization, there may be a differential treatment effect among different regions, potentially due to confounding region-specific factors. Most current methods for the assessment of the consistency of the treatment effect between different regions or ethnic groups are based on some subjectively specified model, which can be problematic, e.g., as in the BREEZE 1-3 trials for SERADA, for which the FDA advisory committee voted against approval. We propose a novel semi-parametric model which accounts for covariates that potentially affect the regional treatment effect and has a non-parametric error distribution, so that the estimates of regional treatment effects are robust with respect to model assumptions. The model is estimated by maximizing the profile likelihood using the EM algorithm. The profile likelihood ratio statistic is used to test the existence of regional treatment differences. We derive the asymptotic properties of the estimate and show by simulation that the semiparametric model performs well. We then discuss applications to two clinical trials. This work is in collaboration with Yuan A, Wang S and Zhou YZ.

TCS047.2 Interim analyses under a discrete random-effects model in a multiregional clinical trial
Tsou H.-H. 1, Lan K.K.G. 2, Chen C.-T. 1, Hsiao C.-F. 1; 1 National Health Research Institutes, Institute of Population Health Sciences, Miaoli County, Taiwan; 2 Janssen Pharmaceutical Companies of Johnson & Johnson, Raritan, United States
When designing a trial, the sample size is usually estimated based on limited information and uncertain model assumptions. To improve the efficiency of clinical trials, sample size re-estimation or sample size re-distribution based on the cumulative data at an interim analysis is desirable. In this presentation, we focus on the design of multi-regional clinical trials (MRCTs) and discuss the issue of combining evidence of regional treatment effects in MRCTs. We consider a discrete random effects model (DREM) to explain the heterogeneous treatment effects among regions. Some strategies for sample size re-estimation or sample size re-allocation are proposed to improve the efficiency of a clinical trial.
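As background for the consistency discussions in this session (see also TCS047.3 below), the following Python sketch — an illustration, not taken from any of the abstracts — estimates by simulation the probability that one region meets a commonly cited consistency criterion, namely that the observed regional effect is at least a fraction π of the observed overall effect, when the true treatment effect is in fact common to all regions. The total sample size, effect size, variance and π = 0.5 are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

def consistency_probability(region_fracs, delta, sigma, pi=0.5, n_sim=50_000):
    """P(observed effect in region 1 >= pi * observed overall effect),
    assuming a common true treatment difference delta in every region."""
    n_total = 1_000                                    # patients per arm overall (assumption)
    n = np.round(np.array(region_fracs) * n_total).astype(int)
    se_region = sigma * np.sqrt(2.0 / n)               # SD of each regional difference estimate
    hits = 0
    for _ in range(n_sim):
        d_hat = rng.normal(delta, se_region)           # regional effect estimates
        d_all = np.average(d_hat, weights=n)           # overall (sample-size weighted) estimate
        if d_hat[0] >= pi * d_all:
            hits += 1
    return hits / n_sim

# Region 1 holds 10%, 20% or 30% of the patients; remaining patients split over 2 regions
for frac1 in (0.10, 0.20, 0.30):
    fracs = [frac1] + [(1 - frac1) / 2] * 2
    p = consistency_probability(fracs, delta=0.3, sigma=1.0)
    print(f"region-1 share {frac1:.0%}: P(consistency criterion met) ~ {p:.3f}")
```

The point of such a calculation at the design stage is the one made in the session: even when the drug works equally well everywhere, a small regional sample size can leave a non-trivial chance of "failing" the consistency criterion, which argues for planning the regional allocation accordingly.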

TCS047.3 Practical recommendations for regional consistency evaluation and optimal study designs in multi-regional clinical trials
Teng Z. 1, Lin J. 1, Zhang B. 2, Chang M. 3; 1 Takeda Pharmaceuticals, Cambridge, United States; 2 Seqirus Pharmaceuticals, Cambridge, United States; 3 Veristat LLC, Southborough, United States
In recent years, there has been an increasing trend to conduct multi-regional clinical trials (MRCTs) for drug development in the pharmaceutical industry. A carefully designed MRCT can be used to support a new drug's approval in different regions simultaneously. The primary objective of an MRCT is to investigate the drug's overall efficacy across regions while also assessing the drug's performance in some specific regions. In order to claim the study drug's efficacy and obtain drug approval in some specific region(s), the local regulatory authority may require the sponsor to provide evidence of consistency in the treatment effect between the overall patient population and the local region. In this talk, we first evaluate the consistency requirements in multi-regional clinical trials for different endpoints, i.e., continuous, binary and survival endpoints. We compare the different consistency requirements for the same endpoint/measurement if multiple consistency requirements are available, and our recommendations for each endpoint/measurement will be made based on comprehensive considerations. Second, we propose two optimal designs for MRCTs which provide more effective sample size allocation to ensure a certain overall power and probability of regional success for all regions of interest with a minimal total sample size requirement. Third, an adaptive design for MRCTs will be discussed if time permits.

TCS048 Recent Development in Design and Analysis of Dose Finding Trials
Fri :30-2:00, Lecture Hall HS 3
Chair(s): Jie Chen
One of the most important tasks in the clinical development of biopharmaceutical products is to determine the therapeutic window within which the potential drug produces the intended treatment effects (efficacy) with limited toxic effects. Although many statistical designs and analysis methods have been proposed for this purpose, dose finding still remains an active area of research. This session features some new developments in the design and analysis of dose finding trials, including both parametric and non-parametric approaches that can easily be implemented in practice.

TCS048.1 A Bayesian approach for determining dose-response relationships based on multiple models
Gould L.; Merck Research Laboratories, Upper Gwynedd, United States
Approaches for identifying doses of a drug to carry forward in development usually address whether responses are related to doses, the doses whose responses differ from control responses, the functional form of the dose-response relationship, and the doses that should be carried forward. In fact, however, the actual functional form of a dose-response relationship may be unnecessary if a response distribution can be determined for any dose. The real objective is to determine if a dose-response relationship exists, regardless of its functional form, and, if so, to identify a range of doses to study further. We describe a Bayesian approach for addressing the issues using an estimation instead of a hypothesis-testing paradigm.
Functions of realizations from the posterior distributions of the parameters of linear, generalized, and nonlinear regression models relating response to dose provide realizations from posterior and predictive distributions of quantities that address the key issues directly, in particular distributions of the doses needed to achieve a specified response. Multiplicity adjustments are not required. A number of examples illustrate the application of the method.

TCS048.2 A curve-free Bayesian decision-theoretic design for phase I trials considering both safety and efficacy endpoints
Lu Y. 1, Lee B. 2, Fan S. 3; 1 Stanford University, Department of Biomedical Data Science, Stanford, United States; 2 California State University, San Jose, United States; 3 California State University, East Bay, Hayward, United States
In this paper, we discuss a curve-free Bayesian decision model to select the biologically optimal dose (BOD) for phase I/II trials that considers both safety and biological activity. We start from a curve-free model based on Dirichlet priors and joint modeling of safety and biological activity. We assume that toxicity is a monotone increasing function of dose, but efficacy may not be. Using a specified loss function, we can identify the dose that has an acceptable toxicity rate while achieving the highest biological activity. We further extend this model to partially ordered combination treatments. We demonstrate our method through simulations.

TCS048.3 BOIN: a novel platform for designing early phase single-agent and drug-combination clinical trials
Yuan Y.; University of Texas MD Anderson Cancer Center, Biostatistics, Houston, United States
We introduce Bayesian optimal interval (BOIN) designs as a novel platform for designing early phase single-agent and drug-combination clinical trials. The BOIN design is motivated by the top priority and concern of clinicians, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. The BOIN design is easy to implement in a way similar to algorithm-based designs, such as the 3+3 design, but is more flexible for choosing the target toxicity rate and cohort size, and yields a substantially better performance that is comparable to that of more complex model-based designs. The BOIN design can handle both single-agent and drug-combination phase I trials, and can be used to find a single or multiple maximum tolerated doses (MTDs). The BOIN design has the desirable statistical properties of being coherent and consistent. Web applications with an intuitive graphical user interface are freely available online.
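For readers unfamiliar with interval designs, the sketch below computes the BOIN escalation/de-escalation boundaries from the published closed-form expressions (Liu and Yuan, 2015) and applies the resulting rule to an observed cohort. The target toxicity rate and the default choices φ1 = 0.6φ and φ2 = 1.4φ are taken from the BOIN literature, not from this abstract, and the sketch omits the design's additional dose-elimination safety rule.

```python
import numpy as np

def boin_boundaries(phi, phi1=None, phi2=None):
    """Closed-form BOIN escalation (lambda_e) and de-escalation (lambda_d) boundaries."""
    phi1 = 0.6 * phi if phi1 is None else phi1   # highest rate still considered clear under-dosing
    phi2 = 1.4 * phi if phi2 is None else phi2   # lowest rate considered clear over-dosing
    lam_e = np.log((1 - phi1) / (1 - phi)) / np.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = np.log((1 - phi) / (1 - phi2)) / np.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d

def boin_decision(n_tox, n_treated, phi=0.30):
    """Compare the observed toxicity rate at the current dose to the two boundaries."""
    lam_e, lam_d = boin_boundaries(phi)
    p_hat = n_tox / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"

lam_e, lam_d = boin_boundaries(0.30)
print(f"target 0.30 -> lambda_e = {lam_e:.3f}, lambda_d = {lam_d:.3f}")  # approx. 0.236 and 0.358
print(boin_decision(n_tox=1, n_treated=6))    # 1/6 = 0.167 <= lambda_e -> escalate
```

Because the whole rule reduces to comparing an observed toxicity rate with two pre-tabulated boundaries, it can be run at the bedside like a 3+3 scheme while retaining the model-based calibration — which is exactly the selling point described in the abstract.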

TCS048.4 A simple and efficient statistical approach for designing an early Phase II clinical trial
Ting N.; Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, United States
There are many challenges in designing early Phase II clinical trials. One reason is that there are many unknowns at this stage of development; the other is that the size of the trial at this stage is limited, even though results from such a clinical trial could impact many important decisions and there are high risks associated with each decision. In this manuscript, an ordinal linear contrast test (OLCT) is recommended to help design an efficient early Phase II trial. The performance of the proposed method is compared with MCP-Mod, ANOVA F, and Max T. Results indicate that the performance of the ANOVA F and Max T approaches is sub-optimal. OLCT can be comparable with MCP-Mod. In practical applications, OLCT is simple to use, efficient, and robust. For practitioners with a limited understanding of MCP-Mod, who have concerns about applying MCP-Mod to their studies, or who are not well versed in the complexity of the MCP-Mod software, the OLCT is a useful alternative.

TCS049 Development and Validation of Bioanalytical Methods
Wed :30-:00, Lecture Hall HS 2
Chair(s): Olga Marchenko
During the development of biopharmaceutical products, bioanalytical methods are commonly used in non-clinical and clinical studies such as pharmacokinetic, toxicokinetic, bioavailability, bioequivalence, dose finding and drug-drug interaction studies, in order to describe the exposure to the drugs and their metabolites. It is important that these bioanalytical methods are well characterized throughout the analytical procedures to establish their validity and reliability in the analysis of non-clinical and clinical samples. This session features scientific presentations on emerging nonclinical statistics, including an introduction to the ICH M10 guideline, nonclinical statistics in process development, and scientific exploration.

TCS049.1 Analytical method validation based on total error
Yang H.; MedImmune, LLC, Gaithersburg, United States
The primary purpose of method validation is to demonstrate that the method is fit for its intended use. Traditionally, an analytical method is deemed valid if its performance characteristics, such as accuracy and precision, are shown to meet pre-specified acceptance criteria. However, these acceptance criteria are not directly related to the method's intended purpose, which is usually a guarantee that a high percentage of the test results of future samples are close to their true values. Alternative "fit for purpose" acceptance criteria based on the concept of total error have been increasingly used. Such criteria allow for assessing method validity, taking into account the relationship between accuracy and precision. Although several statistical test methods have been proposed in the literature to test the "fit for use" hypothesis, the majority of the methods are not designed to protect against the risk of accepting unsuitable methods, and thus have the potential to cause an uncontrolled consumer's risk. In this paper, we propose a test method based on generalized pivotal quantity (GPQ) inference. Through simulation studies, the performance of the method is compared to five existing approaches.
The results show that both the new method and the method based on a β-content tolerance interval with a confidence level of 90%, hereafter referred to as β-content (0.9), control the Type I error and thus the consumer's risk, while the other existing methods do not. It is further demonstrated that the GPQ method is less conservative than the β-content (0.9) approach when the analytical methods are biased, whereas it is more conservative when the analytical methods are unbiased. Therefore, the selection of either the GPQ or the β-content (0.9) approach for an analytical method validation depends on the accuracy of the analytical method. It is also shown that the GPQ method has better asymptotic properties than all of the current methods.

TCS049.2 Current bioanalytical method validation guidelines - views from a statistical perspective
Lin T.-L.; FDA/CBER, Silver Spring, United States
Bioanalytical methods are critical for the many clinical and nonclinical studies that are used to support license applications. The current bioanalytical method validation guidelines in the US, EU, and Japan have developed over time with minimal statistical involvement. There are many specific experimental designs and acceptance criteria recommended in the guidelines. These recommendations may raise statistical questions, such as the appropriateness of the analysis approach, the relevance of the statistical criteria to the validation parameter assessed, and the statistical power for detecting unacceptable performance. This presentation will take a look at some of these statistical issues.

TCS049.4 Parameter estimates for large proportions of data below multiple lower limits of quantification
Berger T., Hilgers R.-D., Heussen N.; RWTH Aachen University, Department of Medical Statistics, Aachen, Germany
Background: Multiple lower limits of quantification (MLOQ) result if there is more than one laboratory responsible for delivering measurements of concentration data, while some of the measurements are too low to be measured with sufficient precision. Only one parametric method exists to estimate the distribution parameters of normally distributed data which include observations below MLOQ. For the case of a single lower limit of quantification (LLOQ) in a data set, there are well-known methods to estimate these parameters, such as simple imputation methods and maximum likelihood based approaches.
Methods: We propose two maximum likelihood based extensions of LLOQ methods to estimate the parameters mean and variance in the presence of observations below MLOQ for normally distributed data. In a simulation study, the performance of our methods is compared to the existing parametric method and simple imputation methods, where we focus on estimating the mean. We investigate a broad range of proportions of censored data to evaluate how the methods can deal with the problem of large missing fractions. Furthermore, we examine the influence of various sample sizes on the performance.
Results and Conclusions: We provide formulas to estimate the distribution parameters mean and variance of data sets in the context of missing data due to MLOQ for simple imputation and maximum likelihood based methods. Furthermore, we show that the parameter estimates from the various methods vary substantially. Based on that, we give recommendations as to which method should be applied in practice.

TCS049.5 Statistical considerations in pharmaceutical process validation
Coppenolle H.; Johnson & Johnson, Statistical Decision Sciences, Beerse, Belgium
Process performance qualification (PPQ) is an important phase in the validation of a pharmaceutical process and requires statistical input to justify the number of samples per lot for acceptance testing of a product critical quality attribute, thereby balancing consumer and producer risk. This presentation discusses the relationship between the operating characteristic curve of a traditional variables sampling plan and concepts such as the coverage and confidence of a tolerance interval. The use of a fixed-in-advance tolerance interval is proposed for validation purposes at the stage of PPQ. It expresses a confidence about the failure rate for a given quality criterion and represents a natural translation of lot quality. Extension of the concept to multiple variance components or complex quality criteria is straightforward. A Bayesian fixed-in-advance tolerance interval approach is used to outline the role of analytical variability in quality acceptance testing for PPQ. The run variance is an inherent characteristic of an analytical method. Prior information on the run variance is usually obtained from designed method validation studies. It can serve as input for the analytical design of process validation studies to control for possible run biases interfering with the quality decision. The role of analytical variability in the design and analysis of process validation studies will be discussed based on a real-life example. A validated analytical method typically acts as a servant to the validation of drug product quality.

TCS050 Emerging Topics in the Design and Analysis of Clinical Trials
Thu :30-6:00, Lecture Hall HS 4
Organizer(s): Jie Chen
Chair(s): Robert Beckman
The ultimate objective of clinical trial design and analysis is to provide statistical information for scientific decision making on a drug development program and eventually for regulatory approval of the drug. However, clinical development is a complex process, involving innovative yet efficient trial design and its mid-course adjustment, trial monitoring for patient benefit and safety, a good analysis plan, and smart interpretation of analysis results for decision making. This session features presentations on two-stage trial designs with mid-course adjustment, innovative trial monitoring approaches and data analyses, and a new paradigm for decision making: from p-values to probability of success.

TCS050.1 Using the ROC curve for sample size determination and treatment screening in two-stage Phase II clinical trials
Huang W.-S., Chang Y.-C.; Academia Sinica, Institute of Statistical Science, Taipei, Taiwan
To ensure the efficacy of a drug and its post-marketing safety, clinical trials are usually used to identify valuable treatments and to compare their efficacy with that of a standard control therapy; they are therefore essential in a costly and time-consuming new drug development process. It is undesirable to recruit patients to treatments with little therapeutic effect, due to ethical and cost imperatives.
To reduce the cost and shorten the duration of the trials, several two-stage designs have been proposed, and conditional power, based on the interim analysis results, is a feasible way to appraise whether it is worthwhile to continue the less efficacious treatments into the next stage. However, there is a lack of discussion in the literature of the factors that influence the conditional power of a trial at the design stage. In this article, we provide the optimal conditional power through the method of the receiver operating characteristic (ROC) curve to assess the impacts on the quality of a two-stage design with multiple treatments. Then, under the constraint of the optimal conditional power, we propose an optimal design, minimizing the expected sample size, for choosing the best or some promising treatment(s) among several treatments. Tables of the two-stage design subject to the optimal conditional power for various combinations of design parameters are provided, and an example is presented for illustration purposes.

TCS050.2 Experiences and lessons learned in exploring risk-based monitoring in oncology clinical trials
Cheng M. 1, Clow F. 2; 1 Pharmacyclics LLC - an AbbVie Company, Biostatistics, Sunnyvale, United States; 2 Pharmacyclics LLC - an AbbVie Company, Biometrics, Sunnyvale, United States
Traditional monitoring, mainly on-site monitoring, is time consuming, expensive and not always necessary. There is a growing consensus that risk-based approaches to monitoring (RBM), focused on risks to the most critical data elements and processes necessary to achieve study objectives, are more likely than routine visits to all clinical sites and 100% source data verification (SDV) to be sufficient to ensure subject protection and overall study quality [1]. RBM is a combination of centralized, remote and on-site monitoring. Centralized monitoring is an important component of an RBM plan. The FDA also encourages greater use of centralized monitoring practices, where appropriate, and puts less emphasis on on-site monitoring. Statistical methods are important components of centralized monitoring, e.g. probability sampling of sites, data anomaly detection, examining the distribution of data based on variation, outlier/inlier detection, and examining differences between and within sites [1,2,3]. Over the past 5 years, much research and many publications have pioneered and refined this path [1-8]; many companies and contract organizations have also emerged to offer sponsors solutions for centralized statistical monitoring, with the promise of reduced clinical monitoring costs; however, no standard metric has been established to measure the effectiveness of one proposed RBM approach versus another. Pharmacyclics has piloted a few oncology studies exploring RBM approaches. We defined a pre-specified statistical analysis plan, before the trial began, with targeted, triggered RBM and centralized monitoring, to identify outliers or trends that may need attention. This abstract discusses the experiences we had and the lessons learned.
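As a minimal illustration of one centralized statistical monitoring check of the kind mentioned in TCS050.2 (this is not Pharmacyclics' actual method; the data and the flagging threshold are invented), the Python sketch below flags sites whose mean of a key variable deviates markedly from the pooled remaining sites, using a simple leave-one-site-out z-score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "key variable" (e.g. a lab value) for 12 sites with 15-35 patients each;
# site 7 is given an artificial shift to mimic a data-quality issue.
sites = {s: rng.normal(50, 8, rng.integers(15, 35)) for s in range(1, 13)}
sites[7] = sites[7] + 10

def site_zscores(site_data):
    """Leave-one-site-out z-score of each site mean versus the pooled remaining sites."""
    out = {}
    for s, x in site_data.items():
        rest = np.concatenate([v for k, v in site_data.items() if k != s])
        se = np.sqrt(rest.var(ddof=1) / len(x) + rest.var(ddof=1) / len(rest))
        out[s] = (x.mean() - rest.mean()) / se
    return out

for site, z in sorted(site_zscores(sites).items(), key=lambda kv: -abs(kv[1])):
    flag = "  <-- review" if abs(z) > 3 else ""
    print(f"site {site:2d}: z = {z:+.2f}{flag}")
```

In practice such checks would be pre-specified in the monitoring plan, run across many variables, and trigger targeted follow-up (a query, a remote review or an on-site visit) rather than an automatic conclusion about the site.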

TCS050.3 Bartlett-type corrections and bootstrap adjustments of likelihood-based inference methods for network meta-analysis
Noma H. 1, Nagashima K. 2, Maruo K. 3, Gosho M. 4, Furukawa T.A. 5; 1 The Institute of Statistical Mathematics, Department of Data Science, Tokyo, Japan; 2 Chiba University, Graduate School of Medicine, Chiba, Japan; 3 National Center of Neurology and Psychiatry, Translational Medical Center, Tokyo, Japan; 4 University of Tsukuba, Faculty of Medicine, Tsukuba, Japan; 5 Kyoto University, Graduate School of Medicine / School of Public Health, Kyoto, Japan
In network meta-analyses that synthesize direct and indirect comparison evidence concerning multiple treatments, multivariate random effects models have been routinely used for addressing between-studies heterogeneities. However, the coverage probabilities of the confidence intervals for standard inference methods (e.g., restricted maximum likelihood [REML] estimation) cannot retain their nominal confidence levels under certain conditions, particularly when the number of synthesized studies is moderate or small, because their validity depends on large-sample approximations. For conventional pairwise meta-analysis methods, alternative efficient likelihood-based interval estimation techniques have been developed, namely the likelihood ratio test-based method by Hardy and Thompson (Statist. Med. 1996, 15: 619-629) and the efficient score test-based method by Noma (Statist. Med. 2011, 30). In this study, we extend these two efficient inference methods to a general framework for network meta-analysis. In addition, to address small-sample inadequacies of these methods, improved higher-order asymptotic methods using Bartlett-type corrections and bootstrap adjustment methods are developed. These methods can also be straightforwardly applied to multivariate meta-regression analyses. In numerical evaluations via simulations, the developed methods performed well compared with the ordinary REML-based inference method. In the case of a moderate number of synthesized studies with large between-studies heterogeneity, the likelihood ratio test- and efficient score test-based methods lost their validity, but the Bartlett-type correction and bootstrap adjustment methods performed well. Applications to two network meta-analysis datasets are provided.

TCS050.4 Decision making for pharmaceutical development programs - from p-values to probability of success
Schueler A.; Merck KGaA, Biostatistics, Darmstadt, Germany
At several timepoints during clinical development, decisions on further investment in a certain compound (i.e. whether or not to start new clinical trials) have to be made. Clinical trials in early phases of development should be planned in a way that their results allow appropriate decisions to be made for further development. Therefore, during the planning of those trials, a combination of sample size and go/no-go boundary (for example in terms of posterior probabilities) should be chosen that leads to sufficiently high probabilities of correct go and correct no-go decisions. If there is already some clarity about the confirmatory phase III design, the probabilities of success (PoS) of phase III should also be taken into account when planning early trials [1]. Besides considerations within a single development goal (e.g. a drug x indication combination), decision making at the portfolio level should also be considered.
[1] Götte H, Schüler A, Kirchner M, Kieser M.
Sample size planning for phase II trials based on success probabilities for phase III. Pharm Stat. 2015;14(6).

TCS051 Quantum Computing in Statistics and Machine Learning
Thu :30-3:00, Lecture Hall KR 9
Organizer(s): Valerii Fedorov
Chair(s): Frank Bretz
Discussant(s): Frank Bretz, Ali Eskandarian
Quantum computing is a frequent and highly praised guest on the popular science pages, but only a few big players like Google or Lockheed Martin own one. While the theory of quantum information and various theoretical aspects of quantum computing have attracted probabilists for decades, examples of practical applications are very rare. The proposed session will feature talks that discuss quantum algorithms, their applications in statistics, and links between classical statistical algorithms and their quantum versions, with an emphasis on those that can be used in drug development studies.

TCS051.1 Quantum-enhanced machine learning and AI
Wittek P.; ICFO, Castelldefels, Spain
Quantum technologies are maturing: we see more and more applications of quantum information processing, and the recent progress in building scalable universal quantum computers is remarkable. Machine learning and AI comprise an important applied field where quantum resources are expected to give a major boost. On a theoretical level, we can ask what ultimate limits quantum physics imposes on learning protocols. To answer this, we need to generalize classical results such as structural risk minimization and model complexity, and no-free-lunch theorems. On a more practical level, we can study potential speedups of quantum-enhanced protocols, which range between exponential and quadratic reductions in complexity; examples include quantum principal component analysis and quantum k-means clustering. Finally, we may consider what can be implemented with current and near-future technology, particularly when it comes to computationally expensive algorithms such as probabilistic graphical models. In this talk, we give an overview of the key research directions, implementations, and some open questions.

TCS051.2 Optimization and machine learning using quantum annealing
Tallant G.; Lockheed Martin, Fort Worth, United States
We provide a brief overview of quantum computing at Lockheed Martin and discuss some of the basic concepts related to quantum annealing and the D-Wave Systems Inc. hardware. Practical challenges associated with using the D-Wave hardware on real applications are reviewed, including problem formulation and scaling, embedding problems on the hardware graph, and error mitigation strategies. We will illustrate our findings on example problems from the fields of optimization and machine learning.
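Because annealing hardware is programmed by casting a problem as a quadratic unconstrained binary optimization (QUBO), a small classical analogue may help readers follow the "problem formulation" step mentioned in TCS051.2. The Python sketch below builds a toy QUBO (a small max-cut instance, an arbitrary choice) and minimizes it with classical simulated annealing; it makes no claim about D-Wave's actual toolchain or hardware embedding.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy QUBO: minimize x^T Q x over binary x, where Q encodes max-cut on 8 nodes.
n = 8
W = np.triu(rng.integers(0, 3, (n, n)), k=1)          # random edge weights
W = W + W.T
Q = np.diag(-W.sum(axis=1)).astype(float) + W         # max-cut written as a minimization QUBO

def energy(x, Q):
    return float(x @ Q @ x)

def simulated_annealing(Q, n_steps=5000, t0=2.0, t1=0.01):
    x = rng.integers(0, 2, Q.shape[0])
    best, best_e = x.copy(), energy(x, Q)
    for step in range(n_steps):
        t = t0 * (t1 / t0) ** (step / n_steps)         # geometric cooling schedule
        i = rng.integers(Q.shape[0])
        x_new = x.copy()
        x_new[i] ^= 1                                  # flip one bit
        d_e = energy(x_new, Q) - energy(x, Q)
        if d_e <= 0 or rng.random() < np.exp(-d_e / t):
            x = x_new
            if energy(x, Q) < best_e:
                best, best_e = x.copy(), energy(x, Q)
    return best, best_e

x_opt, e_opt = simulated_annealing(Q)
print("assignment:", x_opt, " QUBO energy:", e_opt)
```

The QUBO formulation step is identical whether the minimizer is a classical heuristic like this one or a quantum annealer; the hardware-specific work (graph embedding, error mitigation) discussed in the talk comes on top of it.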

TCS051.3 Stochastic global optimization and quantum annealing
Zhigljavsky A.
Cardiff University, Cardiff, United Kingdom

Quantum annealing seems to be a very promising heuristic for solving some discrete optimization problems like the travelling salesman problem. In this talk I turn my attention to continuous optimization. I will overview some basic principles of the construction of stochastic global optimization methods for continuous optimization and will try to place quantum annealing as a stochastic global optimization method. I will try to argue that currently quantum annealing in continuous optimization is not more than a set of heuristic rules for the construction of peculiar optimization algorithms.

TCS051.4 Quantum computing in statistics and machine learning
Fedorov V.
ICON plc, North Wales, United States

The major objective of the talk is to discuss opportunities for quantum computing in statistics and machine learning. Quantum computing hardware and its algorithmic support are rapidly improving and have become popular in many scientific areas. We see more and more applications of quantum information processing, machine learning and quantum Monte-Carlo techniques. Machine learning and artificial intelligence comprise important applied areas where quantum optimization (for instance, quantum annealing) is expected to give a major boost. Statistical methods in classification problems and in optimal design are two other areas where quantum computing looks very promising.

TCS052 Historical Data for Confirmatory Prospective Clinical Trials - a Contradiction? (Part 1)
Wed :30-:00, Lecture Hall HS 3
Chair(s): Franz König
Discussant(s): David Wright

The landscape of medical research has changed in the last couple of years due to a range of data sharing initiatives (e.g., EMA data transparency Policy 0070, ICMJE, YODA, EFPIA-PhRMA). The availability of patient-level data from historical trials opens new opportunities, not only for exploratory research, but also for designing new trials. A controversially discussed use of historical data is to replace concurrent controls in RCTs by historic controls. Do the savings in terms of sample sizes outweigh the risks of biases? In this session we will highlight new developments and discuss the impact on clinical trial design and analysis strategies.

TCS052.1 Historical controls for clinical trials: do we have a false randomised analogy?
Senn S.1,2, Collignon O.1,3, Schritz A.1
1 Luxembourg Institute of Health, Methodology and Statistics, Strassen, Luxembourg; 2 University of Sheffield, School of Health and Related Research, Sheffield, United Kingdom; 3 European Medicines Agency, Human Medicines Evaluation Division - Biostatistics and Methodology Support, London, United Kingdom

Where rare diseases are concerned, data are hard to come by and the attraction of using historical data is evident. There have been many attempts in recent years to provide reasonable modelling approaches to using historical control data when testing new treatments. Many of these seem to have begun from the starting position that the historical data are analogous to (if a somewhat deficient version of) the control arm in a parallel group trial. We propose, instead, that the analogy chosen should be that of a cluster-randomised trial. A justification will be given by considering the striking results of the TARGET study, which combines both randomised and nonrandomised elements.
The implications of incorporating historical controls in the analysis of new treatments concern not just modelling, although this is important, but also the conduct of such analyses and the provision to regulators and other parties of adequate guarantees of utmost good faith and of care and attention in performing the work. We consider parallels between frequentist analyses of cluster-randomised trials and Bayesian approaches and also provide a list of practical matters that will need to be addressed in any serious attempt to incorporate historical information. Acknowledgements: This work is part of the EU FP7 IDEAL project (grant ).

TCS052.2 Rethinking confirmatory prospective clinical trials: why and how co-data matter
Wandel S.
Novartis Pharma AG, Basel, Switzerland

Confirmatory clinical trials are the cornerstone for drug approvals and usually rely on phase III data only. However, at the time of submission, a large body of co-data (historical and actual) will be available from phase I, II and other (potentially ongoing) studies. Whether and how to use these co-data to complement the confirmatory evidence appears to be a valid question. It is especially relevant for diseases that are rare or have a high unmet medical need. Statistically, the problem may be addressed with meta-analytic, normal-normal hierarchical models (NNHM). An interesting feature of such models is their ability to incorporate (but discount) the co-data when estimating the effect of the confirmatory trial, either in a frequentist or a Bayesian way. However, since the number of studies is usually small, the Bayesian approach has some advantages. Using an example in rare diseases, I will illustrate how to incorporate the co-data via the NNHM to complement the confirmatory evidence. In particular, I will discuss a strategy for early (conditional) approval that offers an alternative to using the confirmatory data only. The example shows the potential advantage of using co-data in a confirmatory setting.

TCS052.3 Small trials, historical data: methods and software for decision making
Magirr D.
AstraZeneca, Cambridge, United Kingdom

In Early Clinical Development (ECD) at AstraZeneca we apply a consistent statistical approach to go/no-go decisions (Frewer, 2016). We choose a key endpoint, or possibly two key endpoints, and the sample space for the corresponding point estimate is split into three zones: red, amber and green. The amber zone represents inconclusive evidence, where our senior leaders must use their overall
impression of the data, as well as factors external to the trial, to come to a final decision. Decision criteria are pre-specified by the clinical team at the design stage. Our job as statisticians is to communicate the uncertainties and risks associated with various options. Working with Cytel, we have co-developed a software tool called decide to simplify this task. The methodology has been very successful. It has gained acceptance from senior leaders and clinical teams. This means that the pre-specified criteria are always applied at the end of the study - a good scientific practice that can be difficult to implement in early development. Thus far, we are mostly using frequentist methods, despite often possessing plenty of historical information. In this talk I will describe our strategy for using Bayesian methods in go/no-go decision making, and how we are implementing this with the decide software.

TCS052.4 Do dynamic borrowing methods control the risk of using historical data?
Dejardin D.1, Delmar P.1, Patel K.2, Warne C.2, Van Rosmalen J.3, Lesaffre E.4
1 F. Hoffmann-La Roche, Biometrics, Basel, Switzerland; 2 Roche Products Ltd, Biometrics, Welwyn Garden City, United Kingdom; 3 Erasmus University Medical Center, Department of Biostatistics, Rotterdam, Netherlands; 4 KU Leuven, Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Leuven, Belgium

With the increased availability of historical data (e.g. through data sharing initiatives), the questions of how best to use these historical data and how to address the risk of bias linked to them have become more important. One could think of collecting a small dataset (concurrent data) and supplementing it with historical data, for instance, to increase the power of the analysis. A typical example of this would be a randomized trial in which the control arm is small but historical control data are added to the analysis. Recently, several Bayesian methods have been introduced to automatically downweight historical data as a function of their compatibility with the concurrent data. These methods include the normalized power prior, the commensurate prior and the robust mixture prior. We introduce a motivating example from an antibiotic clinical development plan with a binary endpoint. Based on this example, we provide a fair comparison between the methods and show that they provide an increase in power while the type I error is limited. We also show that a classical analysis without historical data, using the small current control arm and the same increased type I error, provides better power. One difference between these analyses is that the increased type I error is fixed for the classical analysis, whereas for the dynamic borrowing methods it depends on the drift. Finally, we discuss these findings in the context of registrational studies.

TCS053 Reproducible Research - Can We Really Separate the Wheat from the Chaff?
Wed :30-8:00, Lecture Hall HS
Chair(s): Franz König, Martin Posch
Panelist(s): Doug Altman, Peter Bauer, John P.A. Ioannidis, Markus Müller, Regina Nuzzo, Stephen Senn

There is growing controversy about the validity of empirical research. The biosciences especially have been affected by this discussion. There is an overwhelming flood of new publications, but exciting results of many studies cannot be reproduced in further studies. Among other factors, statistics in particular has come under fire.
For example, it has been suggested that many problems are due to selective reporting of only favourable results, linked to the use of the statistical methodology of hypothesis testing that has been in use for many decades. Thus it has been suggested that frequentist statistical testing should be abandoned immediately, but that is quite unlikely to happen. A further concern is that the common statistical (mal-)practice of not clearly distinguishing between pre-specified and (post-hoc) exploratory analyses may be another reason for non-repeatable conclusions. What can be done to improve the application of statistics in research? Can initiatives to regulate reporting, pre-registration and the sharing of data restore trust in research?

TCS054 Decision Making in Medical Research
Tue :30-3:00, Lecture Hall HS 3
Organizer(s): Gerhard Nehmiz
Chair(s): Reinhard Vonthein

The session addresses decision making under uncertainty in medical contexts: to detect efficacy signals, to weed out inferior subgroups, to select doses for further investigation, and to optimize the size of Phase II. Most presentations apply the Bayesian approach, but a multiple-testing approach is also presented.

TCS054.1 Statistical framework for clinical utility indices using Bayesian statistics - a case study
Klein S.
Bayer AG, Berlin, Germany

The clinical utility index (CUI) is defined by CUI(X) = Σ_i w_i U_i(X_i), where the X_i are random variables representing clinical outcomes, the U_i are utility functions on the clinical outcomes, and the w_i are predefined weights given to each utility component U_i of the CUI. The CUI has been introduced as a tool to aid in measuring efficacy-safety trade-offs. Among other purposes, CUI usage has been proposed, e.g., for determining the likelihood of success at the proof-of-concept stage or for determining optimal doses within a Bayesian framework. A major drawback of the CUI is the uncertainty introduced by the (more or less) subjective process of choosing suitable CUI weights w_i. A framework for quantifying this uncertainty using appropriate prior distributions for the CUI weights and the distribution parameters of the clinical outcomes will be presented. Using simulations, it will be explored how such an approach could be used during drug development, updating the CUI with each study. Literature: Ouellet D. Benefit-risk assessment: the use of clinical utility index. Expert Opinion on Drug Safety, 9 (2010). Poland B, Hodge FL, Khan A, et al. The clinical utility index as a practical multiattribute approach to drug development decisions. Clin Pharmacol Ther, 86 (2009). Graham G, Gupta S, Aarons L. Determination of an optimal dosage regimen using a Bayesian decision analysis of efficacy and adverse effect data. J Pharmacokinet Pharmacodyn, 29 (2002).
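As a concrete illustration of the CUI formula above, the following R sketch propagates uncertainty in the weights through simulation. The Dirichlet prior on the weights and all numerical values are assumptions chosen for illustration; they are not taken from the abstract.

# R sketch of CUI(X) = sum_i w_i * U_i(X_i) with uncertain weights (hypothetical values).
rdirichlet <- function(n, alpha) {                 # simple Dirichlet sampler via Gamma draws
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}
set.seed(1)
n_sim <- 10000
U <- cbind(efficacy = rnorm(n_sim, 0.7, 0.10),     # simulated utility of the efficacy outcome
           safety   = rnorm(n_sim, 0.5, 0.15))     # simulated utility of the safety outcome
W <- rdirichlet(n_sim, alpha = c(6, 4))            # uncertain weights with mean (0.6, 0.4)
cui <- rowSums(W * U)                              # draws of the clinical utility index
quantile(cui, c(0.05, 0.5, 0.95))                  # summary of the induced CUI uncertainty

The width of the resulting interval reflects both the outcome uncertainty and the subjective uncertainty about the weights, which is the quantity the proposed framework aims to make explicit.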

TCS054.2 Assessing early signs of efficacy while dealing with unknown dose-response relationship using a Bayesian approach with informative priors
Kaiser A., Vonk R.
Bayer AG, Research & Clinical Science Statistics, Berlin, Germany

To support decision making about larger investments in drug development, e.g. for indication expansions, small clinical trials are often conducted to generate early signs of efficacy. If the shape of the underlying dose-response relationship is unknown, efficacy evaluation becomes more difficult due to the small number of patients included in such trials. To tackle this challenge, a four-step approach is proposed, allowing a rough assessment of the underlying dose-response shape and the quantification of efficacy signals. In the first step, the potential underlying treatment responses at each dose are analyzed using a Bayesian approach allowing the incorporation of prior information. Next, potential underlying dose-response relationships are compared with pre-specified models. From this, probabilities for a model being the best approximation of the underlying dose-response are derived. In the third step, posterior probabilities for the maximum treatment effect are calculated for each model. Finally, these posterior probabilities are averaged using the model-selection probabilities as weights, resulting in a probabilistic quantification of early signs of efficacy. Results from simulation studies on the operating characteristics of this approach will be presented.

TCS054.3 Selecting inferior treatments in network meta-analysis
König J.
University Medical Center Johannes Gutenberg University Mainz, Institute of Medical Biostatistics, Epidemiology and Informatics, Mainz, Germany

Whenever more than two treatments can be given in one indication, network meta-analysis is becoming state of the art for comprehensive review and interpretation of the available evidence. Several methods have been proposed to rank treatments with respect to some outcome, incorporating point estimates and uncertainty: SUCRA and probabilistic ranking. Little attention has been given to more rigorous procedures developed for the case of a single multi-armed study, such as multiple testing procedures and statistical selection procedures. We demonstrate how some of these multiple decision procedures can be applied to network meta-analysis results. We propose a new method tailored to sorting out treatments that are inferior to the best. We claim that Hsu's method, which provides a contrast estimate comparing a given treatment to the best among all others, can be modified to meet more effectively the unbalanced situation often encountered in network meta-analysis. Literature: Jason C. Hsu (1985). A method of unconstrained multiple comparisons with the best. Communications in Statistics - Theory and Methods, 14:9.

TCS054.4 Balancing the size of Phase II against the risk of wrong decision for Phase III: an example
Nehmiz G.
Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany

We investigate the assurance for Phase III in an example with a binary endpoint. We aim to select for Phase III the dose with the highest response percentage observed in Phase II. Now, the probability is >0 that in Phase II an actually sub-optimal dose shows, by random error, the best result and is selected for Phase III, increasing the risk of a non-significant result (failure of the whole project).
The loss resulting from project failure is, therefore, to be weighted with the probability of false selection and with the reduction of assurance. The probability of wrong selection is a decreasing, convex function of the Phase II sample size (NII). On the other hand, the Phase II costs occur with certainty and increase linearly with NII. In order to find the optimal NII, the costs of Phase II and of the failed project are brought onto one common scale and added, giving again a convex function of NII, which has a minimum. As the common unit, a suitably defined "effective patient" appears useful and intuitive. The behaviour of the optimal NII under various scenarios of prior information is investigated. We show that over a wide range the dependence of the optimum on the prior distribution is weak; however, the optimum remains strongly dependent on the assumed costs of the failed project. (An illustrative numerical sketch of this cost trade-off, with hypothetical values, is given after the TCS055 session description below.)

TCS054.5 Evaluation of quantitative go/no-go criteria after phase II using R shiny
Sailer M.O.1, Hoang K.2, Stucke-Straub K.1, Fleischer F.1
1 Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany; 2 TU Dortmund, Dortmund, Germany

In the pharmaceutical industry, quantitative go/no-go criteria are increasingly used to assist decision making after phase Ib/II. The goal is to manage the risk of developing too many marginally effective drugs or too few promising substances. We compare criteria after phase Ib/II based on Lalonde et al. (2007, DOI: 10.1038/sj.clpt), Frewer et al. (2016, DOI: 10.1002/pst.1746) and Fisch et al. (2015, DOI: 10.1177/) with a criterion used internally at Boehringer Ingelheim. In order to allow for the incorporation of prior information, we allow for conjugate mixture analysis priors for binary, normal and time-to-event data. We assess decision probabilities based on the resulting posterior distributions under various design priors to evaluate the designs. For use in the planning of phase II clinical trials we use an R shiny app that interactively depicts graphical and tabular displays of the decision region and probabilities for given parameter values.

TCS055 Decision Theoretic Approaches for Clinical Trial Design and Analysis
Wed :30-6:00, Lecture Hall HS
Organizer(s): Franz König, Martin Posch
Chair(s): Carl-Fredrik Burman

Formal planning of confirmatory clinical trials is typically based on frequentist operating characteristics such as power and type I error rates, which are chosen by convention. However, the overall utility of a clinical trial design depends on several further parameters, such as the trial costs and the size of the population to be treated. Decision theoretic approaches based on utility functions can model the impact of a range of factors and guide the planning and analysis of clinical trials. This can be especially useful in complex settings, for example the development of targeted therapies where differential treatment effects in multiple subgroups are assessed. Furthermore, decision theoretic approaches can be tailored to represent the perspectives of different stakeholders such as clinical trial sponsors, patients and society. In this session we present an overview of recent advances in decision theoretic methods for clinical trial design.
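The following R sketch illustrates the cost trade-off described in abstract TCS054.4 with purely hypothetical values, ignoring the assurance component for brevity: the probability of selecting the wrong dose decreases with the Phase II sample size, while Phase II costs grow linearly, so the expected total cost has a minimum.

# R sketch (hypothetical costs and response rates) for the trade-off in TCS054.4:
# expected cost = cost of Phase II + P(wrong dose selected) * cost of failed project.
p_wrong_selection <- function(n, p_best = 0.45, p_other = 0.35, n_sim = 20000) {
  x_best  <- rbinom(n_sim, n, p_best)
  x_other <- rbinom(n_sim, n, p_other)
  mean(x_other > x_best | (x_other == x_best & runif(n_sim) < 0.5))  # ties broken at random
}
set.seed(2)
n_grid    <- seq(10, 150, by = 10)     # Phase II sample size per dose arm
c_per_pat <- 1                         # cost of one Phase II patient ("effective patient" scale)
c_failure <- 400                       # cost of a failed project on the same scale
exp_cost  <- sapply(n_grid, function(n) 2 * c_per_pat * n + p_wrong_selection(n) * c_failure)
cbind(n_per_arm = n_grid, expected_cost = round(exp_cost, 1))
n_grid[which.min(exp_cost)]            # approximately cost-optimal Phase II size per arm

As in the abstract, the location of the minimum is driven mainly by the assumed cost of the failed project relative to the per-patient cost.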

TCS055.1 A Bayesian decision theoretic model of sequential experimentation with delayed response
Chick S.1, Forster M.2, Pertile P.3
1 INSEAD, Technology and Operations Management, Fontainebleau, France; 2 University of York, Economics and Related Studies, York, United Kingdom; 3 University of Verona, Economics, Verona, Italy

We present a Bayesian decision-theoretic model of the optimal stopping of a two-armed sequential clinical trial in which the primary outcome arrives with delay. The model seeks the policy which maximises the expected benefits from the health technology adoption decision, accounting for the cost of carrying out the trial itself. Also included are the costs of switching health technologies and the benefits accruing to trial participants and the wider study population. The policy defines the optimal trial design from the choices 'do not run trial' / 'run fixed sample size trial' / 'run sequential trial'. Monte Carlo simulation shows that the optimal Bayes sequential policy outperforms alternative, non-sequential trial designs in terms of the expected benefits of health technology adoption, net of trial costs. But the expected sample size of the policy can be greater than, equal to, or less than that of comparator designs. Why? Because the policy achieves the sample size which appropriately balances the expected benefit to patients with the cost of learning during the trial. Our presentation will summarise the main features of the model and discuss the challenges of moving from theory to application. Further details may be found here:

TCS055.2 Decision-theoretic value of information approaches to trial design in small populations
Madan J.1, Hee S.W.1, Pearce M.2, Stallard N.1, InSPiRe
1 University of Warwick, Warwick Medical School, Coventry, United Kingdom; 2 University of Warwick, Warwick Business School, Coventry, United Kingdom

Standard approaches to the design of confirmatory clinical trials involve sample sizes based on achieving a specified power to detect a given effect size, assuming that the null hypothesis will be rejected at a given significance level. This involves choosing values for power and significance that are governed by convention (usually 80% or 90% for power, 5% for significance), but these values are arbitrary and do not take into account the relative desirability of avoiding type I vs type II errors. Decision-theoretic approaches, such as the value-of-information (VoI) methodology that has been developed in the health economics literature, provide a rationale for trial design related to the consequences of the decision which the trial is being designed to inform. They involve the use of a Bayesian framework in which the prospective study is assumed to update a prior distribution on the pay-off from each decision option from the point of view of a pre-specified decision-maker. We explore how VoI can be used to derive decision-theoretic pre-specified choices for sample size and significance thresholds, by considering the size of the population affected and the relative value of the factors that drive the pay-off from the trial (treatment effect, costs of treatment and research, and potential harms that may not be revealed by the trial). We show that, as the size of the population increases, the optimal sample size increases and the optimal significance level falls.
We further show how the method can be adapted for rare conditions where the prevalence is known imperfectly. (An illustrative numerical sketch of this kind of trade-off, with hypothetical values, is given after abstract TCS055.4 below.)

TCS055.3 Improving adaptive enrichment decisions using longitudinal observations
Burnett T., Jennison C.
University of Bath, Mathematical Sciences, Bath, United Kingdom

Suppose that, before conducting a confirmatory clinical trial, we have identified a patient sub-population that is expected to receive a greater benefit from the new treatment. An adaptive enrichment trial begins by recruiting all available patients; then, at an interim analysis, it is decided whether to recruit the remainder of the sample from the full patient population or only from the sub-population. We are able to construct hypothesis tests for adaptive enrichment trials such that strong control of the familywise error rate is assured for all possible decision rules at the interim analysis. In deciding whether or not to adapt the recruitment pattern at the interim analysis, we may only use the observations available at this time, and so the primary endpoint will not be available for the most recently recruited patients. Suppose we collect a short-term observation for each patient as well as the longer-term primary endpoint. Assuming a joint model for the short-term and long-term observations, we may reduce the variance of the interim estimates of treatment effects in the full population and sub-population, and so improve our decision making at the interim analysis. We use a Bayesian decision framework at the interim analysis to construct Bayes optimal adaptive enrichment designs. Using this framework, we also assess the overall performance of different trial designs and demonstrate the overall benefit of enhancing the interim decision using short-term observations.

TCS055.4 Phase II adaptive enrichment design to select the participant population for future trials, in terms of baseline disease severity score
Du Y., Rosenblum M.
Johns Hopkins Bloomberg School of Public Health, Biostatistics, Baltimore, United States

We propose and evaluate a two-stage, phase 2, adaptive clinical trial design. Its goal is to determine whether future phase 3 (confirmatory) trials should be conducted, and if so, which participant population should be enrolled. The population selected for phase 3 enrollment is defined in terms of an ordinal disease severity score measured at baseline. We optimize the trial design and analysis in a decision theoretic framework, using a utility function that penalizes for including participants who don't benefit and for excluding those who do benefit; the utility function also incorporates the cost of future trials and healthcare system costs. Given such a utility function and a discrete prior distribution on the conditional treatment effect, we compute the Bayes optimal design. The resulting design is compared to simpler designs in simulation studies. We also apply the designs to resampled data from a completed phase 2 trial evaluating a new surgical intervention for stroke.
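The following R sketch illustrates, under simple and entirely hypothetical assumptions, the qualitative point made in abstract TCS055.2: in an expected-net-benefit (value-of-information-style) calculation, the optimal sample size and significance level depend on the size of the affected population. It is a generic illustration, not the InSPiRe methodology.

# R sketch (hypothetical values): grid search over per-arm sample size n and one-sided
# significance level alpha, maximizing expected net benefit for a given population size.
expected_net_benefit <- function(n, alpha, N_pop, sigma = 1,
                                 mu0 = 0.2, sd0 = 0.2,      # prior on the true effect delta
                                 value = 1, cost = 0.1) {   # benefit per unit effect per patient
                                                            # treated; cost per trial participant
  delta   <- rnorm(20000, mu0, sd0)                # prior draws of the true effect
  se      <- sigma * sqrt(2 / n)                   # standard error of the estimated effect
  p_adopt <- pnorm(delta / se - qnorm(1 - alpha))  # probability of a "positive" trial given delta
  mean(N_pop * value * delta * p_adopt) - cost * 2 * n
}
set.seed(3)
grid <- expand.grid(n = seq(50, 1000, by = 50), alpha = c(0.2, 0.1, 0.05, 0.025, 0.01))
for (N_pop in c(1e3, 1e5)) {
  enb <- mapply(expected_net_benefit, grid$n, grid$alpha, MoreArgs = list(N_pop = N_pop))
  cat("affected population of size", N_pop, "- design maximizing expected net benefit:\n")
  print(grid[which.max(enb), ])
}

Because the pay-off of a correct adoption decision, and the harm of an incorrect one, scale with the population size while the trial cost does not, larger populations tend to favour larger trials and more stringent significance levels, which is the pattern described in the abstract.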

TCS055.5 Optimized adaptive enrichment designs for clinical trials with a sensitive subpopulation
Ondra T.1, Jobjörnsson S.2, Burman C.-F.3, König F.1, Stallard N.4, Posch M.1
1 Medical University of Vienna, Vienna, Austria; 2 Chalmers University, Gothenburg, Sweden; 3 AstraZeneca R&D, Mölndal, Sweden; 4 The University of Warwick, Coventry, United Kingdom

An important objective in the development of targeted therapies is to identify subgroups of patients in which the treatment under investigation has a positive benefit-risk balance. We consider clinical trials investigating a treatment in a subpopulation (S), defined by a binary biomarker, as well as in the full population (F), consisting of biomarker-positive patients (S) and biomarker-negative patients (S'). Based on a decision theoretic approach, assigning gains and losses to a particular trial design, we compare optimized single-stage and adaptive two-stage designs. We model the gains of a particular trial design from both the sponsor's view (reflecting commercial interests) and a societal view (reflecting public health interests). For single-stage designs we optimize the number of patients from S and S' to be included in the trial. The optimization of adaptive two-stage designs relies on a dynamic programming approach as well as extensive numerical calculations. In particular, we optimize the number of patients to be included from S and S' in the first stage and present optimized decision rules, assigning an optimized second-stage trial design to a given interim observation. The optimizations are performed for both the sponsor's and the public health utility. This project has received funding from the European Union's 7th Framework Programme for research, technological development and demonstration under the IDEAL Grant Agreement no , and the InSPiRe Grant Agreement no .

TCS058 Statistical Methodology for the Design and Analysis of Trials in Small Populations
Fri :30-0:00, Lecture Hall KR 7
Organizer(s): Martin Posch
Chair(s): Geert Molenberghs
Discussant(s): Robert Beckman

Research in small population groups poses specific challenges for the design and analysis of clinical trials. There are several initiatives to develop more efficient and effective research designs to study new drugs and treatments for rare diseases and targeted therapies. Approaches include innovative trial designs, evidence synthesis methods across multiple endpoints in single trials and across multiple trials, and the development of standards for the regulatory assessment of treatments for small populations. In this session we give an overview of recent methodological advances achieved in the focused research initiatives and discuss directions for further research.

TCS058.1 IDeAl - new developments in the design and analysis of small population group trials
Hilgers R.-D., IDeAl Project
RWTH Aachen University, Medical Statistics, Aachen, Germany

Background: In 2012 the EC announced the call for new methodologies for clinical trials for small population groups within the 7th Framework Programme Health Innovation, aiming to develop new or improved statistical design methodologies for clinical trials for the efficient assessment of the safety and/or efficacy of a treatment for small population groups, in particular for rare diseases or personalised (stratified or individualised) medicine.
Methods: IDeAl has run over the last three years and integrates design and analysis methods for small population group trials to utilize and connect all possible sources of information in order to optimize the complete process of a clinical trial. Results and Conclusions: In the talk I will summarize the major findings of the integrated research, including the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models, as well as pharmacokinetic and individualized designs, simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, and the evaluation of biomarkers. Furthermore, the results will be related to the upcoming EMA meeting (29/30 March), the 'FP7 small-population research methods projects and regulatory application workshop', showing regulators' main areas of interest.

TCS058.2 Clinical trials in rare diseases: there is no magic potion. Results and way forward from the Asterix project
Roes K.1,2, Asterix Project
1 UMC Utrecht, Julius Center, Utrecht, Netherlands; 2 Medicines Evaluation Board, Methodology, Utrecht, Netherlands

In clinical trials that aim to provide confirmatory evidence for new treatments for rare diseases, the available sample size is often a crucial limitation. In a regulatory (drug approval) setting, decision making based on evidence from a limited number of small trials is challenging. The totality of evidence is commonly taken into account, although in an informal fashion. To improve the underlying methods as well as decision making, Asterix research followed essentially three pathways: (1) optimize methods for (flexible) designs in the case of finite and small sample sizes, (2) develop appropriate methods for meta-analysis of a small number of small trials to support decision making, and (3) investigate new patient-centered outcomes that capture the often heterogeneous disease course efficiently. Furthermore, as a unique feature, patient representatives were involved throughout the project. In this presentation, highlights of progress will be addressed, going beyond the methodology. As part of the project, European Public Assessment Reports were used as a basis to assess the applicability and added value of new methods. Against a well-designed framework of medical conditions, this provides a thorough basis for future guidance. It will be clear that, rather than a magic potion, careful consideration of several designated methods needs to be tailored to the medical condition and treatment(s).

TCS058.3 Recent advances in methodology for clinical trials in small populations - the InSPiRe project
Stallard N.
University of Warwick, Warwick Medical School, Coventry, United Kingdom

The Innovative Methodology for Small Populations Research (InSPiRe) project was one of three projects funded under the EU Framework Programme 7 call for new methodologies for clinical trials in small population groups. The project, which ran from June 2014 to May 2017, has brought together experts in innovative clinical trial methods from eight institutions to develop new approaches for the design, analysis and interpretation of trials in rare diseases or small populations. In such settings the large clinical trials generally used to evaluate new drugs and other healthcare interventions are often infeasible. New approaches to the design of such studies, or improved methods of data analysis and decision-making, are therefore needed. With the aim of enabling more rapid evaluation of treatments whilst maintaining scientific and statistical rigour, we have developed new methods that include the combination of primary trial data with other information from the same or different studies, adaptive trial designs that allow the most efficient use of data, and optimal decision-making processes to reach conclusions as quickly as possible. The work has been arranged in four broad areas: use of PK/PD data in early-phase dose-finding trials, particularly in paediatrics; decision-theoretic approaches to sample size determination for clinical trials; efficient designs for confirmatory trials in stratified medicine; and evidence synthesis methods to enable use of external data in small clinical trials. This talk will outline the work of the InSPiRe project and describe the main results obtained.

TCS058.4 Increasing evidence designs in small population clinical trials
Faldum A., Schmidt R., Kwiecien R.
IBKF at WWU, Münster, Germany

With conventional study designs in small population clinical trials, establishing evidence for or against a new approach takes a long time. An inflation of the significance level to reduce sample sizes is quite often accepted at the expense of evidence. In this talk an increasing evidence design will be presented that increases the evidence stepwise up to an uninflated α level. Increasing evidence is realized by subsequent stages with sharpening futility bounds in an adaptive-sequential design. At the first stage the procedure starts with a liberal futility bound which allows dropping the treatment approach if very low performance is observed. Based on this liberal futility bound, the trial stage is powered to prevent an erroneous futility stop. This implies a comparatively small sample size for this first stage as compared with traditional statistical designs. Each subsequent stage iteratively raises the hurdle for the new treatment by decreasing its respective futility bound. Firstly, the approach enables control of the rejection-for-futility rate. This ensures, stage by stage, a favourable course of the clinical trial since, by construction, the trial will be stopped in case of a poorly performing new therapy. Sample sizes can be adapted after each interim analysis to take internal or external information into account. Additionally, at each interim analysis the null hypothesis can be rejected early at level α. If the trial has to be stopped, e.g.
for financial reasons or because a new promising therapy approach is forthcoming, the smallest futility bound that has already been passed successfully can be interpreted as the level of evidence attained in the trial. This interpretation broadens the concept of a level-α test. The work is part of the project "Adaptive Designs in Individualized Therapy (ADIT)" and is supported by the Federal Ministry of Education and Research (FKZ 01EK1503A).

TCS059 Adaptive Design and Multiple Testing Procedures for Small Populations
Tue :30-3:00, Lecture Hall KR 8
Organizer(s): Rene Schmidt
Chair(s): Andreas Faldum

Conventional clinical trial designs typically require large sample sizes and provide little flexibility for mid-trial design modifications without compromising the validity of the statistical inference. In the development of treatments for rare diseases, for example in pediatric oncology, however, a reduction of the necessary sample sizes and trial times, as well as increased flexibility, for example to change treatment strategies based on preliminary outcomes, would be highly desirable. Similarly, recent advances in biomedical analytics provide more and more detailed characterization and classification of medical conditions. While this holds the promise of improved therapies through more individualized treatments, it raises a number of statistical challenges. The desire to make separate inferences for each subgroup of the population leads to a considerable multiplicity problem with often only few patients in each group. Consequently, efficient designs that provide flexibility with regard to the choice of population subgroups, together with powerful multiple testing procedures, are called for. In this session we will feature a selection of recent advances in adaptive trial design and multiple testing methodology for trials with small sample sizes and/or many subgroups of interest. The covered topics include novel methods for adaptive single-arm trials, the use of historical data to overcome sample size limitations, alternative type I error rates for subgroup analyses, as well as methods for evidence synthesis in adaptive enrichment designs.

TCS059.1 Confirmatory adaptive designs for single-arm survival trials
Schmidt R., Faldum A., Kwiecien R.
University of Münster, Münster, Germany

Traditional designs in phase IIa cancer trials are single-arm designs with a binary outcome, for example tumor response. In some settings, however, a time-to-event endpoint might be preferred. Then the one-sample log-rank test might be the method of choice. With the one-sample log-rank test, the survival curve of the patients is compared to a fixed reference survival curve. In this work, a confirmatory adaptive one-sample log-rank test is proposed in which provision is made for data-dependent sample size reassessment. The focus is on applying the inverse normal method. Operating characteristics and the average sample number (ASN) of the adaptive test are investigated by simulation. It is shown that the methodology helps to rescue an underpowered trial and reduces the ASN under the null hypothesis as compared with a single-step fixed sample design. The motivating clinical example is a phase IIa cancer trial for the treatment of children with recurrent or progressive neuroblastoma.
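A minimal R sketch of the inverse normal combination rule named in TCS059.1 is given below; it shows only the generic two-stage combination of one-sided p-values with prespecified weights, not the specific one-sample log-rank construction or the sample size reassessment rule of the abstract.

# Generic inverse normal combination of two stage-wise one-sided p-values.
inverse_normal <- function(p1, p2, w1 = sqrt(0.5), w2 = sqrt(0.5), alpha = 0.025) {
  stopifnot(abs(w1^2 + w2^2 - 1) < 1e-8)          # weights must be fixed before the trial
  z <- w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2)    # combined test statistic
  c(z = z, reject = as.numeric(z > qnorm(1 - alpha)))
}
inverse_normal(p1 = 0.10, p2 = 0.02)              # example stage-wise p-values (hypothetical)

Because the weights are fixed in advance, the second-stage sample size may be modified in a data-dependent way without inflating the type I error.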

TCS059.2 Adaptive designs in rare disease clinical trials incorporating historical data
Hees K., Kieser M.
Institut für Medizinische Biometrie und Informatik, Uniklinikum Heidelberg, Heidelberg, Germany

Recruiting sufficient patients within an acceptable time horizon is an issue for most clinical trials and is especially challenging in the field of rare diseases. It is therefore an attractive option to include historical data from previous (pilot) trials in the analysis of the current study, thus reducing the recruitment burden. Various Bayesian methods for the incorporation of historical information into present trials have been proposed in the literature. If the current data match the historical data sufficiently well, these approaches lead to increased power. However, if this assumption is not met, the gain in power may be much smaller than expected, while at the same time a type I error inflation occurs. Therefore, so-called robust prior distributions are well suited, since in case of a prior-data conflict they down-weight the extent to which the historical data are incorporated. When planning the sample size for trials incorporating historical data, not only the type I error rate, the power and the treatment group difference, but additionally the variance and the weight of the historical data have to be specified. However, there is usually some uncertainty in the planning phase about the value of these nuisance parameters. We present methods for blinded and unblinded sample size recalculation in the setting of two-arm superiority trials with historical control data, where the variance - and in the unblinded setting additionally the extent to which the historical information is incorporated - is estimated mid-course and the sample size is recalculated accordingly. The operating characteristics of these methods are investigated in terms of actual type I error rate, power, and expected sample size. Application is illustrated with a clinical trial example in patients with systemic sclerosis, a rare connective tissue disorder.

TCS059.3 The populationwise error rate - a more liberal error rate for multiplicity adjustment in enrichment designs
Rohmeyer K., Brannath W.
University of Bremen, KKSB, Bremen, Germany

In clinical studies, control of the familywise error rate is appropriate when several hypotheses are investigated in the same population. When the population, however, splits into disjoint subpopulations and each hypothesis concerns only one of these, without a claim beyond that subpopulation, the overall study essentially consists of separate trials which share only the same infrastructure. In this case the familywise error rate is unreasonably conservative. In some cases the subpopulations are disjoint by definition (like the two groups 'biomarker positive' and 'negative/unknown'), but in many other cases the subpopulations can overlap. For this setting we propose a generalized error rate that takes into account the probability of belonging to a certain subpopulation or intersection of subpopulations. This error rate - which we call the populationwise error rate - extends the spectrum continuously from the FWER in the first setting to the unadjusted case for disjoint populations. We start by defining simultaneous test procedures with control of the populationwise error rate. We then generalize the closed testing principle and show how to construct step-down tests.
The gain in power and sample size from using the populationwise error rate instead of the familywise error rate is illustrated by first examples. (An illustrative Monte Carlo sketch of one reading of this error rate is given after abstract TCS059.5 below.)

TCS059.4 Adaptive designs for trials with multiple treatments and biomarkers
Wason J.
University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom

Response to treatments is often highly heterogeneous. The increasing availability of biomarkers and targeted treatments has led to the need for trial designs that efficiently test new treatments in biomarker-stratified patient subgroups. Often new treatments are targeted at a specific biomarker subgroup, but may in fact work in a narrower or broader set of patients. I will discuss Bayesian adaptive methodology for trials that have multiple treatments and biomarkers. The proposed design incorporates biological hypotheses about the links between treatments and biomarker subgroups, but allows alternative links to be formed during the trial. The statistical properties of the method compare well with alternative available designs. This design has been developed for trials in ovarian cancer and breast cancer, and some methodological issues specific to each application will be discussed. These include the use of continuous biomarker information to allocate patients and the addition of new treatments and biomarkers during the trial.

TCS059.5 Confirmatory adaptive group sequential designs for time-to-event endpoints using a short-term endpoint for design changes in multi-armed or population enrichment studies
Jörgens S.1, Wassmer G.2, König F.2, Posch M.2
1 ICON Clinical Research, Innovation Center, Cologne, Germany; 2 Medical University of Vienna, Center for Medical Statistics, Informatics, and Intelligent Systems, Vienna, Austria

Adaptive group sequential designs involving multiple test treatment arms or subpopulations of interest often contain a selection interim analysis. Controlling the familywise type I error rate in the strong sense is achieved by applying the closed testing principle to adaptive combination tests or conditional error rate based tests (e.g., Wassmer & Brannath, Springer 2016). Patients included in the interim analyses can contribute both to the selection decision and to the final confirmatory analysis. Often, this allows a decrease in sample size and study duration. The applicability of such a design depends on suitable operational characteristics: if the majority of the patients have already been enrolled at the selection interim analysis, the practical benefit may become limited. This is especially true for time-to-event data, where the follow-up period may notably exceed the accrual period. In such cases, the use of secondary endpoints for selection can enhance the operational characteristics of the trial. This case is not fully covered by standard theory. Tailored approaches to guarantee type I error control for time-to-event endpoints have been proposed (Jenkins et al., Pharmaceutical Statistics 2011, 10; Mehta et al., Statistics in Medicine 2014, 33). A limitation of these is that they do not allow for early rejection of the null hypothesis. We derive an extension of the test proposed by Jenkins et al. that allows for interim testing of the null hypotheses while controlling the familywise error rate. The operating characteristics of the procedure are investigated in a simulation study.
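The following Monte Carlo sketch illustrates one possible reading of the populationwise error rate proposed in TCS059.3, for a hypothetical setting with two overlapping subpopulations and one (true null) hypothesis per subpopulation: the probability of at least one false rejection relevant to a patient is averaged over the strata defined by subpopulation membership. The prevalences, the correlation of the test statistics and the per-test level are assumptions for illustration only, not values from the abstract.

# R sketch: familywise versus a populationwise-type error rate with two overlapping
# subpopulations, both null hypotheses true, unadjusted two-sided tests at the 5% level.
set.seed(4)
n_sim  <- 1e5
rho    <- 0.4                                     # correlation induced by shared patients (assumed)
z      <- matrix(rnorm(2 * n_sim), ncol = 2)
z[, 2] <- rho * z[, 1] + sqrt(1 - rho^2) * z[, 2] # correlated test statistics under the null
rej    <- abs(z) > qnorm(0.975)                   # unadjusted rejections
pi1 <- 0.3; pi2 <- 0.3; pi12 <- 0.4               # prevalences: only P1, only P2, both
fwer <- mean(rej[, 1] | rej[, 2])
pwer <- pi1 * mean(rej[, 1]) + pi2 * mean(rej[, 2]) + pi12 * mean(rej[, 1] | rej[, 2])
c(FWER = fwer, populationwise = pwer)

In this reading, the populationwise quantity lies between the unadjusted per-test level (reached when the subpopulations are disjoint) and the familywise error rate (reached when they coincide), matching the spectrum described in the abstract.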

TCS060 Design of Experiments in Mixed Models
Tue :30-3:00, Lecture Hall KR 9
Organizer(s): Maryna Prus
Chair(s): Rainer Schwabe

Mixed effects models are of growing importance for the analysis of biometrical and biopharmaceutical data. In all situations where more than one observation is reported per statistical entity in a study, the intra-individual or intra-class correlation has to be taken into account. These entities may be, for example, random blocks in agricultural experiments, human individuals in medical or psychological studies, animals in behavioural experiments in biology, centers in multi-center clinical trials or laboratories in inter-laboratory tests. By means of a hierarchical or longitudinal approach, parsimonious models can be obtained which are suitable for describing the intra-individual dependence. While their statistical analysis is quite well developed, less attention has been paid to the design of experiments in these models so far. The purpose of this session is to propagate the use of, and to shed light on various aspects of, designs of experiments which account for the specifics of mixed effects models: Florence Loingeville et al. use a Hamiltonian Monte-Carlo method to design longitudinal count studies in the context of nonlinear mixed effects models which additionally accounts for parameter and model uncertainties. Their approach is implemented in R using a package that efficiently draws Hamiltonian Monte-Carlo samples and calculates partial derivatives of the log-likelihood. Joachim Kunert and Christoph Neumann consider a model for crossover designs with carryover effects and a random interaction between treatments and subjects. In particular, they derive optimal designs for the case when there are more periods than treatments in the experiment. Augustyn Markiewicz and Katarzyna Filipiak establish the optimality of certain circular block designs under a mixed interference model with random neighbor effects. Fritjof Freise et al. develop an optimal test design in an item response setup for a longitudinal study in intelligence testing. Here, individual response curves are to be fitted for which the individual abilities change with time due to learning. While the first four contributions are devoted to the statistical analysis of population parameters or mean response curves, Maryna Prus propagates optimal designs for prediction, i.e. for the statistical analysis of the random individual parameters or responses. Both longitudinal designs within the individuals and cross-sectional designs, where treatments cannot be changed within individuals, are considered. New design criteria have to be developed for these situations. The resulting optimal designs turn out to differ substantially from those for the analysis of population parameters.

TCS060.1 Using Hamiltonian Monte-Carlo to design longitudinal count studies accounting for parameter and model uncertainties
Loingeville F.1, Nguyen T.T.1, Riviere M.-K.1,2, Mentré F.1
1 INSERM, IAME, UMR 1137, Paris, France; 2 Sanofi-Aventis R&D, Statistical Methodology Group, Biostatistics & Programming Department, Chilly-Mazarin, France

To design longitudinal studies with nonlinear mixed effects models (NLMEM), optimal design based on the expected Fisher information matrix (FIM) can be used. A method evaluating the FIM based on Monte-Carlo Hamiltonian Monte-Carlo (MC/HMC) was implemented in the R package MIXFIM using Stan for HMC sampling.
This approach requires a priori knowledge of models and parameters, leading to locally optimal designs. The objective of this work was to extend this MC/HMC-based method to evaluate the FIM in NLMEM accounting for uncertainty in parameters and in models. We show an illustration of this approach to optimize robust designs for repeated count data. When introducing uncertainty on the population parameters, we evaluated the robust FIM as the expectation of the FIM computed by MC/HMC over these parameters. Then, the compound D-optimality criterion was used to find a common CD-optimal design for several candidate models. The compound DE-optimality criterion was also calculated to find the CDE-optimal design, which is robust with respect to both model and parameters. These methods were applied to a longitudinal Poisson count model whose event rate parameter λ is a function of the dose level. We assumed a log-normal a priori distribution characterizing the uncertainty on the population parameter values, as well as several candidate models describing the relationship between log λ and the dose level. We performed combinatorial optimization of 2 among 10 doses. Finally, we noticed that misspecification of the model could lead to low D-efficiencies. The CD- or CDE-optimal designs then provided a good compromise across the different candidate models. (A simplified fixed-effects sketch of this kind of dose-selection problem is given after abstract TCS060.3 below.)

TCS060.2 Crossover designs with a random interaction between subjects and treatments
Kunert J.
TU Dortmund, Statistics, Dortmund, Germany

The talk considers a model for crossover designs with carryover effects and a random interaction between treatments and subjects. Under this model, two observations of the same treatment on the same subject are positively correlated and therefore provide less information than two observations of the same treatment on different subjects. Bludowsky, Kunert and Stufken (Australian and New Zealand Journal of Statistics, 2015) derived optimal designs for this model when the number of periods is less than the number of treatments. In this talk, we consider the case when there are more periods than treatments.

TCS060.3 Optimal design for a longitudinal study in item response theory
Holling H.1, Freise F.2, Schwabe R.3
1 University of Münster, Münster, Germany; 2 TU Dortmund University, Dortmund, Germany; 3 University of Magdeburg, Magdeburg, Germany

Repeated testing of intelligence may lead to considerable gains in ability scores. To efficiently analyze these gains, we develop an optimal test design for such studies in which measurement of intelligence is based on a multidimensional logistic Rasch model. Item difficulties are assumed to be known in order to efficiently estimate individual response curves for the abilities due to retesting or learning.
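As a simplified illustration of the dose-selection problem in TCS060.1, the following R sketch chooses 2 of 10 candidate doses by maximizing the determinant of the Fisher information matrix for a fixed-effects Poisson model with a log-linear dose effect. This deliberately ignores the random effects, the parameter and model uncertainty, and the MC/HMC machinery of the abstract (and does not use the MIXFIM package); the parameter values are hypothetical.

# Fixed-effects simplification: D-optimal choice of 2 out of 10 doses for a Poisson model
# with log(lambda) = b0 + b1 * dose (hypothetical parameter values, i.e. local optimality).
b0 <- 1; b1 <- -0.1
doses <- 0:9                                   # 10 candidate dose levels
fim_one <- function(d) {                       # information contribution of one count at dose d
  lambda <- exp(b0 + b1 * d)
  f <- c(1, d)
  lambda * tcrossprod(f)                       # lambda * f %*% t(f)
}
dose_pairs <- combn(doses, 2)                  # all 45 pairs of doses
dets <- apply(dose_pairs, 2, function(dd) det(fim_one(dd[1]) + fim_one(dd[2])))
dose_pairs[, which.max(dets)]                  # D-optimal dose pair under this simplification

In the abstract's setting the same combinatorial search is carried out, but with the FIM of the nonlinear mixed effects model evaluated by MC/HMC and averaged over the parameter and model uncertainty.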

TCS060.4 D- and E-optimal designs in multiple group random coefficient regression models
Prus M.
Otto-von-Guericke-Universität, Mathematics, Magdeburg, Germany

Random coefficient regression (RCR) models are very popular in statistical applications, especially in the biosciences and medical research. In these models, observational units (individuals) are assumed to come from the same population with an unknown population mean and to differ from each other by individual random parameters. In the special case of multiple group RCR models, individuals in different groups receive different kinds of treatment. If group sizes are fixed and the unknown mean parameters may differ from group to group, the statistical analysis can be performed in each group separately (see Prus (2015)). Here, analytical results for optimal group sizes in multiple group models with a common population mean for all individuals across all groups will be presented for the D- and E-criteria.

TCS061a Methodological Challenges in Observational Studies: Promoting Best Practice and Relevant Research, Part 1
Thu :30-3:00, Lecture Hall HS 5
Organizer(s): Willi Sauerbrei
Chair(s): Michal Abrahamowicz

Observational (non-experimental) studies simultaneously pose a number of analytical challenges that have stimulated intense research in many areas of modern biometrics and biostatistics. The methodological literature grows exponentially, yet most methodology is never put to widespread use in applications. Moreover, much methodology addresses single issues, when typically many issues arise together. In response, the international initiative STRATOS (STRengthening Analytical Thinking for Observational Studies) was formed in 2013 to stimulate a systematic comparison of the pros and cons of alternative methodologies, the identification of issues that remain unresolved and of promising approaches for tackling them, and, crucially, the communication of results to researchers spanning the spectrum from methodological development to application. The STRATOS initiative has grown rapidly to involve over 80 researchers from 16 countries on 4 continents (see the initiative's website). Currently there are 9 topic groups. The motivation, mission, structure and main aims of the initiative as well as first results from five topic groups are presented in two connected sessions. The first session is chaired by Michal Abrahamowicz. After a general introduction on the necessity of guidance documents about modelling issues in observational studies, representatives of five topic groups will give their perspectives on recent methodological progress and key outstanding problems, not least the need to focus on practically relevant methodological challenges and on the communication and implementation of new developments.

TCS061a.1 Motivation, mission, structure and main aims of the STRATOS initiative
Sauerbrei W.1, Perperoglou A.2, Schmid M.3, Collins G.4, Huebner M.5, Abrahamowicz M.6, on behalf of the STRATOS initiative
1 University Medical Center Freiburg, Institute of Medical Biometry and Informatics, Freiburg, Germany; 2 University of Essex, Essex, United Kingdom; 3 University of Bonn, Bonn, Germany; 4 University of Oxford, Oxford, United Kingdom; 5 Michigan State University, East Lansing, United States; 6 McGill University, Montreal, Canada

The validity and practical utility of observational medical research depend critically on good study design, excellent data quality, appropriate statistical methods and accurate interpretation of results. Statistical methodology has seen substantial development in recent times, but is unfortunately often ignored in practice. Part of the underlying problem may be that even experts (whoever they are) often do not agree on the potential advantages and disadvantages of competing approaches. Furthermore, many analyses are conducted by applied researchers with limited experience in statistical methodology and software. The lack of guidance on vital practical issues discourages them from using more appropriate methods. Consequently, reported analyses can be flawed, casting doubt on their results and conclusions. The main aim of the STRATOS initiative is to develop guidance documents for researchers with different levels of statistical knowledge. Topics are study design, initial data analysis, missing data, measurement error, variable selection, evaluating models, causal inference, survival analysis, and high-dimensional data. We discuss general issues and illustrate some work of the topic group 'Selection of variables and functional forms in multivariable analysis'. Which methods are used in practice, what are the strengths and weaknesses of the methods used for variable selection, and what about the procedures used to estimate the functional relationship of a continuous variable? Concerning the latter, traditional approaches (assuming linearity; categorization and step functions), fractional polynomials (FPs) and splines are most popular. Advantages of FPs over traditional approaches will be discussed. Details about issues in spline modelling will be presented in a related talk.

TCS061a.2 Spline regression modeling using R - methods and first results
Schmid M.1, Perperoglou A.2, Sauerbrei W.3, Abrahamowicz M.4, topic group 2 of the STRATOS initiative
1 University of Bonn, Bonn, Germany; 2 University of Essex, Essex, United Kingdom; 3 University of Freiburg, Freiburg, Germany; 4 McGill University, Montreal, Canada

The talk will provide an overview of the options that are available in R for building multivariable Gaussian, logistic and Cox regression models using splines. It will compare different methodological approaches and provide practical guidance on available software, in particular with regard to the selection of hyper-parameters. Furthermore, first results on the performance of the approaches with respect to the selection of relevant predictor variables and the detection of functional forms will be presented.
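As a small, self-contained illustration of the kind of spline modelling reviewed in TCS061a.2, the following base-R sketch compares a conventional linear fit with a natural cubic spline for one continuous predictor. The simulated data and the choice of 4 degrees of freedom are arbitrary and are not recommendations of the topic group.

# Base-R sketch: linearity assumption versus a natural cubic spline for a continuous predictor.
library(splines)
set.seed(6)
x <- runif(500, 0, 10)
y <- sin(x) + 0.1 * x + rnorm(500, sd = 0.5)   # simulated nonlinear relationship
fit_lin    <- lm(y ~ x)                        # conventional analysis assuming linearity
fit_spline <- lm(y ~ ns(x, df = 4))            # natural cubic spline with 4 df
anova(fit_lin, fit_spline)                     # does allowing nonlinearity improve the fit?
plot(x, y, col = "grey")
ord <- order(x)
lines(x[ord], fitted(fit_spline)[ord], lwd = 2)

In multivariable Gaussian, logistic or Cox models the same basis functions can be entered alongside other covariates; the choice of the number and placement of knots (or of a penalty) is exactly the kind of hyper-parameter question the talk addresses.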

TCS061a.3 Statistical methods to address measurement error in observational studies: current practice and opportunities for improvement Shaw P. 1, Deffner V. 2, Dodd K.W. 3, Freedman L.S. 4, Keogh R.H. 5, Kipnis V. 3, Kuechenhoff H. 2, Tooze J.A. 6, STRATOS TG4 1 University of Pennsylvania, Biostatistics and Epidemiology, Philadelphia, United States, 2 Ludwig-Maximilians-University of Munich, Department of Statistics, Munich, Germany, 3 National Cancer Institute, Division of Cancer Prevention, Bethesda, United States, 4 Gertner Institute for Epidemiology and Health Policy Research, Biostatistics Unit, Tel Hashomer, Israel, 5 London School of Hygiene and Tropical Medicine, London, United Kingdom, 6 Wake Forest School of Medicine, Department of Biostatistical Sciences, Winston-Salem, United States Errors commonly occur in the measurement and classification of many variables of interest in epidemiological observational studies. However, in many fields of epidemiological research the impact of such errors is either not well understood or is ignored. As part of the STRengthening Analytical Thinking for Observational Studies (STRATOS) initiative, a Task Group on Measurement Error and Misclassification (TG4) was formed. The goal of this working group was to assess current practice in the biomedical literature for analyses that involve error-prone data, as well as to develop a guidance document to improve current practice. As part of this effort, a literature review was conducted in four types of research studies which are subject to exposure measurement error: 1) nutritional cohort studies, 2) nutritional surveys, 3) physical activity cohort studies, and 4) pollution cohort studies. This survey revealed that while researchers in these areas were generally aware of the measurement error issues affecting these studies, very few researchers adjusted for the error in their analysis. Furthermore, most articles provided incomplete discussion of the potential effects of measurement error on their results. Using a data example from nutritional epidemiology, we illustrate how measurement error can bias an analysis and highlight a practical method (regression calibration) to address error in the analysis. Use of methods to correct for error depends on the availability of data to inform estimation of the magnitude and nature of the errors. Currently there is a great need to incorporate the collection of such data within study designs. TCS061b Methodological Challenges in Observational Studies: Promoting Best Practice and Relevant Research, Part 2 Thu :30-6:00 Lecture Hall HS 5 Chair(s): Willi Sauerbrei Observational (non-experimental) studies simultaneously pose a number of analytical challenges that have stimulated intense research in many areas of modern biometrics and biostatistics. The methodological literature grows exponentially, yet most methodology is never put to widespread use in applications. Moreover, much methodology addresses single issues, when typically many issues arise together.
In response, the international initiative STRATOS (STRengthening Analytical Thinking for Observational Studies) was formed in 2013 to stimulate a systematic comparison of the pros and cons of alternative methodologies, the identification of issues that remain unresolved and of promising approaches for tackling them, and, crucially, the communication of the results to researchers spanning the spectrum from methodological development to application. The STRATOS initiative has grown rapidly to involve over 80 researchers from 6 countries on 4 continents (see the STRATOS website). Currently there are 9 topic groups. The motivation, mission, structure and main aims of the initiative as well as first results from five topic groups are presented in two connected sessions. The second session is chaired by Willi Sauerbrei. After continuing the presentation of results from five topic groups, the session will close with a general discussion on future challenges and plans of the initiative and on possibilities to join or cooperate with STRATOS. TCS061b.1 Issues in popular designs for observational studies Altman D., on behalf of STRATOS TG5: Study Design Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom The aim of TG5 is to provide accessible and accurate guidance for the design of observational studies. There already exist several guidelines for reporting the findings of observational research, notably the STROBE Statement and various extensions, e.g. when using routinely collected data (RECORD), or for examining genetic associations (STREGA). We will discuss progress in developing guidance for planning observational studies. Our initial focus is on choosing an appropriate design for estimating the association of a specific exposure with a particular disease. When appropriate and possible, it can be helpful to conceptualise a hypothetical RCT which addresses the hypothesis under study. This concept provides the structure for thinking about the observational study. Consideration of key differences between the RCT and the observational study follows, including the sampling mechanism that results in a cohort, case-control or other design. The hypothetical RCT design will inform the design of the observational study, including details of the outcome and how it is measured. This process leads to thinking about information and selection bias, and steps that can be taken in the design to minimise bias. We will also discuss issues in some more recent designs. TCS061b.2 Prognostic studies and the need for guidance Sekula P., on behalf of TG5: Study Design Medical Center, University of Freiburg, Genetic Epidemiology, Institute for Medical Biometry and Statistics, Freiburg, Germany Although prognostic studies on single biomarkers have been conducted for many years, they have become increasingly common, partly because of the demand for medical decisions tailored to individual patients. The need for guidance has long been recognised. Despite their postulated clinical usefulness in patient care, few biomarkers have been successfully implemented in routine clinical practice, giving rise to discussions to understand the reasons. Methodological weaknesses are common regarding design, conduct, analysis and reporting of such studies, which are usually of an observational nature. Many of these issues could have been avoided if the study had been planned adequately.
In cancer research, where clinical data and biosamples from patients are often collected during routine care, studies on the prognostic potential of a specific biomarker can be conducted rather easily and quickly by making use of these archives. These circumstances may tempt researchers to proceed in a 'quick and dirty' fashion without thoroughly designing the project. I will present examples from the published literature to illustrate the need for guidance regarding different aspects of study design such as the research question, sample size calculation or selection bias. Although the focus is on prognostic studies, many of the discussed issues are also relevant to other types of studies, including diagnostic studies.

TCS061b.3 Analysis of high-dimensional data: guidance or (best) practice? Rahnenführer J. TU Dortmund University, Department of Statistics, Dortmund, Germany In many areas of biostatistics, the increasing use and availability of "big data" has created challenges both in data handling and in choosing appropriate statistical methods and algorithms. In molecular medicine, "omics" data are ubiquitous and have stimulated extensive collaborations between statisticians, bioinformaticians, biologists, and medical researchers. Electronic health records contain not only standard medical and clinical data on a patient's history, but also information from many providers involved in a patient's care, often combining a wealth of data sources. The availability of high-dimensional data with large numbers of observations and/or variables has triggered advances in statistical methodology and machine learning methods, for many tasks in data mining, inference, and prediction. In this talk, we will discuss how the topic group 'high-dimensional data' (TG9) of the STRATOS initiative intends to provide guidance in the jungle of opportunities and pitfalls for the analysis of high-dimensional biological and medical data, making use also of concrete and descriptive examples. Here, the notion 'high-dimensional' refers foremost to the availability of a large number of variables. In particular, we present aspects regarding the subtopics of TG9: data pre-processing, exploratory data analysis, data reduction, multiple testing, prediction modeling/algorithms, comparative effectiveness and causal inference, design considerations, data simulation methods, and resources for publicly available high-dimensional data sets. TCS062 Complete, Transparent and Unbiased Reporting as a Requisite in Research Wed :30-6:00 Lecture Hall HS 3 Chair(s): Willi Sauerbrei Panelist(s): Anne-Laure Boulesteix, Dirk Eyding, Peggy Sekula For many years the quality of research in the health sciences has been heavily criticized. It is argued that serious improvement would be possible if biomedical research were better chosen, designed, executed, analyzed, regulated, managed, disseminated, and reported. Serious improvements are far from simple for many of the issues mentioned, but suitable guidance documents have been developed to improve the reporting of research. Severe weaknesses in this area are unnecessary and can be avoided. Concerning issues in the reporting of health science, the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network acts as an umbrella organization. Unfortunately, many reviews of publications have clearly shown that the quality of reporting of studies is still poor. Problems seem to be less severe for RCTs than for observational studies. In the latter, even basic items describing the study population and relevant details of statistical analyses are often not provided. In general, there are plenty of conceivable approaches to statistically analyze data that both make sense from a substantive point of view and are defensible from a theoretical perspective. It is not uncommon that several analyses are conducted and the one with the most satisfactory result is selected and published. Consequently, the published literature gives a seriously biased impression, causing severe harm to the results of systematic reviews and meta-analyses. An unbiased assessment of the importance of many factors relevant for decision making in areas like risk assessment, prognosis or treatment is often impossible. In one longer introductory talk and three more specific short talks we will give a general impression of the seriousness of bad reporting and the problems it causes for research in the health sciences and for the care of patients. Speakers will discuss similarities and differences of the problems in the areas they present. Jointly with the audience we will aim to identify suitable approaches for substantial improvement of reporting in the near future. TCS062.1 Complete, transparent and unbiased reporting as a requisite in research Altman D. Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom Clinical research has value only if the study methods have validity and the research findings are published in a usable form. Research publications should not mislead, should allow replication (in principle), and should be presented so that they can be included in a subsequent systematic review and meta-analysis. Deficiencies in research methods have been observed since early in the last century, and major concerns about reporting - especially of clinical trials - have been expressed for over 50 years. The World Medical Association's Helsinki Declaration states that "Researchers have a duty to make publicly available the results of their research on human subjects and are accountable for the completeness and accuracy of their reports."
Many studies are never published and, for those that are, a wealth of evidence demonstrates widespread selective reporting of research findings, leading to bias, and inadequate reporting of research methods and findings that prevents readers from using the information. The present unacceptable situation is shocking; so too is its wide passive acceptance. The EQUATOR Network was created in 2006 as a concerted effort to bring together guidance on research conduct and reporting that existed in a fragmentary form across the literature, and to promote its dissemination and adherence. Many reporting guidelines exist but their impact is as yet modest. I will consider what actions are needed by different stakeholders to help raise standards more rapidly. TCS063 Collaboration Space between Biostatistics and Pharmacometrics: Opportunities and Challenges Thu :30-:00 Lecture Hall HS 5 Chair(s): Jose Pinheiro Discussant(s): Efthymios Manolis Biostatistics and clinical pharmacology are pivotal quantitative disciplines in drug development, having well-established roles and recognized importance. Areas of overlap between them offer both enormous potential for synergies, as well as possible challenges for collaboration. Perhaps chief among those overlapping areas is the relatively new discipline of pharmacometrics, which, though typically more associated with clinical pharmacology, has been increasingly embraced by biostatisticians. Pharmacometricians and biostatisticians can greatly benefit from mutual learning and effective interactions. Indeed, there are already many examples in practice where synergistic collaboration between the two disciplines has led to remarkable accomplishments. This session will focus on the identification of opportunities for improving drug development through effective collaboration between biostatisticians and pharmacometricians, with real-life examples showcasing the value that such interactions can bring. Potential challenges for collaboration, both real and perceived, will be discussed, together with proposals on how to address them. Recommendations on the path forward to promote widespread, effective collaboration between the two disciplines will be presented and discussed. TCS063.1 Drug development needs quantitative sciences that incorporate domain knowledge and quantify risks to optimize learnings and decisions Looby M., Sander O., Guettner A. Novartis, Basel, Switzerland Drug development is a process whereby we determine the utility of new treatments based on an ongoing assessment of the benefit-risk relationship. Central to this process is the task of learning how to administer the treatment to increase the chances of success, given the constraints imposed by the drug properties and the clinical setting. This learning process is referred to broadly as dose finding. It is an iterative process, often spanning several clinical studies, whereby one needs to select doses based on often limited information in order to assess the benefit-risk relationship more comprehensively in subsequent steps. Given that there may be great uncertainty whether the drug has any net benefit at all, or whether the doses selected to assess it are adequate, this circular problem of dose finding has been identified as one of the greatest challenges in drug development. To best meet this challenge, it is necessary to consider drug and disease properties in programme designs. We must ensure our analysis methods and designs are fit for purpose and capture the data and analyses needed to generate the best information efficiently to underwrite the decisions on dose selection. Pharmacometrics is the discipline of characterizing drug action using mechanistic models that capture the longitudinal nature of drug exposure and drug response in individual patients and trial populations.
The challenge for drug developers is to take the valuable information captured by such models and incorporate it in a statistically sound manner into our decision making. We present an example of a combined pharmacometrics and statistics approach to optimally support dose finding of a novel medicine that was motivated by limitations in the traditional methods. We believe this example highlights the need to better integrate pharmacometrics in our statistical development plans and to help tackle one of the biggest challenges in drug development. TCS063.2 Bridging the gap between statisticians and pharmacometricians Mentré F. University Paris Diderot / INSERM, IAME, Paris, France Pharmacometrics is widely used in drug development and drug care and relies on a complex statistical tool: nonlinear mixed effect models (NLMEM). The tools for estimation, (covariate) model building, model evaluation, design, and statistical inference in NLMEM were developed mainly by academic statisticians. Those tools are now used mainly by pharmacologists, and (some) biostatisticians are still reluctant to use them. The gap between statisticians and pharmacometricians, mainly within the pharmaceutical industry, is still considerable. We will discuss some of the technical pitfalls in both communities on the use (or lack of use) of NLMEM and how a bridge could be built via education and communication. Model-based adaptive design is one example of a common method that could link those two groups. TCS063.3 Through the wormhole: connecting statistics and pharmacometrics De Ridder F. Janssen Research & Development, Beerse, Belgium In modern clinical drug development, biostatistics and pharmacometrics are two key functions applying quantitative principles to data. Both fields have a long history and have evolved in varying degrees of isolation in the academic, regulatory and industrial environment. Over the last decades, attempts to improve the efficiency of drug development, known under different labels such as Modeling & Simulation, Model-based Drug Development and, most recently, Model-informed Drug Discovery and Development, have brought these disciplines closer together. Pharmacometricians and statisticians often work on the same data, tackle similar questions and are both quantitatively oriented. As such, collaboration between them seems inevitable, but it does not always happen. To some extent the division between the fields has been artificial, driven by education, tradition and the typical structure of large organisations such as many pharmaceutical R&D companies. Communication is the key to a successful collaboration. This requires that we speak each other's languages and know each other's tools as well as ways of thinking. This can be achieved by appropriate training, attending each other's conferences and job rotation. Organizational structures need to foster continuous interactions. There are two broad types of potentially valuable interactions between biostatistics and pharmacometrics. In the daily work of drug development, any project might benefit from the exchange of viewpoints, ideas and expertise. I will discuss two such case studies: (1) injection of a dose of empiricism into highly mechanistic viral kinetic models in the development of antiviral agents and (2) mechanistic PK/PD inspiring a dose-response model to guide an adaptive trial design.

In the methodological field, some areas have been dominated by developments in statistics, pharmacometrics or sometimes both. Examples include estimation methods and optimal designs for non-linear mixed models, model diagnostics, handling of missing data and (model-based) meta-analysis. TCS064 Statistical Methods in Outcome Research Wed :30-:00 Lecture Hall KR 7 Chair(s): Irene Schmidtmann TCS064.1 Propensity score: an alternative method of analyzing intervention effects Kuss O. German Diabetes Center, Institute for Biometrics and Epidemiology, Düsseldorf, Germany There is agreement in medical research that the preferred method for evaluating interventions is the randomized controlled trial. Randomization is the only method that guarantees similar distributions of known and unknown patient characteristics between an intervention and a control group, thus enabling true causal statements on intervention effects. However, randomized controlled trials are in some cases "unnecessary, inappropriate, impossible, or inadequate" and have also been criticized for a lack of external validity: patients in randomized controlled trials are usually younger and healthier than the average patient. Non-randomized studies can be an alternative here; however, they suffer from a lack of internal validity: treatment allocation is not randomized and the intervention and control groups may be systematically different in terms of known and (even worse) unknown patient characteristics. A range of statistical procedures have been developed to take account of these differences during analysis. The standard procedures for this are multiple regression models; however, propensity scores (PS) are also increasingly used. The propensity score is defined as the probability that a patient receives the intervention under investigation. In a first step, the PS is estimated from the available data, e.g. in a logistic regression model. In a second step, the actual intervention effect is estimated with the aid of the PS. In this talk, we give a short, nontechnical introduction to the propensity score using an example from coronary bypass surgery. TCS064.2 Identification of healthcare providers with unusual performance in administrative databases: an overview of Bayesian approaches and an application to adenoma detection in colorectal cancer screening Stock C. 1,2, Uhlmann L. 2, Hoffmeister M. 1, Laux G. 3, Kieser M. 2, Brenner H. 1 1 German Cancer Research Center (DKFZ), Division of Clinical Epidemiology and Aging Research, Heidelberg, Germany, 2 University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany, 3 Heidelberg University Hospital, Department of General Practice and Health Services Research, Heidelberg, Germany In the evaluation of healthcare practice it is often of interest to detect unusual performance among individual providers, i.e. either high or low performers, using cross-sectional administrative databases. Among the variety of statistical methods that have been proposed for this purpose, Bayesian methods are particularly attractive because of the flexibility in modelling assumptions and the possibility of predictions from hierarchical models that fully account for parameter uncertainty. We first present a brief overview of Bayesian methods for the detection of unusual performance, focusing on a hierarchical modelling framework that was proposed by Ohlssen, Sharples and Spiegelhalter (J. R. Statist. Soc. A (2007), 170 (4)) and that distinguishes hypothesis testing and estimation approaches. We outline the suitability of the analytical options depending on the research question and summarize general recommendations. Detection of unusual performance is then illustrated using the example of adenoma detection in colorectal cancer screening by colonoscopy. The adenoma detection rate (i.e. the rate of colonoscopies in which at least one adenoma is found) is a major surrogate measure of performance quality for physicians offering screening colonoscopy. It was determined for 422 physicians from Bavaria, Germany, who performed preventive colonoscopies over a period of 12 months. A robust random-effects 'null model' for the performance of the majority of physicians was developed and the divergence of individual physicians from this model was determined by taking into account multiple testing (using a false discovery rate of 5%). Sixty-two physicians (15%) with potentially unusual performance and eventually 10 physicians (2%) with unusual performance were identified. The applied approach allows the identification of unusual performance in the absence of a postulated minimum requirement (i.e. a minimum adenoma detection rate). We further illustrate the possible subsequent determination of such minimum requirements for an average case mix of patients. TCS064.3 Multilevel logistic regression analysis to explore cluster-level covariates with an application to the "Quality Improvement in Postoperative Pain Management" (QUIPS) project Scherag A. 1, Komann M. 2, Meißner W. 2, for the QUIPS investigators 1 Jena University Hospital, Research Group Clinical Epidemiology, CSCC, Jena, Germany, 2 Jena University Hospital, Department of Anesthesiology and Intensive Care Medicine, Jena, Germany Multilevel data occur frequently in health services or epidemiologic research. For binary outcomes that are frequently analyzed in medicine, Austin and Merlo (2017) have recently discussed several topics centered around multilevel logistic regression models. These models address correlations of subjects within clusters of higher-level units. Typically, investigators apply these models to correct subject-level predictors for clustering effects. Austin and Merlo (2017) widen this perspective - particularly for the case when cluster-level covariates are of interest (such as hospital volume). In the presentation, I summarize and extend their ideas with an additional focus on practical modeling challenges. I demonstrate and apply the methods to data from the years 2011 to 2014 of the QUIPS project - the world's largest acute pain registry (Meißner et al., 2017). References: Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017 (in press). Meißner W, Komann M, Erlenwein J, Stamer U, Scherag A. The Quality of Postoperative Pain Therapy in German Hospitals. Dtsch Arztebl Int. 2017;114(10).
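A minimal sketch of the type of random-intercept logistic model with a hospital-level covariate discussed in TCS064.3, fitted in R with the lme4 package; the registry itself is not used here, and all data, variable names and effect sizes are invented for illustration:

library(lme4)

set.seed(2)
n_hosp <- 30; n_pat <- 40
hospital <- rep(seq_len(n_hosp), each = n_pat)            # cluster identifier
volume   <- rep(rnorm(n_hosp), each = n_pat)              # hypothetical hospital-level covariate
age_c    <- rnorm(n_hosp * n_pat)                         # standardized patient-level covariate
u        <- rep(rnorm(n_hosp, 0, 0.5), each = n_pat)      # hospital random intercepts
y        <- rbinom(n_hosp * n_pat, 1, plogis(-1 + 0.2 * age_c + 0.3 * volume + u))
d <- data.frame(y, age_c, volume, hospital)

# random-intercept logistic regression with a subject-level and a cluster-level predictor
fit <- glmer(y ~ age_c + volume + (1 | hospital), data = d, family = binomial)
summary(fit)

# intraclass correlation on the latent (logit) scale
v <- as.data.frame(VarCorr(fit))$vcov[1]
v / (v + pi^2 / 3)

The last two lines compute the latent-scale intraclass correlation, one of the cluster-level summaries commonly reported for such models.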

TCS065 Dealing with Heterogeneity and Multiplicity in Small Clinical Trials Thu :30-:00 Lecture Hall HS 4 Organizer(s): Susanne Urach Chair(s): Ludwig Hothorn In rare diseases, multiplicity issues and challenges due to heterogeneity in the patient population are aggravated by the availability of only small sample sizes. In this session we explore potential solutions for these challenges in single trials as well as in meta-analyses of few studies. In small populations the reasons for looking at multiple endpoints are twofold: on the one hand, some treatment effects cannot be characterized by a single measure; on the other, the statistical power of tests can be increased by combining the treatment effect estimates from several outcome measures. A comprehensive overview of recent methods to achieve both aims in clinical trials with small sample sizes will be given. The treatment benefit on all of several endpoints can also be calculated on an individual level using a multi-component endpoint. An approach based on a partial ordering to derive univariate scores and on non-parametric tests will be offered. For rare diseases with heterogeneous symptoms that differ substantially from patient to patient, the evaluation of the treatment effect of a new drug is challenging. In such settings Goal Attainment Scaling, a measurement instrument to evaluate the effect of an intervention on the basis of individual, patient-specific goals, has been proposed, and the most efficient method for analysing such data will be presented. If available data is scarce and heterogeneous, incorporation of historical or external data in the design and analysis of a new clinical trial is of particular interest. Dynamic borrowing through empirical power priors could be employed to estimate the power parameter in the power prior formulation. A method to control the type I error by calibrating the degree of similarity between the new and historical data will be presented. Heterogeneity poses limitations on meta-analysis methods to statistically summarize results from just a few individual studies. A sufficient number of studies is needed as most common statistical methods rely on asymptotic properties. Results on the performance of frequentist meta-analysis methods for only two studies will be given. TCS065.1 Analysing multiple endpoints in small clinical trials Ristl R. Medical University of Vienna, Wien, Austria The analysis of multiple endpoints in small clinical trials provides the opportunity to extract additional overall information from a limited sample; at the same time, any multiplicity adjustment will decrease the power for a definite conclusion on particular endpoints. The presentation will summarize the results on appropriate small sample methods for the analysis of multiple endpoints obtained in the EU FP7 project ASTERIX. In particular, key concepts identified from the literature will be briefly summarized, followed by the presentation of an optimization framework for multivariate exact discrete tests, and small sample adjustments to asymptotic inference methods across multiple marginal models for dependent data. TCS065.2 Statistical properties of hypothesis tests for a Goal Attainment Scaling endpoint Urach S. 1, Gaasterland C. 2, Rosenkranz G. 1, Jilma B. 3, Roes K. 4, Van der Lee H. 2, Posch M. 1, Ristl R. 1
1 Medical University of Vienna, CEMSIIS, Vienna, Austria, 2 University of Amsterdam, Academic Medical Center, Amsterdam, Netherlands, 3 Medical University of Vienna, Department of Clinical Pharmacology, Vienna, Austria, 4 University Medical Center Utrecht, Julius Center for Health Sciences and Primary Care, Utrecht, Netherlands Goal Attainment Scaling (GAS) is a measurement instrument to evaluate the effect of an intervention on the basis of individual, patient-specific goals. The effect of a treatment on a particular goal is mapped in a pre-specified way to a common ordinal scale. The advantages of this measurement approach are the utilization of patient-centered outcomes and the possibility to combine the information from patients in heterogeneous populations. The latter is of particular interest in rare disease research, because it allows for samples that are as large as possible. Here we focus on the statistical aspects of using GAS data for the comparison of two treatment groups in a randomized clinical trial. A data generating model is set up based on the assumption of underlying latent multivariate normal responses, which are assumed to be connected with the unknown treatment effect on some common underlying physiological process. The actual ordinal observations are obtained by discretizing these continuous outcomes via thresholds. An extensive simulation study was carried out to find the optimal weighting strategy and parameter settings for GAS data fulfilling the proposed model assumptions. We discuss the scope of possible null hypotheses for the between-group comparison and review methods to aggregate the data on multiple goals within each patient. The results will be illustrated with a clinical trial example concerning children with cerebral palsy. This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP7 HEALTH (ASTERIX project). TCS065.3 Dynamic borrowing through empirical power priors that control type I error Nikolakopoulos S., Roes K. University Medical Center Utrecht, Utrecht, Netherlands In the design and analysis of clinical trials, type I error control is a major concern. Prospective rules for the inclusion of historical data in the design and analysis of trials are essential for controlling the bias and efficiently using available information. Such rules may be of interest in the case of small populations where available data is scarce and heterogeneity is less well understood, and thus conventional methods for evidence synthesis might fall short. Particularly for borrowing evidence from a single historical study, the concept of power priors can be useful. Power priors employ a parameter γ ∈ [0, 1] which quantifies the heterogeneity between the historical study and the new study. However, the possibility of borrowing data from a historical trial will usually be associated with an inflation of the type I error. We suggest a new, simple method of estimating the power parameter suitable for the case when only one historical dataset is available. The method is based on predictive distributions and parameterized in such a way that the type I error can be controlled by calibrating the degree of similarity between the new and historical data. The method is demonstrated for normal responses in a one- or two-group setting but the generalization to other models is straightforward. TCS065.4 Meta-analysis methods revisited for only a few studies Smith A.
MHH, Institut für Biometrie, Hannover, Germany Meta-analysis is a well-established tool in evidence synthesis to statistically summarize results from individual studies. Methods are commonly classified into fixed effect (FE) meta-analysis and random effects (RE) meta-analysis. The DerSimonian and Laird (DL) estimator is traditionally used for RE meta-analysis, but recently other estimators for the between-study variance have been compared, and the Paule and Mandel (PM) estimate seems to be more promising for binary outcomes (Novianti et al. 2014, Veroniki et al. 2016, Langan et al. 2015; Langan et al. 2016). The Hartung-Knapp-Sidik-Jonkman approach is a further option, which can be applied to both FE and RE meta-analysis and uses the t-distribution. In order to perform a meta-analysis efficiently, however, a sufficient number of studies is needed as most common statistical methods rely on asymptotic properties. The performance of frequentist meta-analysis methods for only a few studies, and in the most extreme case of only two studies, has rarely been considered before, even though this situation is quite common (Turner, 2012; Langan et al. 2015). Limitations in meta-analysis with a few studies, especially in the light of heterogeneity, will be presented. (A brief R illustration of the DL and PM estimators is sketched after the following abstract.) TCS065.5 Statistical methods and decision approaches for heterogeneous small-sample GWAS Wittkowski K. The Rockefeller University, Center for Clinical and Translational Science, New York, United States Genome-wide association studies (GWAS) are commonly believed to require hundreds of thousands of subjects, yet SNPs identified at conventional levels (-log10(p) > 7.3) are rarely replicated. Here, we will present two approaches which, in conjunction, have succeeded in GWAS of subjects from existing epidemiological and phase 2/3 data. Most GWAS are analyzed one SNP at a time, even though disease alleles contributing individually to a disease phenotype are typically selected against. U-statistics for multivariate data avoid artifacts from strong model assumptions (independence and additivity of neighboring SNPs), but became feasible only with advances in computing (32-bit OS, cloud, GPUs). To increase power, u-statistics were extended to integrate knowledge about genetics (heterodominance, LD structure, cis-epistasis). The conventional 7.3 level for genome-wide significance is widely believed to be overly conservative, yet no theoretically justified alternative has been developed. Replacing this heuristic level with an adaptive estimate that reflects the nature of GWAS (population characteristics, non-randomized design, dependency between minor allele frequency and highest possible significance) fills this void. The statistical approach was validated in epilepsy (Wittkowski 2013), identified a novel intervention in Crohn's disease (L-fucose), suggested the first disease-modifying treatment for children developing autism (Wittkowski 2014), and identified a drug to downregulate "derailed endocytosis" in breast cancer and Alzheimer's/Parkinson's. With complex diseases, functionally similar genes may be affected in different populations. When combining several (including several "Caucasian") populations into a "meta-analysis", such differences would "dilute" each other, so that many large GWAS have failed to generate hypotheses for addressing unmet medical needs even in diseases with known heritable components. GWAS with such small numbers of subjects can also identify populations most likely to respond to study medications from phase 2/3 trials. Disclaimer: The presenter has an interest in a company, ASDERA, that has out-licensed the rights to a drug developed using the approaches described here.
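The sketch below illustrates, on invented summary data, the estimators contrasted in TCS065.4 (DerSimonian-Laird versus Paule-Mandel with the Knapp-Hartung adjustment) using the R package metafor; nothing here comes from the talk itself, and the effect sizes and variances are assumptions made purely for illustration:

library(metafor)

yi <- c(-0.35, -0.10)   # log odds ratios from two hypothetical studies
vi <- c(0.04, 0.06)     # corresponding sampling variances

fit_dl <- rma(yi, vi, method = "DL")                  # DerSimonian-Laird estimator
fit_pm <- rma(yi, vi, method = "PM", test = "knha")   # Paule-Mandel with Knapp-Hartung inference

summary(fit_dl)
summary(fit_pm)

With only two studies the Knapp-Hartung interval is based on a t-distribution with a single degree of freedom, so the resulting confidence interval becomes extremely wide - one concrete way the limitations discussed in the talk show up.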
TCS066 Characterizing Clinical Dose Response Studies Fri :45-4:5 Lecture Hall HS 4 Organizer(s): Sandeep Menon Chair(s): Simon Kirby Discussant(s): Neal Thomas Dose-finding clinical trials play an indispensable role in learning the dose response and identifying the right dose. The emphasis on conducting well-controlled clinical studies to sufficiently characterize the relationship between doses and clinical response is shared by both the pharmaceutical industry and regulatory authorities (ICH, 1994; FDA, 2003; EMA, 2006). One of the main challenges is to determine the dose response model. There is strong support for adopting a model-based approach to dose finding, based on a number of meta-analytical studies in the literature. Some model-based approaches to dose response analysis and dose selection, such as the MCP-Mod method, involve specifying and averaging several parametric dose response models in the model selection phase (Bretz et al., 2005; Pinheiro et al., 2006; Bornkamp et al., 2011). For some diseases, meta-analysis permits the discovery and validation of dose response models by aggregating clinical data of many trials, which cannot be done in individual trials. Specifically, Thomas and Roy (2015) reported results of a meta-analysis for a large set of small molecule drugs across different therapeutic areas, demonstrating a consistent dose response pattern well described by the maximal effect (Emax) model. Wu, Banerjee, Jin, et al. (2017) investigated a similar meta-analysis and showed that Emax models can describe the dose response relationship for DNA-recombinant biological products, which may have more complex PD/PK mechanisms. Through the meta-analyses of dose response trials for small molecules and biologics, some non-monotonicity was observed. Hence there is an interest in understanding analyses for non-monotonic (or umbrella-ordered) dose response. The implications for supporting model-based development and designing future dose-finding studies will be discussed. TCS066.1 Application and challenges of using the Emax model in dose finding Roy D., Deng Q., Geng J. Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, United States The Emax model is one of the most commonly used models in the dose response literature and in clinical trial practice. The model can be derived from receptor occupancy models in pharmacology when the Hill parameter is a positive integer, and due to this biological explanation it has been widely seen as one of the most practical dose-response or exposure-response models. However, in practice there are some drawbacks or limitations. For example, maximum likelihood estimation applied to an Emax model may not converge if the data do not provide enough information on the parameters. Based on simulation studies, the probability of selecting the right dose is not satisfactory even when the data come from an Emax model. In particular, when estimating the target dose for a phase III study, this variability and the low coverage probability should not be ignored. This presentation aims to discuss some of these limitations faced during implementation in practice, provide insights, and outline possible recommendations to overcome the challenges. TCS066.2 Model-based meta-analysis of clinical dose response for biological products Banerjee A., Wu J., Jin B., Menon S., Martin S. Pfizer, Cambridge, United States Dose-finding clinical trials play an indispensable role in learning the dose response and identifying the right dose.
One of the main challenges is to determine the dose response model. There is strong support for adopting a model-based approach to dose finding, based on a number of meta-analytical studies in the literature. Specifically, Thomas, Sweeney, and Somayaji (2014) reported results of a meta-analysis for a large set of small molecule drugs across different therapeutic areas, identifying the hyperbolic Emax model as an appropriate model. There is increasing interest in whether the hyperbolic Emax model can describe the dose response relationship for DNA-recombinant biological products, which may have more complex PD/PK mechanisms. We conducted a literature search across a range of disease areas and sampled a broad set of published dose-ranging trials on biological products. We meta-analyzed the data and assessed whether the hyperbolic Emax model can sufficiently describe the dose response summary data. The implications for supporting model-based development and designing future dose-finding studies for biologics will be discussed based on the results of this meta-analysis. TCS066.3 Design and analysis of dose finding studies under model uncertainty using MCP-Mod Pinheiro J. Janssen Research & Development, Raritan, United States Identifying an adequate dose for confirmatory studies is one of the most important and difficult goals in drug development, having a critical impact on the likelihood of a compound being brought to, and staying in, the market. Despite the main goal of target dose estimation, dose finding studies are often designed, and analyzed, as mini-phase 3 studies, using hypothesis testing methods for dose selection. Model-based methods tend to be more efficient and informative for dose estimation purposes, but are often challenging to use in practice when not much is known about the dose-response relationship at the time of study design. This talk will present, discuss and illustrate design and analysis considerations for Phase 2 (dose finding) studies using a model-based methodology relying on a combination of multiple comparison procedures (MCP) and modeling techniques, the so-called MCP-Mod method. The approach allows for model uncertainty by using a set of candidate dose-response models which are tested using MCP techniques based on model contrasts. The best of the statistically significant models, if any, is used to estimate target doses in the dose-finding step. Examples from real clinical trials will be used to illustrate the techniques and their application. TCS068 Joint Modeling and Joint Inference for Multiple Outcomes Thu :30-3:00 Lecture Hall HS 4 Chair(s): Robin Ristl Many biomedical studies address complex research questions requiring the simultaneous analysis of multiple outcome variables. Often the multiple outcomes are measured on different scales and they may be observed repeatedly within the same subject, further adding to data complexity. The specification and the actual estimation of an analysis model can be challenging, since dependencies between multiple outcomes, as well as between the outcomes and any required co-variables, need to be appropriately taken into account. Further, statistical inference will typically involve more than one model parameter. To allow for confirmatory conclusions, the resulting multiple hypothesis tests or simultaneous confidence intervals need to be adjusted for multiplicity. Irrespective of the involved difficulties, the analysis of multiple outcomes can provide a notable gain in information and is worthwhile to pursue.
This session will contain four talks that together give a comprehensive overview of recent methods for joint modeling and joint inference for multiple outcomes. In particular, multiple testing approaches across multiple marginal models and the formulation of true joint models will be covered, contrasting these methods with respect to modeling aims, assumptions and application possibilities. TCS068.1 Joint inferences for multiple outcomes of Gaussian and/or non-Gaussian type in hierarchical studies Molenberghs G. Universiteit Hasselt & KU Leuven, I-BioStat, Hasselt, Belgium Hierarchical and otherwise clustered data can easily be analyzed using maximum likelihood or Bayesian methods in moderate to large samples. This may well be different in very small populations (e.g., clinical trials in rare diseases) or, on the other side of the spectrum, when samples become very large. We explore a few techniques that can be used in such settings. One method is pseudo-likelihood (or composite likelihood), where a cumbersome likelihood is replaced by a simpler function, which is easier to maximize and still produces consistent and asymptotically normal estimates. On the other hand, the sample can be split into sub-samples, each of which is analyzed separately (ideally in parallel), after which the results are funneled into a single set of inferences using appropriate combination rules. These techniques can also be combined. Using a set of examples, we illustrate how the methods work, and what the computational gains are. TCS068.4 Simultaneous inference based on multiple outcomes of possibly mixed type without using a joint model Ritz C. University of Copenhagen, NEXS, Frederiksberg, Denmark Some recent developments and results in the context of adjustment of p-values are presented and cast into a general inferential framework based on sandwich variance-type estimators that may be used for simultaneous inference on a collection of parameters that are estimated from several marginal model fits, while allowing for the absence or lack of any specification of correlations between outcomes. Thus, the proposed methodology may in some cases offer a flexible and operational alternative to joint modeling approaches, such as joint modelling of multiple longitudinal outcomes or of longitudinal and event-time outcomes. The methodology is demonstrated by means of a few data examples. Advantages and limitations are discussed. TCS068.3 Rotation tests vs. conditional multiplier resampling Gerhard D. University of Canterbury, Christchurch, New Zealand When testing treatment effects for multiple outcome variables, a corresponding multiple testing procedure has to consider a global error rate for hypotheses within and between different response variables. A rotation test approach is presented, introducing extensions for the combination of marginal generalised linear and linear mixed-effects models. The approach is compared to recently developed approaches for combining inference from multiple marginal models based on a flexible use of sandwich variance estimators. TCS068.2 Dose-response analysis with multiple endpoints: the Tukey trend test based on multiple marginal models Hothorn L. Leibniz University Hannover, Hannover, Germany Tukey et al. (1985) proposed a quasilinear regression approach, where the distribution of the maximum over arithmetic, ordinal and logarithmic dose metameter models can be obtained by the multiple marginal models approach (Pipper et al. 2012). A further generalization allows taking the maximum over both the three regression models (dose considered quantitatively) and the Williams-type contrasts (dose considered qualitatively). This versatile trend test provides four advantages: 1) good power for almost any shape of the dose-response (including sublinear and supralinear shapes), 2) problem-related interpretability based on confidence limits for slopes and/or contrasts, 3) broad applicability within the GLMM framework, and 4) the ability to consider multiple endpoints - including differently scaled ones - again as a double maximum test. By means of the R library(tukeytrend) (Schaarschmidt et al., 2017), case studies for multinomial vector comparisons, multiple binary endpoints, and bivariate normal plus binary endpoints will be explained. References: F. Bretz and L. Hothorn. Statistical analysis of monotone or non-monotone dose-response data from in vitro toxicological assays. ATLA - Altern Lab Anim, 31(Suppl. 1):81-96. J. W. Tukey, J. L. Ciminera, and J. F. Heyse. Testing the statistical certainty of a response to increasing doses of a drug. Biometrics, 41(1):295-301, 1985. C. B. Pipper, C. Ritz, and H. Bisgaard. A versatile method for confirmatory evaluation of the effects of a covariate in multiple models. Journal of the Royal Statistical Society, Series C (Applied Statistics), 61:315-326, 2012. TCS069 Subgroup Analyses in Clinical Trials and Systematic Reviews Thu :30-:00 Lecture Hall KR 7 Organizer(s): Arne Ring Chair(s): Friedhelm Leverkus, Gerd Rosenkranz Discussant(s): Simon Day, Armin Koch The patient population of large confirmatory clinical trials will be somewhat heterogeneous with respect to demographic variables and medical conditions. When the (primary) treatment effects of these trials are evaluated, either within each trial or combined within a systematic review, the consistency of the effect across medically relevant subgroups (e.g. gender, age or comorbidities) should be investigated. The interpretation of consistency is sometimes discussed controversially. Therefore, a number of working groups have taken up this topic and developed new considerations and methodologies, which shall be presented in this session. TCS069.1 Outcome of the EFSPI working group on subgroup analyses Dane A. DaneStat Consulting Limited, Macclesfield, United Kingdom The European Medicines Agency (EMA) has issued draft guidance on the use of subgroup analyses in confirmatory trials. Although this guidance provides good, clear proposals on the importance of pre-specification of likely subgroup effects and how to use this when interpreting trial results, it is less clear which analysis methods would be reasonable, and how to interpret apparent subgroup effects in terms of whether further evaluation or action is necessary.
A PSI/EFSPI working group has been investigating a focused set of approaches to subgroup analysis which take account of the number of subgroups explored, but also investigate the ability of each method to detect true subgroup effects. This evaluation has shown that the standardized effect plot, the bias-adjusted bootstrapping method and the SIDES method all perform more favourably than traditional approaches such as investigating all subgroup-by-treatment interactions individually or applying a global test of interaction. Therefore, these approaches should be considered to aid interpretation and provide context for the observed results of subgroup analyses conducted for Phase 3 clinical trials. TCS069.2 Assessment of consistency of subgroup effects in clinical trials Ring A. 1,2, Schall R. 2,3, Grill S. 4, Brannath W. 4 1 medac, Biometry, Wedel, Germany, 2 University of the Free State, Mathematical Statistics and Actuarial Science, Bloemfontein, South Africa, 3 Quintiles, Biostatistics, Bloemfontein, South Africa, 4 University of Bremen, Biometrie, Bremen, Germany Subgroup analyses are often performed using a test of subgroup-by-treatment interaction, which has homogeneity as the null hypothesis. Hence homogeneity can only be rejected, but not confirmed, by such tests. Instead, we propose an equivalence test to assess consistency, in order to reject the hypothesis of heterogeneity. The basis for this test is the consistency ratio, which is the contrast of the treatment effects in each of the subgroups, scaled by the overall residual variability. The presentation reviews the performance of both the interaction test and the equivalence test for quantitative endpoints which are analysed using a general linear model, and provides an extension for the application to binary endpoints. TCS069.3 Dealing with effect modification in the framework of benefit assessment Bender R. IQWiG, Medical Biometry, Cologne, Germany The interpretation of subgroup analyses in clinical research is still challenging. Besides clinical and statistical criteria such as pre- or post-specification, choice of effect modifiers, multiplicity, power, and the applied statistical technique, the reason for conducting subgroup analyses plays a role. Moreover, the interpretation of subgroup analyses may be dependent on the context in which the corresponding results are assessed. In the framework of benefit assessment it has to be acknowledged that the research question is given by the contracting agencies and not by the planners of the considered relevant studies. Additionally, the legal foundations have to be taken into account, which may lead to the requirement to deal also with post hoc subgroup analyses. In the context of benefit assessment for reimbursement decisions, the evaluation of superiority in important patient subgroups plays a major role. In this talk, relevant issues and challenges in the assessment of subgroup findings from the reimbursement perspective will be discussed and summarized. TCS069.4 Transferability of study results to the target population in the assessment of additional benefit according to SGB V §35a (AMNOG) Schwenke C. 1, Kupas K. 2, Schwenke S. 1 1 SCO:SSiS, Berlin, Germany, 2 Bristol-Myers Squibb GmbH & Co. KGaA, Munich, Germany In many dossiers submitted in the course of the AMNOG process, the study population deviates from the target population as defined by the G-BA. Thus, only a part of the study population can be used for the derivation of an additional benefit. Very often, the suitable sample becomes very small, such that a significant effect at the study level can no longer be observed in the relevant subpopulation. This raises the question under which circumstances transferability of the results seen in the study population to the target population can be claimed. A common approach is the utilization of a test on heterogeneity and the deduction of comparability in case of non-significance (p > 0.2). This 'proof' of the null hypothesis, though, is no substitute for the actually needed test for equivalence ('absence of evidence is not evidence of absence'). Grouven et al. presented an approach to this question for binary endpoints at the workshop 'IQWiG im Dialog 2013'. The objective was to test a null hypothesis on the relative risk (RR), taking into account the interaction of the populations (target vs. non-target) with the treatment (verum vs. control), on the basis of an empirical p-value. Small empirical p-values (below the threshold suggested by IQWiG) would then enable the use of the whole study population for the derivation of an additional benefit. We broadened the approach to time-to-event data by means of resampling techniques and investigated the resulting one-sided significance level of the empirical p-value. The suggested procedure is illustrated using a real-world example. TCS070 Applications of Modern Survival Analysis Methods Fri :45-4:5 Lecture Hall HS 5 Organizer(s): Lilla Di Scala Chair(s): Harald Heinzl Survival analysis methods are used in many clinical indications for the development of new therapeutics. These methodologies present unique challenges like competing risks, informative censoring and joint modelling with longitudinal data. This session brings together presenters from academia and the pharmaceutical industry to discuss some of these challenges with a focus on applications in several disease areas. TCS070.1 The trend-renewal process with frailty in recurrence data Wienke A. University Halle, Institute of Medical Epidemiology, Biostatistics and Informatics, Halle, Germany Time-to-event data analysis has a long tradition in statistical applications. Models have been developed for data where each observation unit experiences at most one event during follow-up. In contrast, in some applications the subjects may experience more than one event. Recurrent events appear in many fields of science. Often such events are followed by a repair action in technical applications or a treatment in the life sciences. A model to deal with recurrent event times for incomplete repair of technical systems is the trend-renewal process.
It is composed of a trend and a renewal component. In the talk we use a Weibull process for both of these components. The model is extended to include a Cox type covariate term to account for observed heterogeneity. A further extension includes a frailty term to account for unobserved heterogeneity. We fit the extended version of the trend-renewal process to data of hospital readmission times of colon cancer patients for illustration. Pietzner D., Wienke A. (203) The trend-renewal process: a useful model for medical recurrence data. Statistics in Medicine 32, Wienke A. (200) Frailty Models in Survival Analysis. Chapman and Hall/CRC, Boca Raton TCS070.2 Solving the Fine-Gray riddle Putter H., van Houwelingen H. Leiden University Medical Center, Medical Statistics and Bioinformatics, Leiden, Netherlands The Fine-Gray approach to the modelling of competing risks data is based on a proportional hazards model for the cumulative incidence function. This approach has puzzled the bio-statistical community for three reasons:. The sub-distribution "hazard" is hard to interpret; 2. It cannot be used dynamically; 3. The proportional hazards model is fitted by means of the partial likelihood, where the risk set includes those patients who had no event yet and those who had already "died" from one of the other competing risks. The alternative approach is based on multi-state models. The advantage is that the cause-specific hazards in a multi-state model are easier to understand. The disadvantage is that the computations and the prognostic models get more complicated and have to take into account all competing risks. In this presentation we introduce a different approach. It is based on decomposing the sub-distribution in two multiplicative components: Sub-distribution hazard (t)= r(t) cause-specific hazard (t). Here, r(t) is the reduction factor that describes which fraction of the population is still at risk for the competing risk of interest. Somehow it summarizes all information from the multi-state that is relevant for this specific competing risk. Different models for the effects on covariates x on r(t) (r(t x)) will be presented and it will be shown how simple models for the cumulative incidence function can be obtained without having "dead" individuals in the risk set. Reference: Fine, J. P. & Gray, R. J. (999), `A proportional hazards model for the Sub-distribution of a competing risk, Journal of the American Statistical Association 94,
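The two standard competing-risks analyses contrasted in TCS070.2 (cause-specific Cox regression versus the Fine-Gray subdistribution hazard model) can be sketched in R roughly as follows. This is a minimal illustration on simulated data with hypothetical variable names, not the authors' analysis; it assumes the survival and cmprsk packages are installed.

```r
# Minimal competing-risks sketch (simulated data; not the authors' analysis).
library(survival)   # coxph(), Surv()
library(cmprsk)     # crr() for the Fine-Gray subdistribution hazard model

set.seed(1)
n      <- 500
x      <- rbinom(n, 1, 0.5)                     # a single binary covariate
t1     <- rexp(n, rate = 0.10 * exp(0.5 * x))   # latent time to event of interest (cause 1)
t2     <- rexp(n, rate = 0.05)                  # latent time to competing event (cause 2)
cens   <- runif(n, 0, 15)                       # censoring time
time   <- pmin(t1, t2, cens)
status <- ifelse(cens < pmin(t1, t2), 0, ifelse(t1 <= t2, 1, 2))

# Cause-specific hazard for cause 1: competing events are treated as censored.
cs1 <- coxph(Surv(time, status == 1) ~ x)

# Fine-Gray model for the cumulative incidence of cause 1: individuals who
# failed from cause 2 remain in the risk set (the point debated in the abstract).
fg1 <- crr(ftime = time, fstatus = status, cov1 = cbind(x),
           failcode = 1, cencode = 0)

summary(cs1)
summary(fg1)
```

Comparing the two coefficient estimates makes the interpretational gap concrete: the cause-specific hazard ratio and the subdistribution hazard ratio generally differ, which is what motivates the decomposition via the reduction factor r(t) proposed in the talk.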

TCS070.3 A discrete semi-Markov model for sojourn times in a clinical trial with need-based treatments
Glimm E., Yau L.2
1 Novartis Pharma AG, Basel, Switzerland, 2 Sandoz Pharmaceuticals, Oberhaching, Germany
In this talk, we use a semi-Markov model to describe the history of treatment episodes in patients who are treated for a chronic condition pro re nata (i.e. as needed). The patients receive medication only when needed, for example when symptoms occur. Examples of such treatments are allergy relief or pain medication. The patients' condition can only be assessed at regular visits, and the conditions may be "treatment needed", "treatment not needed" and possibly other states. The model is set up to answer a variety of questions, such as the total medication load in a time interval, but also the duration of episodes and the frequency of switches between the states based on a patient's history. Flowgraph models will be used to describe and analyse the data; an example will illustrate how this is done.

TCS070.4 Frailty modelling in long-term cardiovascular clinical studies
Di Scala L., Verweij P.
Actelion Pharmaceuticals Ltd, Allschwil, Switzerland
In cardiovascular studies the primary endpoint is often a time-to-event endpoint, and composite endpoints (which typically include a survival component) are often chosen to counter the low event rate of the other components. It may prove difficult to assess differences in mortality using standard survival analysis inference procedures in the presence of informative censoring, e.g. when data from patients without follow-up beyond the end of treatment are not independent of the underlying disease process [DeMets 2012; Campigotto 2014]. This leads to a challenge in the statistical and clinical interpretation of study data [Fleming 2009]. In such situations, a sudden change in the risk of failure may occur due to progression of disease, and standard survival methods such as Kaplan-Meier estimation may be subject to heavy bias. There is general awareness of the risks of informative censoring, but little guidance on how to quantify this phenomenon. Joint frailty models are employed here to investigate the complexity of evaluating survival in the presence of other components and of informative termination of treatment, given that the primary endpoint triggers the end of double-blind treatment and, for example, informatively censors death. Statistical simulations are used as a tool to assess the probability of numerical imbalances under scenarios that closely approximate the reality of recent cardiovascular clinical studies.

TCS071 Rank-Based Biomedical Computing and Applications
Tue :30-6:00, Lecture Hall KR 9
Chair(s): Michael G. Schimek

TCS071.1 Screening for associations limited to subpopulations: eliminating ecological correlations
Verducci J.
Ohio State University, Statistics, Columbus, United States
Some associations between and among variables may be restricted to an unidentified subpopulation represented in a large database. An example is the occurrence of strong correlations between gene expressions in cancer cells responding to chemotherapy. A set of strong correlations tends to be restricted to a subsample because different cancer types employ different defenses [Nagel et al. (2016)]. Ranking-based methods to detect such domain-limited correlations (Bamattre et al. 2017) may be used to screen for subpopulations with specialized gene networks (Sampath et al. 2016). These methods assume that variables in the subpopulation have the same marginal distributions as the whole population. However, when this assumption does not hold, the screening may be over-inclusive. This talk covers a way to refine the screening process.
References: Nagel et al. (2016) DNA repair capacity in multiple pathways predicts chemoresistance in glioblastoma multiforme. American Association for Cancer Research (AACR). Bamattre S., Hu R., and Verducci J.S. (2017) Nonparametric testing for heterogeneous correlation. Big and Complex Data Analysis. Springer, New York. Sampath S., Caloiaro A., Johnson W. and Verducci J.S. (2016) The top-K tau-path screen for monotone association in subpopulations. WIREs Comput. Stat. 8(5).

TCS071.2 A Bayesian Mallows rank model for joint analyses of cancer genomics data
Zucknick M.
University of Oslo, Oslo Centre for Biostatistics and Epidemiology, Oslo, Norway
There is increasing interest in the joint analysis of genomic data from different sources (e.g. studies or technology platforms) or even from different disease subtypes. Typical applications could be meta-analyses of previously published genomic studies with the aim of finding consensus gene lists, joint analyses of publicly available data sets from studies that had the same endpoint in order to increase statistical power, or pan-cancer studies where one analyses data from different cancer types jointly to identify gene signatures which are shared across different cancers. However, data from different studies or different platforms can be on different scales and possibly follow different distributions. Standard clustering methods therefore require very careful scaling and normalization of the data to allow such joint analysis. An alternative approach is to transform the data to ranks and work with rank-based, scale-free models. In this talk a recently developed Bayesian Mallows model for rank data will be presented, which is computationally tractable even for high-dimensional genomic applications. A mixture modelling approach is used for clustering. This produces posterior probabilities for cluster assignments of the samples and coherent posterior credibility levels of cluster-specific rankings for the genes, to quantify the uncertainty of both clusters and rank estimates. The approach is explored in cancer meta-analysis studies and in a pan-cancer application on gene expression data from twelve different cancers profiled by the TCGA Network. This talk will present joint work with Valeria Vitelli, Elja Arjas, Thomas Fleischer, Vessela N. Kristensen, Magne Thoresen, and Arnoldo Frigessi.
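Both talks above start from the same device: replacing raw measurements by ranks so that downstream analyses become scale-free and invariant to monotone transformations. A minimal base-R sketch of this rank transformation together with a Kendall's tau screen is shown below; the data and names are hypothetical and the sketch does not reproduce the authors' tau-path or Mallows machinery.

```r
# Rank transformation and a simple monotone-association screen
# (hypothetical data; not the authors' methods).
set.seed(2)
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:20)))

# Rank-transform each sample (column) so that scale and normalisation no longer matter.
ranked <- apply(expr, 2, rank)

# Screen a few gene pairs for monotone association via Kendall's tau.
tau <- cor(t(ranked[1:5, ]), method = "kendall")
round(tau, 2)
```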

TCS071.3 A new rank-based meta-analysis approach and its application to drug gene signatures
Svendova V., Schimek M.G.
Medical University Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria
Rank data have the advantage of being invariant to transformation and normalisation as long as the relative orderings are preserved, and therefore they are most suitable for certain tasks such as the meta-analysis of gene signatures. We propose an indirect inference method that allows us to reconstruct the unknown signals underlying observed ranked lists. The obtained signal estimates represent the consensus of several rankings in the sense of meta-analysis or data integration. A simple but flexible signal-plus-noise model of the unobserved multiple measurements causal for the observed rankings is assumed. For the evaluation of the model based on the empirical matrix of ranks we apply a distribution function approach in combination with an adaptive Metropolis algorithm for numerical optimisation. Moreover, under the empirical distribution function we can apply the non-parametric bootstrap to estimate the standard errors of the signal parameters. As a by-product, rank aggregation results can be obtained from the signal estimates. Changes in gene expression of lung cancer signatures (tumour vs. normal) were compared with gene expression responses to drug treatment in cultured human cells for a large number of drugs. Based on so-called connectivity scores, ranging between -1 and 1, ranked lists were formed. Drugs with large negative values are potential therapeutic candidates. The signal estimation approach could identify reliable candidates in agreement with previous bioinformatics research.

TCS073 Evidence Synthesis: Statistics meets Formal Epistemology
Thu :30-:00, Lecture Hall HS 3
Chair(s): Barbara Osimani. Panelist(s): Ralph Edwards, Jürgen Landes, Ulrich Mansmann, Roland Poellinger, Jan Sprenger
In a recent contribution to the debate over the so-called reproducibility crisis, Andrew Gelman suggested that "we should not get stuck in the mode in which a 'data set' is analysed in isolation, without consideration of other studies or relevant scientific knowledge". This session intends to bring together health scientists (Ulrich Mansmann, LMU; Ralph Edwards, WHO Uppsala Monitoring Centre) and epistemologists (Jürgen Landes, LMU; Barbara Osimani, LMU; Roland Pöllinger, LMU; and Jan Sprenger, Tilburg) in the attempt to develop a new framework where not only different pieces of evidence are synthesized, but also various dimensions of evidence, such as reliability, consistency/coherence and (in)dependence, are considered explicitly and jointly contribute to hypothesis confirmation. Whereas the standard approach to evidence assessment has relied on various indicators to evaluate dimensions of evidence such as quality, strength, and reliability, formal epistemology has a tradition of analyzing the interaction of these various dimensions of evidence in jointly contributing to hypothesis (dis)confirmation. The aim of this session is to explore possible avenues of fruitful collaboration between the two disciplines and develop a three-layer approach to modelling statistical inference: 1) a basic level of evidential support for the hypothesis at hand (and various evidence aggregation/amalgamation techniques); 2) higher-order epistemic dimensions related to the various study results and the entire body of evidence: consistency/coherence of items of evidence, (in)dependence structure, reliability and relevance of the individual items; 3) a further level comprising information/evidence related to these meta-epistemic dimensions themselves. Such information/evidence relates to incentives/deterrents for bias (such as financial interests, reputation, etc.), the social ontology of the research domain, and regulatory constraints. A special merit of this session is to inaugurate a unitary perspective on these dimensions, which have traditionally been investigated from distinct and separate points of view.

TCS074 Education for Statistics in Practice, Part 1
Wed :30-6:00, Lecture Hall HS 2
Chair(s): Christoph Muysers, Stephanie Roll
Understanding and tackling measurement error: a whistle stop tour of modern practical methods
This session, continued after the coffee break, discusses the issues raised by measurement error and practical approaches for analysis that mitigate its effects. Our aim is that participants gain the knowledge and confidence to understand the effects of measurement error and to apply techniques for measurement error correction in their own work. The emphasis will be on practical application and worked examples will be used throughout. Examples will be given using the R software. The course will begin with a discussion of the effects of measurement error in regression analyses. Focus will then move to techniques for mitigating those effects via statistical analysis and study design. Several methods will be introduced, including regression calibration, simulation extrapolation (SIMEX), likelihood-based methods, and Bayesian methods. The primary focus will be on measurement error in explanatory covariates, but error in response variables will also be discussed. Issues arising from different types of error will be covered. The session will draw on the work of the STRATOS Initiative's measurement error topic group, which is led by Professor Laurence Freedman and Dr Victor Kipnis.

TCS074.1 Education for statistics in practice - understanding and tackling measurement error: a whistle stop tour of modern practical methods
Shaw P., Keogh R.2
1 University of Pennsylvania Perelman School of Medicine, Department of Biostatistics, Epidemiology and Informatics, Philadelphia, United States, 2 London School of Hygiene & Tropical Medicine, Department of Medical Statistics, London, United Kingdom
Measurement error and misclassification of variables are frequently encountered in many fields of research and can impact strongly on the results of statistical analyses. However, investigators often do not pay serious attention to the biases that can result from mismeasurement. This session discusses the issues raised by measurement error and practical approaches for analysis which mitigate its effects. Our aim is that participants gain the knowledge and confidence to understand the effects of measurement error and to apply techniques for measurement error correction in their own work. The session will be arranged in sections of minutes. The emphasis will be on practical application and worked examples will be used throughout. Examples will be given using the freely available R software. Practical resources will be made available to course participants, which will facilitate application of the methods covered. The following topics will be covered.

Effects of measurement error: We will begin with an introduction to the effects of measurement error in statistical regression analyses. The primary focus will be on measurement error in explanatory covariates and on 'classical' measurement error. The later part of the course includes special sections on measurement error in outcome variables and more complex types of error.

Methods for mitigating the effects of measurement error: There is a large literature on methods for 'correcting' the effects of measurement error. This section will focus on the following methods:
- Regression calibration
- Simulation extrapolation (SIMEX)
- Likelihood-based methods
- Bayesian methods
The emphasis will be on their practical application. The advantages and limitations of the different methods will be discussed. Most methods require information on the form of the measurement error, via gold standard measures or repeated measures, or from an external source. The role of sensitivity analyses will also be considered.

Specialised types of error: A variety of measurement error structures can arise in different settings, with particular examples being in nutritional epidemiology and environmental epidemiology. We will give an overview of the impacts of different types of errors, including differential and Berkson error, and outline the extension of the earlier methods to these potentially more complex situations.

Special considerations for misclassification in categorical variables: While our main focus will be on error in continuous variables, we will also include a discussion of the effects of, and methods to address, misclassification in categorical variables, which is typically characterised in terms of sensitivities and specificities for dichotomous variables.

Outcome measurement error: We consider the effects of measurement error in an outcome variable, which are different from those in a covariate. The effects of measurement error in outcomes have tended to be under-studied in the past, but there is now a growing recognition of their potential impact.

Case study of a more advanced design: We will go through a detailed data example, in which we will estimate the structure of the measurement error in an exposure using regression calibration and carry out an adjusted analysis of this exposure with a time-to-event outcome. This analysis will also demonstrate methods for obtaining appropriate standard errors and confidence intervals for the association parameter of interest.

Implications for study design: We will discuss how measurement error in a covariate of interest affects power and present ways to adjust the study design to accommodate measurement error. We also briefly discuss study design issues for calibration/reliability studies.

About the presenters: Pam Shaw is an Associate Professor at the University of Pennsylvania Perelman School of Medicine (Department of Biostatistics, Epidemiology and Informatics). Dr. Shaw's research interests include measurement error, design of clinical trials, and chronic disease epidemiology. She has a particular interest in behavioral intervention studies and nutritional and physical activity epidemiology. Ruth Keogh is an Associate Professor at the London School of Hygiene and Tropical Medicine (Department of Medical Statistics). Aside from measurement error, Ruth's research interests include missing data, survival analysis and dynamic prediction, and case-control study design and analysis. She is especially interested in applications in cystic fibrosis and nutritional epidemiology. Drs Shaw and Keogh are representing the STRATOS initiative's measurement error topic group, which is led by Professor Laurence Freedman and Dr Victor Kipnis.

References: Carroll R, Ruppert D, Stefanski L, Crainiceanu C. Measurement Error in Nonlinear Models. Chapman & Hall/CRC Press, Boca Raton, FL. Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman & Hall/CRC Press, Boca Raton, FL. R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

TCS074 Education for Statistics in Practice, Part 2
Wed :30-8:00, Lecture Hall HS 2
Chair(s): Christoph Muysers, Stephanie Roll

TCS074.1 Education for statistics in practice - understanding and tackling measurement error: a whistle stop tour of modern practical methods
Keogh R., Shaw P.2
1 London School of Hygiene & Tropical Medicine, Department of Medical Statistics, London, United Kingdom, 2 University of Pennsylvania Perelman School of Medicine, Department of Biostatistics, Epidemiology and Informatics, Philadelphia, United States
Please see above for the abstract. Understanding and tackling measurement error: a whistle stop tour of modern practical methods (cont'd).
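To give a flavour of the simplest correction covered in this course (regression calibration under the classical error model), here is a minimal base-R sketch on simulated data with replicate error-prone measurements. The data and variable names are hypothetical, and the code is an illustration of the general idea rather than part of the course materials.

```r
# Classical measurement error and a simple regression-calibration correction
# (simulated data; a sketch of the idea, not the course materials).
set.seed(3)
n  <- 1000
x  <- rnorm(n)                       # true exposure (unobserved in practice)
y  <- 1 + 0.5 * x + rnorm(n)         # outcome with true slope 0.5
w1 <- x + rnorm(n, sd = 0.8)         # two error-prone replicate measurements
w2 <- x + rnorm(n, sd = 0.8)

wbar  <- (w1 + w2) / 2
naive <- coef(lm(y ~ wbar))[2]       # attenuated (biased towards zero) slope

# Estimate the measurement error variance from the replicates and the
# reliability (attenuation) factor lambda of the averaged measurement.
sigma2_u <- var(w1 - w2) / 2
lambda   <- (var(wbar) - sigma2_u / 2) / var(wbar)

corrected <- naive / lambda          # regression-calibration corrected slope
c(naive = unname(naive), corrected = unname(corrected), true = 0.5)
```

The naive slope is attenuated by the reliability factor lambda; dividing by an estimate of lambda approximately recovers the slope for the true exposure, which is the essence of regression calibration in this simple setting.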

Contributed Sessions

CS00 Topics in Agricultural Statistics: Estimation Methods and Experimental Design
Fri :45-4:5, Lecture Hall KR 7

CS00.1 An application of mixture models to estimate the size of Salmonella infected flock data through the EM algorithm using validation information
Azevedo C., Bohning D., Arnold M.2, Maruotti A.
1 University of Southampton, Southampton Statistical Sciences Research Institute, Southampton, United Kingdom, 2 Animal and Plant Health Agency, Department of Epidemiological Sciences, Weybridge, United Kingdom
Capture-recapture methods are used to estimate the total size N of a target population of interest when it is partially observed. Due to an incomplete identification/registration mechanism in real-life applications, we observe only the positive counts representing the number of repeated identifications, and in order to estimate N we need to predict the number of units in the unobserved part. Sometimes a validation sample is available in the study, providing valuable information on the unobserved units. The estimate of the total population size can be obtained by jointly fitting a zero-truncated distribution to the truncated data and an untruncated distribution of the same class to the untruncated data by means of the EM algorithm. We consider a flexible non-parametric mixture model approach allowing for heterogeneity of the data by means of a nested EM algorithm using validation information. A simulation study illustrates the major ideas of this application.

CS00.2 Generalized confidence intervals for random effects in linear random-effects models
Al-Sarraj R., von Brömssen C., Forkman J.2
1 Swedish University of Agricultural Sciences, Energy and Technology, Uppsala, Sweden, 2 Swedish University of Agricultural Sciences, Crop Production Ecology, Uppsala, Sweden
Linear random-effects models are linear models with a single intercept parameter and several unknown variance components. These models are used in many research areas, for example in the analysis of experiments in the agricultural sciences. Random effects are usually predicted using best linear unbiased prediction (BLUP). BLUPs are functions of the unknown variance components, usually estimated by the maximum likelihood (ML) method or the restricted maximum likelihood (REML) method. In some situations, for example in small agricultural field experiments, non-positive estimates of variance components can be obtained, making it difficult to assess the precision of the BLUPs. When frequentist methods fail to provide useful solutions, the methods of generalized inference, i.e. generalized p-values and generalized confidence intervals, can be of practical value. Generalized confidence intervals are constructed through repeated sampling from generalized pivotal quantities. In this study, generalized confidence intervals were derived for linear combinations of random effects. For both balanced and unbalanced data in two-way layouts, linear random-effects models were considered, with and without interaction. Coverage of the generalized confidence intervals was estimated through simulation, based on an agricultural field experiment, and compared with confidence intervals derived using the REML method. Coverage of the generalized confidence intervals was closer to the nominal value than coverage of the confidence intervals based on the REML procedure.
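The core capture-recapture idea in CS00.1 (estimating the unobserved zero class from a zero-truncated count distribution) can be illustrated with a deliberately simplified base-R sketch using a homogeneous zero-truncated Poisson fitted by maximum likelihood. The authors' approach additionally uses a non-parametric mixture, a nested EM algorithm and validation data, none of which is shown here; all names below are hypothetical.

```r
# Population size from zero-truncated counts (simplified: homogeneous Poisson,
# not the authors' nonparametric mixture / nested EM approach).
set.seed(4)
N      <- 1000
counts <- rpois(N, lambda = 0.8)
obs    <- counts[counts > 0]              # only positive counts are observed

# Maximum likelihood for the zero-truncated Poisson.
negloglik <- function(log_lambda) {
  lam <- exp(log_lambda)
  -sum(dpois(obs, lam, log = TRUE) - log(1 - exp(-lam)))
}
lam_hat <- exp(optimize(negloglik, c(-5, 5))$minimum)

# Horvitz-Thompson-type estimate of the total population size N.
N_hat <- length(obs) / (1 - exp(-lam_hat))
c(observed = length(obs), N_hat = round(N_hat), true_N = N)
```

Extending this homogeneous model to a mixture, as in the abstract, guards against underestimation of N when identification probabilities are heterogeneous.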
CS00.3 Cultivar evaluation trials - reducing replicates of intensities versus reducing locations. What is the price?
Hartung K., Möhring J.2
1 Landwirtschaftliches Technologiezentrum Augustenberg, Karlsruhe, Germany, 2 Universität Hohenheim, Stuttgart, Germany
In the German cultivar evaluation trials (Landessortenversuch), registered varieties of various arable crops are tested for their regional performance in several federal states (Bundesländer). For this purpose, varieties are usually grown at different locations with at least two replicates. Certain crops, such as wheat, are tested in two cultivation intensities: optimized and reduced. Depending on the state, optimized intensity varies between "good professional practice" ("gute fachliche Praxis") and "maximum protection". The reduced intensity corresponds to the optimized intensity, but fungicides and growth regulators are not used. Varieties are tested for their resistance against diseases in the reduced intensity. However, diseases do not occur at all locations and in every year. Furthermore, occurrence varies from location to location for each disease, and the occurrence of diseases at a given location cannot be predicted before sowing for a specific year. Due to cost savings, locations are to be closed. Alternatively, the number of locations could be maintained but the number of replicates at each location could be reduced. This means that either the number of replicates in the reduced intensity, or in the optimized intensity, or in both intensities is reduced to one replicate per location. In the extreme case of cost savings, there could be a reduction in both locations and replicates. This talk shows the influence of the different scenarios on the standard error of a difference (s.e.d.) between two varieties. For this purpose, variance components were estimated based on data of the last eight years and used to simulate the expected s.e.d. of the various scenarios.

CS00.4 More on computer-generated augmented designs
Piepho H.-P., Bußwinkel L.2, Vo-Thanh N.
1 University of Hohenheim, Biostatistics Unit, Institute of Crop Science, Stuttgart, Germany, 2 University of Hohenheim, Stuttgart, Germany
Augmented designs play an important role in plant breeding for early-generation field trials when new varieties are developed and sufficient material is often not available for planting more than one experimental plot. The key idea popularized by Walter Federer is to include check varieties that can be replicated in order to obtain a valid estimate of error and allow adjustments for blocks, whereas the test lines are tested without replication. The simplest augmented design with one blocking factor can be constructed using a randomized complete block design for a few check varieties and then augmenting each block of this design with unreplicated test lines. The basic idea is readily extended to incomplete blocks, and it is also applicable with augmented row-column designs. Recently, Piepho and Williams (2016) proposed a method to generate augmented designs with three blocking factors, i.e. rows, columns and blocks superimposed onto the row-column layout. However, the approach works only for a small number of checks. In order to overcome this limitation, in this paper we propose four different search strategies to generate augmented block designs with one, two, and three blocking factors. The first three strategies make use of the OPTEX procedure of SAS, while the fourth search strategy is based on a simulated annealing algorithm to search for augmented block designs. In the fourth strategy we adapt the weighted optimality criterion proposed by Morgan and Wang (2010) to search for augmented designs. Finally, we compare our designs with those in the literature and those generated using the CycDesigN and DiGGeR packages.

CS0 Meta-Analysis, Meta-Regression and Multistate Models
Tue :30-8:00, Lecture Hall KR 7

CS0.2 Resolve conflicting rankings of outcomes in network meta-analysis: Partial ordering of treatments
Rücker G., Schwarzer G.
Medical Center - University of Freiburg, Institute for Medical Biometry and Statistics, Freiburg, Germany
In systematic reviews typically different health-relevant outcomes are considered, which may be related to efficacy, safety, quality of life or costs. If, based on a (network) meta-analysis, different outcomes lead to different rankings of the treatments, this complicates decision making. Whereas considering only one outcome would result in a full order of the treatments, considering two or more outcomes typically allows only a partial order to be achieved. Partial orders have been introduced in other areas of science, such as environmental chemistry or econometrics. They can be visualized by so-called Hasse diagrams and by scatter plots. We first explain the methods using a small fictional example. Using the R package netmeta, we then show how to apply them to real data from two network meta-analyses in depression and nasopharyngeal carcinoma.

CS0.3 Rigorous statistical modelling of functional MRI data of the brain
Möbius T.W.D.
Christian-Albrechts-Universität zu Kiel, Institut für Medizinische Informatik und Statistik, Kiel, Germany
The current state-of-the-art approach to the statistical analysis of functional MR images involves a variety of pre-processing steps, which alter the signal-to-noise ratio of the original data. I will present a new and original approach for the statistical analysis of functional MR imaging data of brain scans. The method essentially fits a weighted least squares model to arbitrary points of a 3D random field. Without prior spatial smoothing, i.e. without altering the original 4D image, the method nevertheless results in a smooth fit of the underlying activation pattern. More importantly, the method yields a trustworthy estimate of the uncertainty of the estimated activation field for each subject in a study. The availability of this uncertainty field allows, for the first time, group studies and group-wise comparisons to be modelled using random-effects meta-regression models, acknowledging that individual subjects are random entities in group studies and that the variability of the estimated individual activation patterns varies across the brain and between subjects. Finally, the method will be applied to a neuroimaging example from psychiatry.
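The partial-ordering idea in CS0.2 above (with two or more outcomes, treatments are generally only partially ordered) can be illustrated with a short base-R sketch. The ranking scores below are hypothetical; the abstract's real analyses and Hasse diagrams are produced with the netmeta package.

```r
# Partial order of treatments under two outcomes (hypothetical scores;
# the authors use the netmeta package for the actual Hasse diagrams).
eff  <- c(A = 0.9, B = 0.7, C = 0.6, D = 0.3)   # ranking score, efficacy
safe <- c(A = 0.4, B = 0.8, C = 0.5, D = 0.9)   # ranking score, safety

trts <- names(eff)
dominates <- outer(seq_along(trts), seq_along(trts), Vectorize(function(i, j) {
  i != j && eff[i] >= eff[j] && safe[i] >= safe[j]
}))
dimnames(dominates) <- list(trts, trts)
dominates
# TRUE where the row treatment is at least as good as the column treatment on
# both outcomes; pairs that are FALSE in both directions are incomparable,
# which is why only a partial order is obtained.
```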
CS0.4 Multistate survival models with shared parameters and covariates applied to real-world oncology data
Balázs T., Rakonczai P., Lang Z.1,2, Bacskai M.
1 Healthware Consulting Ltd., Research and Data Analysis Department, Budapest, Hungary, 2 University of Veterinary Medicine Budapest, Department of Biomathematics and Informatics, Budapest, Hungary
Recently, the application of multistate models (MSM) has become increasingly popular in medical research. In the MSM framework, health states can be considered as nodes of a directed graph, where the edges represent the possible transitions between nodes. Transition times between states can be modelled by classical survival models, e.g. Cox proportional hazards models. In parallel transitions, the effects of shared parameters and covariates can be estimated by using a joint likelihood approach. The main advantage of this method is that parameters of different transition routes can be compared with statistical tests that utilise their joint covariance matrix. Calculation of confidence intervals and p-values is also possible. MSMs may encompass complex systems of states and transitions, and therefore the proper interpretation of parameters can be challenging. On the other hand, due to the transparent structure, simulations of state transitions are relatively easy and fast to carry out. We present an application of MSMs in oncology, where the database covered almost the entire Hungarian population. The starting state of the patients was the diagnosis of cancer, and follow-up lasted until progression, death or non-informative censoring. The state of progression was identified by the occurrence of metastatic cancer. Different types of metastases can presumably have a different impact on the survival time, so metastasis was split into several substates. The explanatory variables were gender, age, year of diagnosis and comorbidities (Charlson index). Simulation of patient pathways was based on transition probabilities calculated from the hazard functions of the competing events.

CS0.5 Minimising healthcare costs using Markov-chain-based control charts
Dobi B.1,2, Zempléni A.2
1 Healthware Consulting Ltd., Budapest, Hungary, 2 Eötvös Loránd University, Department of Probability Theory and Statistics, Budapest, Hungary
Control charts, which are part of the statistical process control framework, are traditionally used in industrial statistics. Economically optimal charts based on sample size, sampling frequency and control limits are well established. In the past years an increase in medical uses has been seen, though most papers in the area use control charts only for quality monitoring (see e.g. Duclos et al., 2009). In an earlier paper, Zempléni et al. (2004) introduced a Markov-chain-based method for the optimal design of Shewhart-type control charts, based on economic calculations which originate from Duncan (1974). A key development in that paper was the random shift size, compared to the conventional fixed-shift-size assumption seen in many chart designs. We present a new approach in which not only the size of the shift (which can be seen as the degradation of the patient's health) but also the effect of the treatment can be random. This means that we do not use the often-present assumption of perfect repair, which is usually not applicable to medical treatments. This method is suitable for setting up a cost-efficient treatment protocol by finding the optimal frequency of control visits (sampling frequency) and optimal criteria (control limits) for medical intervention. The average cost of the treatment protocol can be estimated by the stationary distribution of the Markov chain. We illustrate the approach with simulation, based on real-life medical data.

CS02 Genetic, Agricultural and Environmental Studies
Thu :30-8:00, Lecture Hall KR 9

CS02.1 B-spline basis functions for modelling marker effects in backcross experiments
Reichelt M., Mayer M., Teuscher F., Reinsch N.
Leibniz Institute for Farm Animal Biology (FBN), Institute of Genetics and Biometry, Dummerstorf, Germany
The identification of chromosome sections affecting quantitative traits (QTLs) is essential in the field of quantitative genetics. There are various Bayesian approaches to identify these QTLs. In experimental populations, such as those of backcross type, higher marker densities result in more parameters than observations (p >> n). One of the consequences is inflated estimates. B-splines offer the opportunity to model the genetic effects of any number of markers by a limited number of basis function effects. Methods with basis function effects were compared to single-marker methods in a simulation study. We simulated 8 different scenarios for a backcross population. 2 QTLs with different effects were either independent or linked in repulsion and coupling in each case. For each scenario, 200 experiments with 500 individuals were examined and results were averaged over the repeated experiments. Markers were equally spaced at 1 cm (p >> n) and 5 cm (p ≈ n) distances. B-spline basis functions improved the precision of estimated marker effects, genetic variances and genetic predictions compared to a method with single marker effects. Computation time was greatly decreased. In conclusion, B-splines offer a suitable way of adapting the number of parameters to the size of the genome, irrespective of marker density.

CS02.2 A factor analytic mixed model approach for the analysis of genotype by treatment by environment data
Borg L., Smith A., Cullis B.
University of Wollongong, Wollongong, Australia
The accurate evaluation of genotype performance for a range of traits, including disease resistance, is of great importance to the productivity and sustainability of major Australian commercial crops. Typically, the data generated from crop evaluation programmes arise from a series of field trials known as multi-environment trials (METs), which investigate genotype performance over a range of environments. In evaluation trials for disease resistance, it is not uncommon for some genotypes to be chemically treated against the afflicting disease. An important example in Australia is the assessment of genotypes for resistance to blackleg disease in canola crops, where it is common practice to treat canola seeds with a fungicide. Genotypes are grown in trials either as treated, as untreated or as both. There are a number of methods for the analysis of MET data. These methods, however, do not specifically address the analysis of data with an underlying three-way structure of genotype by treatment by environment (GxTxE). Here, we propose an extension of the factor analytic mixed model approach for MET data, using the canola blackleg data as the motivating example. Historically, in the analysis of blackleg data the factorial genotype-by-treatment structure of the data was not accounted for. Entries, which are the combinations of genotypes and fungicide treatments present in trials, were regarded as 'genotypes', and a two-way analysis of 'genotypes' by environments was conducted.
The analysis of our example showed that the accuracy of genotype predictions, and hence the information for growers, was substantially improved with the use of the three-way GxTxE approach compared with the historical approach.

CS02.3 Sparse phenotyping designs for early stage selection experiments in plant breeding programs
Cullis B., Smith A., Cocks N., Butler D.
University of Wollongong, Centre for Bioinformatics and Biometrics (CBB), Wollongong, Australia
The early stages of cereal and pulse breeding programs typically involve in excess of 500 test lines. The test lines are promoted through a series of trials based on their performance (yield) and other desirable traits such as heat/drought tolerance, disease resistance, etc. It is therefore important to ensure the design (and analysis) of these trials is efficient in order to appropriately and accurately guide the breeders through their selection decisions, until only a small number of elite lines remain. The design of early stage variety trials in Australia provided the motivation for developing a new design strategy. The preliminary stages of these programs have limited seed supply, which limits the number of trials and replicates of test lines that can be sown. Traditionally, completely balanced block designs or grid plot designs were sown at a small number of environments in order to select the highest performing lines for promotion to the later stages of the program. Given our understanding of variety (i.e. line) by environment interaction, this approach is not a sensible or optimal use of the limited resources available. A new method allowing a larger number of environments to be sampled in situations where seed supply is limited and the number of test lines is large will be discussed. This strategy, referred to as sparse phenotyping, is developed within the linear mixed model framework as a model-based design approach to generating optimal trial designs for early stage selection experiments.

CS02.4 Construction of genetic risk scores for gene-environment interaction studies: internal vs. external weights
Hüls A., Schwender H.2, Schikowski T., Ickstadt K.3, Krämer U.
1 IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany, 2 Heinrich Heine University, Düsseldorf, Germany, 3 TU Dortmund University, Faculty of Statistics, Dortmund, Germany
There is evidence that weighted genetic risk scores (GRS), defined as weighted sums over risk alleles of single nucleotide polymorphisms (SNPs), are powerful for detecting gene-environment (GxE) interactions. The gold standard is to use external weights from meta-analyses. In a recent study focusing on scenarios without any available external weights, we could show that GRS with internal weights from marginal genetic effects estimated with elastic net regression are a powerful and reliable alternative to single-SNP approaches or unweighted GRS. In this presentation, we present the results of a simulation study for the detection of GxE interactions in which we compared power and type I error of GRS approaches with external vs. internal weights in scenarios with six risk SNPs and an increasing number of noise SNPs (up to 100) and with varying minor allele frequencies. Our simulation study showed that the power to detect GxE interactions reached by applying weighted GRS with internal weights from the marginal genetic effects was only slightly lower than the power achieved by weighted GRS with external weights from a meta-analysis. Furthermore, applying weighted GRS with internal weights reached a higher power than applying weights from the interaction itself when using part of the data to estimate the weights and the remaining data to determine the GRS. In conclusion, when no appropriate external weights are available (e.g., due to ethnic differences or differences in the phenotype assessment), we recommend using internal weights from the marginal genetic effects to construct weighted GRS for GxE interaction studies.

CS02.5 The estimation of additive, dominance and epistatic effects underlying hoof disease and lameness in Fleckvieh and Braunvieh cows
Suchocki T., Egger-Danner C.2, Schwarzenbacher H.2, Szyda J.
1 Wroclaw University of Environmental and Life Sciences, Department of Animal Genetics, Wroclaw, Poland, 2 ZuchtData EDV Dienstleistungen GmbH, Vienna, Austria
The goal of the project was the statistical dissection of the genetic determination of lameness and hoof disease in cattle. The data available for the analysis include several traits associated with hoof disease and lameness, pedigree information and 76,934 SNP genotypes for 298 cows representing the Braunvieh and Fleckvieh breeds, ascertained within the frame of the Efficient Cow project. We estimated additive and dominance effects of the SNPs and modelled the pairwise epistasis for markers selected based on their significance or abnormal LD pattern. Mixed linear models with fixed additive and dominance effects of SNPs and a random additive polygenic animal effect of a cow were used for this purpose. Various model selection criteria were applied in order to define the final model, which includes SNPs showing significant additive, dominance and/or epistatic effects on lameness. Furthermore, those SNPs were functionally annotated to the UMD3.1 bovine reference genome. Gene Ontology (GO) terms and KEGG pathways underlying the annotated genes were analyzed in order to identify functional clusters significantly enriched within this set of genes.

CS03 Topics in Computational Statistics
Wed :30-3:00, Lecture Hall HS 5

CS03.1 Approximate confidence distribution computing: an effective likelihood-free method with statistical guarantees
Thornton S., Xie M.-G.2
1 Rutgers University, Piscataway, United States, 2 Rutgers University, Statistics and Biostatistics, Piscataway, United States
Approximate Bayesian computing (ABC) is a likelihood-free method that has grown increasingly popular since early applications in population genetics. However, the theoretical justification for inference based on this method has yet to be fully developed, especially pertaining to the use of non-sufficient summary statistics.
We introduce a more general computational technique, approximate confidence distribution computing (ACC), to overcome two defects of the ABC method, namely the lack of theory supporting the use of non-sufficient summary statistics and the lack of guidance for the selection of the prior. Specifically, we establish frequentist coverage properties for the outcome of the ACC method by using the theory of confidence distributions, and thus inference based on ACC is justified (even if reliant upon a non-sufficient summary statistic). Furthermore, the ACC method is very broadly applicable; in fact, the ABC algorithm can be viewed as a special case of an ACC method without damaging the integrity of ACC-based inference. We supplement the theory with simulation studies and an epidemiological application to illustrate the benefits of the ACC method. It is also demonstrated that a well-tended ACC algorithm can greatly increase its computing efficiency over a typical ABC algorithm.

CS03.2 Rapid detection of antibiotic resistance by real-time robust regression procedures
Borowski M., Görlich D., Idelevich E.A.2, Hoy M.2, Becker K.2
1 University of Muenster, Institute of Biostatistics and Clinical Research, Muenster, Germany, 2 University Hospital Muenster, Institute of Medical Microbiology, Muenster, Germany
The detection of antibiotic-resistant bacteria is essential for appropriate treatment and infection control, yet reliable and fast methods are still missing. We propose new approaches for fast and reliable discrimination between resistant and susceptible bacteria that are based on real-time estimation and comparison of slopes in time series (or data streams) of bacterial concentration. We utilize two recently developed procedures for real-time analysis of data streams, the SCARM (Borowski and Fried, Statistics and Computing 24(4), 2014) and the STM (Borowski, Busse, Fried, Statistics and Computing 25(5), 2015). The SCARM is used to estimate the current slope in a data stream by robust Repeated Median regression in a moving window sample. Since the size of the window sample is consecutively adapted to the current data situation, the SCARM delivers reliable slope estimates in most data situations. The STM is a SCARM-based procedure for real-time monitoring of the coherence of two data streams, i.e. the similarity of their current slopes. We applied our new SCARM- and STM-based approaches to concentration time series from several resistant and susceptible bacteria to assess their ability to discriminate between resistance and susceptibility. Sensitivity and specificity estimates obtained by leave-one-out cross-validation generally exceeded 90-95% after 3-4 hours. Bearing in mind that current methods of susceptibility testing require about 24 hours for reliable results, our approaches appear promising for accelerating susceptibility testing considerably.

CS03.3 A multi-criteria approach to find predictive and sparse models with stable feature selection for high-dimensional data
Bommert A., Rahnenführer J., Lang M.
TU Dortmund University, Statistics, Dortmund, Germany
Finding a good predictive model for a high-dimensional data set is often challenging. For genetics data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because in bioinformatics applications the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the models crucial. We suggest using three criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability of a model, we first compare a variety of stability measures. We find that for the stability assessment behaviour of these measures it is most important whether a measure contains a correction term for large numbers of chosen features. While the uncorrected measures show a very similar stability assessment behaviour, the results for the corrected measures differ noticeably. Then, we analyse Pareto fronts to find models that perform well considering all three target criteria. We conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy compared to models fitted considering only the classification accuracy.

CS03.4 Tunability and importance of hyperparameters of machine learning algorithms for supervised learning
Probst P., Boulesteix A.-L., Bischl B.
LMU Munich, Munich, Germany
Modern machine learning algorithms like gradient boosting, random forest and neural networks involve a number of hyperparameters. For applications, users can either select defaults from the implementing software, set them to manually specified values, or run a computational tuning strategy to choose them, approximately optimally, for the specific dataset at hand. In this context, we define tunability, loosely, as the amount of performance that can be gained by changing a hyperparameter value from its default to its optimal value. In this contribution, we define general measures that quantify the tunability of hyperparameters. Furthermore, we assess the tunability of various parameters of some of the most common machine learning algorithms for classification and regression, based on experimental results on more than 100 datasets from the OpenML platform. The results yield interesting insights into the investigated hyperparameters, which in some cases allow general conclusions about their tunability. This may help users decide whether it is worth conducting a possibly time-consuming tuning strategy, which hyperparameters to focus on, and how to choose adequate hyperparameter spaces for tuning.

CS03.5 Application of particle filter algorithms for monitoring of bioprocesses
Stelzer I.V.1,2, Kager J., Herwig C.1,2
1 TU Wien, Institute of Chemical, Environmental and Biological Engineering, Vienna, Austria, 2 TU Wien, Christian Doppler Laboratory for Mechanistic and Physiological Methods for Improved Bioprocesses, Vienna, Austria
A particle filter algorithm and an extended Kalman filter algorithm for state estimation are compared theoretically with respect to estimation quality and time complexity. Using a structured nonlinear model for biomass estimation in a P. chrysogenum fed-batch process, simulations are performed in quasi-real-time mode using experimental data. It can be shown that, depending on the number of particles, particle filters are more accurate although computationally more expensive. Thus, particle filters represent a powerful class of suboptimal Bayesian algorithms for real-time state estimation in bioprocesses.

CS05 Survival Analysis and Assessment of Survival Prediction Models
Tue :30-8:00, Lecture Hall KR 8

CS05.1 C-statistics: should leave-one-out crossvalidation be banned?
Geroldinger A., Lusa L.2, Nold M.3, Heinze G.
1 Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Vienna, Austria, 2 University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia, 3 University Hospital Jena, Institute of Medical Statistics, Computer Sciences and Documentation, Jena, Germany
The c-statistic is a widely used measure to quantify the discrimination ability of logistic and Cox regression models. For a binary outcome it is simply the proportion of all pairs of observations with opposite outcomes which are correctly ranked by the model. Clearly, calculating the c-statistic on the data on which the model was built will often give too optimistic results, especially in the situation of small samples or rare events. Data resampling techniques such as crossvalidation or the bootstrap are frequently used to correct for this over-optimism. Leave-one-out (LOO) crossvalidation has the advantage of being applicable even with small samples, where 10-fold crossvalidation might not be feasible. However, mostly in the machine learning community, the accuracy of LOO crossvalidated c-statistics is under debate since it was shown that they can be severely biased towards 0. We discuss these results and demonstrate by simulations that the bias in LOO crossvalidated c-statistics depends strongly on the estimation method. For instance, the negative bias in LOO crossvalidated c-statistics was much stronger for ridge regression than for maximum likelihood estimation. Our simulations indicate that leave-pair-out crossvalidation, a method proposed as an alternative to LOO crossvalidation, might be a better choice. Finally, we compare these methods using data from a study on arterial closure devices in minimally invasive cardiac surgery. This work was supported by the Austrian Science Fund (FWF) within project I.

CS05.2 Concordance indices for composite survival outcomes
Cheung L., Pan Q.2, Katki H.
1 National Institutes of Health, NCI, Bethesda, United States, 2 George Washington University, Statistics, Washington, United States
Harrell's c index is widely used to measure the accuracy in predicting univariate survival outcomes. However, survival outcomes relating to a disease of interest may show up in multiple endpoints of interest. We propose two extensions of Harrell's c index for composite survival outcomes that account for the frequencies of occurrence and the severity/importance of the outcomes. A weighted C index is proposed for a disease process with multiple equally important endpoints, and a most-severe comparable C index is proposed for a disease process with a rare primary outcome and a correlated secondary outcome. Asymptotic properties are derived based on theorems for U-statistics. In simulation studies, our extensions gain efficiency and power in identifying true prognostic variables. We illustrate these novel concordance indices using the Epidemiology of Diabetes Interventions and Complications (EDIC) and the Diabetes Prevention Program (DPP) trials. In EDIC, the prognosis of diabetic patients at risk for multiple equally important microvascular complications is evaluated using the weighted C index. In DPP, patients with impaired glucose resistance (IGR) may either progress to type II diabetes or regress to normal glucose resistance (NGR); the proposed most-severe comparable index better evaluates the accuracy in predicting diabetes risk with the help of the auxiliary NGR outcomes.

CS05.3 Empirical properties of the Schemper-Henderson measure of explained variation in shared frailty models
Gleiss A., Schemper M.
Medical University Vienna, Section for Clinical Biometrics, Vienna, Austria
The Schemper-Henderson measure of explained variation (EV) is an established measure of the importance of fixed prognostic factors in studies of survival. It has recently been extended to include a shared frailty representing, e.g., a random center effect. This extension permits quantification of the relative importance of the center effect in comparison to the fixed effects such as treatment. The known properties of the original Schemper-Henderson measure are also valid for the extended measure, since the underlying principle (absolute differences between the subjects' survival processes and an estimated survival function) remains the same. However, with the presence of random effects in Cox regression, additional questions arise which have now been explored in an extensive simulation study:
- Is the choice of frailty distribution crucial?
- How does the variance parameter of the frailty distribution affect EV values?
- Does the size or the number of centers affect EV values?
- Do EV values for center effects differ if centers are modelled by fixed effects rather than by a random effect?
While the measure is quite insensitive to the choice of the frailty distribution, its dependence on the frailty variance parameter turns out to be approximately linear. The number of centers hardly affects the proportion of variation explained by the random center effect, whereas center size does. Finally, explained variation is very similar for fixed center effects and for a random center effect if centers contain at least 20 patients each.

CS05.4 Non-standard nested case-control designs for matching w.r.t. a time-dependent exposure
Feifel J., Beyersmann J.
Ulm University, Institute of Statistics, Ulm, Germany
In a survival analysis, only the uncensored patients contribute information on the actual event times. If the outcome is rare or if interest lies in evaluating expensive covariates, nested case-control designs are attractive, because only a small subset of the patients at risk just prior to the observed event times need to be evaluated. A related situation occurs when interest lies in the impact of a time-dependent exposure such as the occurrence of an adverse event or disease progression. Here, investigators often wish to match for time until exposure. The aim of this talk is to investigate how a nested case-control design can be used for this purpose. In practice, nested case-control sampling typically selects controls at random, but the martingale machinery behind nested case-control designs does allow for choosing controls dependent on the past. We will discuss several options for accounting for past time-dependent exposure status within a nested case-control framework, and their relative merits. The methods will be illustrated by observational data on the impact of hospital-acquired infection on hospital mortality.

CS05.5 Some simple designs for censored survival trials
Kimber A., Konstantinou M.2, Biedermann S.3
1University of Southampton, Mathematical Sciences, Southampton, United Kingdom, 2Ruhr-Universitaet Bochum, Bochum, Germany, 3University of Southampton, Southampton, United Kingdom
In a two-armed randomised trial where the aim is to estimate the treatment effect, it is clearly best to have an equal number of subjects in each arm, isn't it? Not necessarily if the response is the time to an event and may be censored. In this talk we see that locally optimal designs are easy to find in the case of possibly censored exponential event times, and that these designs have interesting robustness properties, both in terms of misspecification of parameter values and of misspecification of the event-time distribution. These designs also turn out to be nearly optimal in some situations if we prefer to use a semi-parametric Cox proportional hazards model rather than a fully parametric one.

CS06 Topics in Survival Analysis Fri :30-2:00 Lecture Hall KR 7

CS06.1 A comparison of two approaches to stabilize cumulative incidence estimation of pregnancy outcome with delayed entries
Rousson V.1, Aurousseau A.2, Winterfeld U.3, Beyersmann J.4, Allignol A.4
1University Hospital Lausanne, Division of Biostatistics, Lausanne, Switzerland, 2University of Bordeaux, Bordeaux, France, 3University Hospital Lausanne, Division of Clinical Pharmacology, Lausanne, Switzerland, 4University of Ulm, Institute of Statistics, Ulm, Germany
A pregnancy may end up with (at least) three possible events: normal birth, spontaneous abortion or elective termination, yielding a competing risks issue when studying an association between a risk factor and a pregnancy outcome. Cumulative incidences, i.e. probabilities to end up with the different outcomes depending on gestational age, can be estimated via the nonparametric Aalen-Johansen estimate. Another issue in such observational studies is that it is rare that a woman is followed up from the very first day of her pregnancy, delayed entries inducing left truncation. As in traditional survival analysis, the issue can be solved by considering "at risk" at a given age only those women who entered the study before that age. However, the number of women at risk at an early age might be extremely low. If only one or two women experience an event at such an early age, the cumulative incidence will increase exaggeratedly at that age because of the small denominator, resulting in unstable estimates. One solution to reduce the problem has been recently proposed, which is simply to ignore those early events, creating a small bias but enhancing stability of estimates (see Friedrich et al., "Nonparametric estimation of pregnancy outcome probabilities", to appear in the Annals of Applied Statistics). In this presentation, we propose an alternative simple approach which consists of postponing to later ages (rather than ignoring) those early events. The two approaches will be compared with respect to bias, stability and sensitivity to the smoothing parameter (the minimal number of women at risk to consider without applying an ignore/postpone action) via simulations reproducing various realistic pregnancy scenarios, and illustrated with data from a study on the possible effects of statins on pregnancy outcomes.
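A minimal sketch of the unadjusted Aalen-Johansen estimate with delayed entry, using the survival package and invented toy data (the ignore/postpone adjustments compared in the talk are not implemented here, and a reasonably recent survival version is assumed):

```r
## Toy sketch: Aalen-Johansen cumulative incidence with delayed entry,
## gestational age in weeks as the time scale (hypothetical data).
library(survival)

preg <- data.frame(
  id    = 1:6,
  entry = c(5, 8, 10, 6, 12, 9),      # gestational age at study entry
  exit  = c(39, 12, 38, 22, 40, 35),  # gestational age at end of follow-up
  event = factor(c("birth", "spontaneous abortion", "birth",
                   "elective termination", "birth", "censored"),
                 levels = c("censored", "birth",
                            "spontaneous abortion", "elective termination"))
)

# counting-process notation Surv(entry, exit, event) handles the left
# truncation induced by delayed entry; a factor status with "censored"
# as first level yields the Aalen-Johansen estimate of the three
# cumulative incidence functions
aj <- survfit(Surv(entry, exit, event) ~ 1, data = preg, id = id)
summary(aj)   # probability of being in each state by gestational age
```

With very few women at risk at early ages, the early jumps of these curves are exactly the instability that the two stabilisation approaches address.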

CS06.2 Evaluating quality of care using time-to-event endpoints based on patient follow-up data
Hengelbrock J.1, Höhle M.1,2
1Institute for Quality Assurance and Transparency in Healthcare (IQTIG), Medical Biometry and Statistics, Berlin, Germany, 2Stockholm University, Department of Mathematics, Stockholm, Sweden
The recently founded Federal Institute for Quality Assurance and Transparency in Healthcare (IQTIG) is the central institution for statutory quality assurance in health care in Germany. In accordance with its statutes, it is scientifically independent and works on behalf of the Federal Joint Committee as well as the Federal Ministry of Health, providing its expertise in various tasks of quality assurance of medical care. One of the biostatistical tasks has been the development of a statistical framework for the analysis of recently introduced quality indicators based on longitudinal patient data. For certain surgeries such as the implantation of cardiac pacemakers as well as hip and knee endoprostheses, subsequent operations of patients can be linked to their implantation surgery and, by doing so, hospitals with short periods between initial and subsequent operations can be identified. Within the context of survival analysis, we describe how such data can be analyzed with the (un-adjusted) Kaplan-Meier based survival rate as well as the (risk-adjusted) standardized mortality ratio (SMR). The construction of appropriate confidence intervals is discussed and we show how these indicators based on longitudinal patient data can be embedded into a sequential monitoring scheme applicable to the German hospital profiling performed at the IQTIG. In addition, we provide results from a simulation study comparing the proposed methodology against other alternatives.

CS06.3 Flexible modelling of personalised dynamic prediction curves using landmarking, with a case-study in cystic fibrosis
Keogh R.1, Barrett J.2, Seaman S.2, Szczesniak R.3, Wood A.4
1London School of Hygiene and Tropical Medicine, Department of Medical Statistics, London, United Kingdom, 2University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom, 3Cincinnati Children's Hospital Medical Center, Division of Biostatistics and Epidemiology, Cincinnati, United States, 4University of Cambridge, Department of Public Health and Primary Care, Cambridge, United Kingdom
In 'dynamic' prediction of survival we make updated predictions of individuals' survival as new longitudinal measures of health status become available. Landmarking is an attractive and flexible method for dynamic prediction. Applications of landmarking typically focus on estimating the probability of survival to time t+w given survival to 'landmark' time t, using longitudinal data up to t. To obtain such predictions, Cox models are fitted from time t, with censoring at t+w. However, interest often lies in the entire survival curve given survival to time t, rather than survival to a single timepoint t+w. We show how 'dynamic prediction curves' can be used in landmarking. Cox models and Royston-Parmar flexible parametric survival models will be discussed, including the use of time-varying effects. Estimation uses a 2-stage procedure. First, mixed models are fitted to the longitudinal data up to each landmark t. Second, predicted values from the mixed model are used as predictors in a flexible survival model conditional on survival to t.
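A minimal sketch of this two-stage procedure, using simulated toy data and the lme4 and survival packages (the flexible parametric models, time-varying effects and dynamic prediction curves of the talk are not shown):

```r
## Two-stage landmarking sketch with simulated toy data and hypothetical
## variable names; not the authors' implementation.
library(lme4)
library(survival)

set.seed(2)
n   <- 100
b0  <- rnorm(n, 0, 8)                 # subject-specific level of the marker
ages <- c(6, 10, 14, 18)
long_dat <- data.frame(
  id      = rep(1:n, each = 4),
  obs_age = rep(ages, times = n),
  fev1    = 90 + rep(b0, each = 4) - 0.8 * rep(ages, times = n) +
            rnorm(4 * n, sd = 4)
)
true_age <- 18 + rexp(n, rate = 1 / pmax(5, 25 + 0.5 * b0))
base_dat <- data.frame(id = 1:n,
                       surv_age = pmin(true_age, 45),
                       died     = as.integer(true_age <= 45))

landmark <- 18   # landmark age t

# Stage 1: mixed model for the marker, data up to the landmark,
# restricted to subjects still at risk at the landmark
risk_id <- base_dat$id[base_dat$surv_age > landmark]
long_t  <- subset(long_dat, obs_age <= landmark & id %in% risk_id)
lmm     <- lmer(fev1 ~ obs_age + (1 | id), data = long_t)

# predicted marker value at the landmark for each at-risk subject
pred_t <- data.frame(id = risk_id, obs_age = landmark)
pred_t$fev1_hat <- predict(lmm, newdata = pred_t)

# Stage 2: survival model from the landmark onwards, conditional on
# survival to t, with the stage-1 prediction as covariate
base_t <- merge(base_dat[base_dat$id %in% risk_id, ], pred_t, by = "id")
cox_t  <- coxph(Surv(surv_age - landmark, died) ~ fev1_hat, data = base_t)
summary(cox_t)
```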
We also show how the models for dynamic prediction curves can be combined across landmark times and how to obtain confidence intervals for the curves. The methods will be discussed with reference to a case-study in cystic fibrosis (CF). The aim was to obtain personalised dynamic prediction curves for survival in CF from a series of landmark ages using longitudinal data from the UK CF Registry. We also discuss how some of the obstacles arising in this study can be addressed, including the occurrence of intermediate events and missing data.

CS06.4 Excess length of stay due to ventilation-acquired pneumonia in intensive care unit patients: a multistate approach accounting for time-dependent mechanical ventilation
Bluhmki T.1, Timsit J.-F.2, Beyersmann J.1, COMBACTE-MAGNET Consortium
1Ulm University, Institute of Statistics, Ulm, Germany, 2Saint Joseph Hospital, Department of General Intensive Care Medicine, Paris, France
Ventilation-acquired pneumonia (VAP) in patients in the intensive care unit (ICU) is associated with increased mortality and morbidity, and often leads to a prolonged length of stay (LoS). Excess LoS is a key quantity in cost-benefit analyses for infection control, but its estimation must account for the time-dependency of VAP exposure by employing multistate methodology. The aim of this talk is to decompose this excess LoS into the proportions of extra days spent under and not under mechanical ventilation. On the one hand, this is of economic interest, because mechanical ventilation is a major cost driver in the ICU; on the other hand, it is a patient-relevant quality of life indicator. For that purpose, we suggest a multistate model accounting for the time-dependent nature of both VAP and mechanical ventilation during ICU stay and, using landmarking, derive an estimator for the desired quantities. The conceptual challenge is that estimation involves complex functionals of the matrix of transition probabilities, which in turn are based on the transition hazards. Results are compared to the more common notion of excess LoS not accounting for ventilation status. The methods are applied to the OUTCOMEREA database, which contains data of a prospective, observational, multicenter study including 2 French ICUs. First results indicate that excess LoS associated with VAP is mainly triggered by extra days spent under mechanical ventilation.

CS06.5 A new method for joint modelling of survival time and cumulated cost
Lang Z.1,2, Rakonczai P.2
1University of Veterinary Medicine Budapest, Department of Biomathematics and Informatics, Budapest, Hungary, 2Healthware Consulting Ltd., Budapest, Hungary
A new time-cost joint survival model with random trend is introduced that incorporates time and cost scaled hazards of the same events. In the proposed method a Cox proportional hazards (CPH) model is fitted to the survival time to a specific event simultaneously with another CPH model of the same event on a transformed scale representing the trend of cumulated cost instead of the time. In fact, cumulated cost processes of patients are considered as the sum of non-negative, strictly monotone increasing, continuous, random functions of time (trend components) and independent residual Wiener processes (residual components). In the two CPH models a shared, patient-specific frailty parameter is included representing the intrinsic stochastic dependency between the cumulated time and cost. Optionally, time-dependent covariates are included. Non-informative time scaled censoring of the observed data is allowed.
Our approach is based on the classical parametric distribution families. The estimation of the parameters is not straightforward; our experimental method applies the EM algorithm. The advantage of this joint time-cost survival model is that time and cost scaled hazard and survival functions are linked, and therefore they can be defined properly and analysed simultaneously. This approach provides more realistic approximations of patient care costs than traditional time-dependent cost models. We also discuss the stability of model fitting and the accuracy of the estimated parameters based on simulation experiments mirroring real-world cancer mortality data.
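One way to write down the structure described in this abstract, in our own notation and not necessarily the authors', is the following sketch. With a shared patient-specific frailty $u_i$ and covariate vector $Z_i$,

$$
\lambda_i^{\mathrm{time}}(t \mid Z_i, u_i) = u_i\,\lambda_0(t)\,e^{\beta^\top Z_i},
\qquad
\lambda_i^{\mathrm{cost}}(c \mid Z_i, u_i) = u_i\,\gamma_0(c)\,e^{\alpha^\top Z_i},
\qquad
C_i(t) = g_i(t) + W_i(t),
$$

where $C_i(t)$ is the cumulated cost process, $g_i$ a non-negative, strictly increasing random trend component and $W_i$ an independent residual Wiener process; the same frailty $u_i$ enters the time-scaled and the cost-scaled hazards, inducing the dependence between cumulated time and cost described above.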

CS07 Analysis of Complex Time to Event Data Thu :30-:00 Lecture Hall KR 8

CS07.1 The win ratio and its generalized analytic solution for analyzing a composite endpoint considering the clinical importance order among components
Dong G.1, Li D.2, Ballerstedt S.3, Vandemeulebroecke M.3, Wang D.4
1istats Inc., Long Island City, United States, 2Eisai Inc., Woodcliff Lake, United States, 3Novartis Pharma AG, Basel, Switzerland, 4School of Tropical Medicine, Liverpool, United Kingdom
A composite endpoint consists of multiple components combined in one outcome. It is frequently used as the primary endpoint, e.g., in cardiovascular, oncology and transplant trials. There are two main disadvantages associated with the use of a composite endpoint: a) conventional approaches treat its components equally; and b) in time-to-event analyses, the first event that occurs may not be the most important component. Pocock et al. (2012) introduced the win ratio to address these disadvantages. The win ratio method takes into account the order of importance of the different components: it compares each subject in the Treatment group with every subject in the Control group to determine who is the "winner" or the "loser" based on the prioritized components, and then it takes the ratio of the number of winners in the Treatment group to that in the Control group. During the past few years, the win ratio has been applied in the design and analysis of some clinical trials, and there are also some new methodological developments such as Luo et al. (2015), Bebu and Lachin (2016), Wang and Pocock (2016) and Dong et al. (2016). This presentation will briefly review the win ratio and its methodological developments and applications in some clinical trials, and then focus on the most recent method by Dong et al. (2016). This method provides a generalized analytic solution that is valid for any way of defining winners, losers and ties.
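An illustrative sketch of the basic, unmatched win ratio computation for a two-component prioritized composite; the data are simulated, censoring is ignored for simplicity, and this is a plain pairwise count rather than the generalized analytic solution discussed in the abstract:

```r
## Toy win ratio for a prioritized composite (death time first, then time
## to hospitalisation); hypothetical data, censoring ignored.
win_ratio <- function(trt, ctl) {
  wins <- losses <- 0
  for (i in seq_len(nrow(trt))) {
    for (j in seq_len(nrow(ctl))) {
      if (trt$death[i] != ctl$death[j]) {
        # priority 1: longer survival wins
        if (trt$death[i] > ctl$death[j]) wins <- wins + 1 else losses <- losses + 1
      } else if (trt$hosp[i] != ctl$hosp[j]) {
        # priority 2: later hospitalisation wins, only if deaths are tied
        if (trt$hosp[i] > ctl$hosp[j]) wins <- wins + 1 else losses <- losses + 1
      }
      # otherwise the pair is a tie and contributes to neither count
    }
  }
  wins / losses
}

set.seed(3)
trt <- data.frame(death = rexp(50, 0.08), hosp = rexp(50, 0.15))
ctl <- data.frame(death = rexp(50, 0.10), hosp = rexp(50, 0.20))
win_ratio(trt, ctl)
```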
CS07.2 A systematic comparison of recurrent event models for the application to composite endpoints
Ozga A.-K.1,2, Rauch G.2, Kieser M.1
1Institut für Medizinische Biometrie und Informatik, Uniklinikum Heidelberg, Heidelberg, Germany, 2Institut für Medizinische Biometrie und Epidemiologie, Uniklinikum Hamburg-Eppendorf, Hamburg, Germany
Many clinical trials focus on the comparison of the treatment effect between two or more groups concerning a rarely occurring event. In this situation, showing a relevant effect with an acceptable power requires the observation of a large number of patients. For feasibility reasons, it is therefore often considered to include several event types of interest, non-fatal or fatal, and to combine them within a composite endpoint. Commonly, a composite endpoint is analyzed with standard survival analysis techniques by assessing the time to the first occurring event. This approach neglects that an individual can experience more than one event, which leads to a loss of information. As an alternative, composite endpoints could be analyzed by models for recurrent events. There exists a number of such models, e.g. regression models based on count data or Cox-based models such as the approaches of Andersen and Gill, Prentice, Williams and Peterson, or Wei, Lin and Weissfeld. Although some of the methods were already systematically compared within the literature, there exists no systematic investigation for the special requirements regarding composite endpoints. Therefore, within this work a simulation-based comparison of recurrent event models applied to composite endpoints is provided for different clinically relevant scenarios. We demonstrate that all models deliver different effect estimators which can considerably deviate under commonly met data scenarios. This presentation discusses the pros and cons of the investigated methods in the context of composite endpoints and provides recommendations for an adequate statistical analysis strategy and a meaningful interpretation of results.

CS07.3 Predict long-term binary outcome using multistate models with interim data in patients with chronic diseases
Hsu Schmitz S.-F.
Novartis, Basel, Switzerland
Patients with chronic diseases may go through different disease stages before reaching an end stage. Here we consider a clinical trial where patients under continuous treatment may transition back and forth between the no-response stage and the response stage. Some patients may finally discontinue treatment due to treatment failure or other reasons, e.g. intolerance. Suppose the main interest is a binary response/no-response variable at a given long-term time point. Patients who discontinued treatment, irrespective of the reason, prior to the long-term time point will be considered as non-responders. To provide hints at an earlier time, an interim analysis at a short-term time point may be implemented to predict the long-term binary outcome. Survival data analysis with multistate models could be a practical tool in this setting, with the advantages of incorporating potential competing risks, e.g. intolerance, as well as making use of all data available at the interim cutoff, instead of ignoring data collected after the short-term time point as in the binary approach. The applicability of multistate models was first explored in historical data of a real trial. The model prediction was consistent with the observed historical data. The multistate models were then applied to simulated data under different scenarios of a hypothetical trial. The simulation results show good operating characteristics of the prediction for the long-term binary outcome.

CS07.4 Using a surrogate endpoint without sacrificing the outcome of interest
Rochon J.1, Lange T.2,3
1Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim, Germany, 2University of Copenhagen, Copenhagen, Denmark, 3Peking University, Beijing, China
In clinical trials aiming at improving survival, the time from start of treatment until death can be very long. This challenge can be faced by considering surrogate endpoints that are observed more quickly. Progression-free survival (PFS) is an often-used surrogate endpoint for overall survival (OS) in oncology trials. However, when changing the primary outcome from OS to PFS, the causal parameter of interest also changes. In fact, the treatment hazard ratios for PFS and OS usually differ. We propose an illness-death model that keeps the focus on OS and, at the same time, uses the progression information of PFS in an efficient way. Most importantly, the causal parameter remains unchanged, such that the reported hazard ratio for OS truly reflects the treatment effect on survival. We describe the implications of confounding on the path between progression and death, and discuss practical applications of the new approach.
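For readers unfamiliar with the multistate machinery, the sketch below sets up a generic illness-death model for progression and death with the mstate package, using invented data; it illustrates the framework only, not the specific estimator proposed in the talk:

```r
## Generic illness-death setup (randomised -> progression -> death) with
## the mstate package; toy data and variable names are hypothetical.
library(mstate)

# states: 1 = randomised, 2 = progression, 3 = death
tmat <- transMat(x = list(c(2, 3), c(3), c()),
                 names = c("randomised", "progression", "death"))

dat <- data.frame(
  id         = 1:6,
  prog_time  = c(5, 9, NA, 3, 7, NA),
  prog_stat  = c(1, 1, 0, 1, 1, 0),
  death_time = c(8, 20, 12, 6, 15, 10),
  death_stat = c(1, 0, 1, 1, 1, 0),
  arm        = c(0, 1, 0, 1, 0, 1)
)
# non-progressors are censored for progression at their last follow-up
dat$prog_time[is.na(dat$prog_time)] <- dat$death_time[is.na(dat$prog_time)]

# expand to long format with one row per possible transition
msdat <- msprep(time   = c(NA, "prog_time", "death_time"),
                status = c(NA, "prog_stat", "death_stat"),
                data = dat, trans = tmat, keep = "arm")

# transition-specific Cox models with separate baseline hazards
fit <- coxph(Surv(Tstart, Tstop, status) ~ arm + strata(trans), data = msdat)
summary(fit)
```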

CS08 Selected Topics Fri :30-2:00 Lecture Hall KR 8

CS08.1 A comparison of dilution experiments to estimate the detection proportion of qualitative microbiological methods
Manju M.A.1, van den Heuvel E.2, IJzerman-Boon P.1
1Merck Sharp & Dohme (MSD), Center for Mathematical Sciences, Oss, Netherlands, 2Eindhoven University of Technology, Department of Mathematics and Computer Science, Eindhoven, Netherlands
The performance of a qualitative microbiological test method can be quantified by its detection proportion, i.e. the probability to detect a single organism in a test sample. The detection proportion quantifies the sensitivity of the method and is strongly related to the limit of detection. To quantify the performance of the test method, the detection proportion must be estimated based on an experiment for several micro-organisms. Ideally, one would create several test samples with exactly one organism per test sample, and the proportion of positively tested samples from such an ideal experiment would be an estimate of the detection proportion. In practice, however, spiking test samples with an exact number of organisms is impossible. Therefore, this presentation investigates which dilution strategy, when either one or multiple dilutions spiked with micro-organisms are split into test samples, is optimal for estimation of the detection proportion. The first optimal design assumes that the distribution of the spikes is generalized Poisson and the second optimal design assumes that only the mean spike in the dilution is known, which is practically more relevant. The optimal design minimizes the mean squared error of our proposed moment estimator for the detection proportion, choosing the number of dilutions and the mean spike level for the dilutions. The results are determined by simulations, and show that the optimal strategy differs from the most common practice.

CS08.2 Comparison of dissolution profiles: a statistician's perspective
Hoffelder T.
Boehringer Ingelheim, Ingelheim am Rhein, Germany
Dissolution profile comparisons are used in the context of post-approval changes where the manufacturer has to demonstrate that the quality of the product is not affected by the change. A dissolution profile comparison yields a multivariate equivalence testing problem. Around this topic, basic statistical principles are in conflict with widely-used interpretations of current guidelines, resulting in time-intensive discussions in pharmaceutical practice. From a statistician's perspective there is potential for international harmonization, and the following suggestions could improve the situation:
1. A clear definition of the variability criterion for the similarity factor, such as can be found in the EMA guideline, is helpful.
2. Sample size recommendations should be interpreted as minimum, not as maximum requirements, especially for highly variable data.
3. In case of several batches per reference or test group, pooled comparisons should be performed instead of multiple batch-to-batch comparisons.
4. FDA guideline recommendations concerning multivariate equivalence procedures for highly variable data are based on the state of statistical knowledge in 1997 and need to be updated.
5. For highly variable profiles the EMA guideline recommends that the used statistical method should be "statistically valid and satisfactorily justified".
Several statistical methods, the T²-test for equivalence and f2-based methods, are compared and discussed. The T²-test for equivalence is superior to f2-based methods regarding statistical and practical properties. Software implementations of the T²-test for equivalence are available in R and SAS.
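For reference, the similarity factor f2 referred to in points 1 and 5 is commonly defined in the regulatory guidance documents as

$$
f_2 = 50 \cdot \log_{10}\!\left\{\left[1 + \frac{1}{n}\sum_{t=1}^{n}\left(R_t - T_t\right)^2\right]^{-1/2} \times 100\right\},
$$

where $R_t$ and $T_t$ are the mean percentages dissolved of the reference and test product at time point $t$ and $n$ is the number of time points; values of $f_2 \geq 50$ are conventionally taken to indicate similarity. The high-variability issues discussed above concern exactly this criterion.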
CS08.3 Estimating the force of infection and incidence using routine data affected by outcome-dependent sampling
Herzog S.1, Abrams S.2, Hens N.2
1Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria, 2Hasselt University, Center for Statistics, Hasselt, Belgium
In routine data, we are often faced with outcome-dependent sampling (ODS), i.e. the probability to have a further observation from the same person depends on the outcome of the current one. For example, Austrian prenatal care includes screening for the early detection and treatment of toxoplasmosis infection. According to the guidelines, a pregnant woman should be tested up to three times during her pregnancy if the first test fails to show antibodies. On the other hand, no further tests should be conducted if the first test was positive. The ODS in this example leads to an overrepresentation of negative test results, which needs to be accounted for when estimating the force of infection (FOI) from (serial) cross-sectional data. We propose - for a Susceptible-Infected (SI) course of infection - an estimator for the FOI that deals with ODS. This estimator is derived theoretically while relying on conditional independence assumptions, i.e. the probability to have a new observation depends only on the outcome (negative or positive test) of the previous observation. In a simulation study, we compared our proposed estimator for the FOI with two others: one which uses all observed data but ignores ODS, i.e. treats all observations as being independent, and one which uses only the first observation from each person. Maximum likelihood estimators are compared in terms of absolute bias and precision. The insights gained through the aforementioned simulation study are then applied to serial cross-sectional data on toxoplasmosis infection.

CS08.4 Evaluation of clustering algorithms for tandem mass spectra
Rieder V., Rahnenführer J.
TU Dortmund University, Dortmund, Germany
In the field of proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an established method for the identification of peptides and proteins. Duplicated spectra, i.e. multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering of tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, e.g. Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra.

We compare different clustering algorithms based on the cosine distance between spectra. In the proteomics community, CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, i.e. hierarchical clustering (complete-linkage clustering) and density-based clustering (DBSCAN). We evaluate all clustering algorithms on real data of samples from species with and without database search annotation as well as on simulated distance matrices. Parameter settings are varied for all clustering algorithms, and the cluster results are compared with one another and with peptide annotations. Validation measures, e.g. the adjusted Rand index, the purity and the number of singletons, are used to evaluate and compare the different approaches.

CS08.5 Illness-death model: statistical perspective and differential equations
Brinks R.1,2, Hoyer A.2
1University Hospital Duesseldorf, Hiller Research Unit for Rheumatology, Duesseldorf, Germany, 2German Diabetes Center, Institute for Biometrics and Epidemiology, Duesseldorf, Germany
The aim of this work is to relate the theory of Markov processes to a recently found differential equation [1] associated with the illness-death model. We show that the Kolmogorov Differential Equations can be used to derive a relation between the prevalence and the transition rates in the illness-death model. The Kolmogorov Differential Equations provide a framework to examine mathematical well-definedness and epidemiological meaningfulness of the prevalence of the disease. As an application, we derive the incidence of diabetes from a series of cross-sections in the Survey of Health, Ageing and Retirement in Europe (SHARE) study.
[1] Brinks R, Landwehr S (2015) Change Rates and Prevalence of a Dichotomous Variable: Simulations and Applications, PLoS ONE, 10(3): e0118955

CS09 Biomarker Discovery Fri :45-4:5 Lecture Hall KR 8

CS09.1 Model selection based on combined penalties for biomarker identification
Vradi E.1, Brannath W.2, Jaki T.3, Vonk R.1
1Bayer AG, Research and Clinical Statistics, Berlin, Germany, 2University of Bremen, Competence Centre for Clinical Trials, Bremen, Germany, 3Lancaster University, Department of Mathematics and Statistics, Lancaster, United Kingdom
The growing role of targeted medicine has led to an increased focus on the development of actionable biomarkers. Current penalized selection methods that are used to identify biomarker panels for classification in high-dimensional data, however, often result in highly complex panels that need careful pruning for practical use. In the framework of regularization methods, a penalty that is a weighted sum of the L1 and L0 norms has been proposed to account for the complexity of the resulting model. In practice, the limitation of this penalty is that the objective function is non-convex and non-smooth, the optimization is computationally intensive, and the application to high-dimensional settings is challenging. In this paper we propose a stepwise forward variable selection method which combines the L0 with L1 or L2 norms. The penalized likelihood criterion that is used in the stepwise selection procedure results in more parsimonious models, keeping only the most relevant features.
Simulation results and a real application show that our approach exhibits a performance comparable with common selection methods with respect to prediction, whilst minimizing the number of variables in the selected model, resulting in a more parsimonious model as desired.

CS09.2 Seamlessly translating biomarkers from scientific discovery to clinical development in precision medicine
Li E.1, Feng S.2
1Jounce Therapeutics, Biostatistics and Data Management, Cambridge, United States, 2AbbVie Pharmaceuticals, Biostatistics, Cambridge, United States
Biomarkers play a central role in Precision Medicine. In practice, however, translating biomarkers from scientific discovery to clinical development may not be trivial. In this presentation, with a number of examples, we will discuss some common issues in drug development using biomarker data to bridge the early discovery and pre-clinical work to clinical stage development. In particular, exploiting appropriate statistical tools can be critical in addressing such issues in many cases. Examples focus on data quality assessment in assay development for novel scientific techniques; public BIG DATA used as reference; inconsistent statistical models applied in the drug discovery phase and later development; and quantification of the incremental value of proposed biomarkers in trial designs and subsequent activities along the clinical development paradigm. In all cases, utilizing the right statistical tools is essential for preserving scientific and clinical rigor, ensuring the efficient and seamless translation of early scientific discovery biomarker data into later stage clinical development.

CS09.3 Sample size calculation for the identification of prognostic biomarkers for time-to-event outcomes using high-dimensional omics data
Krzykalla J., Benner A.
German Cancer Research Center, Heidelberg, Germany
Numerous studies aim at predicting time-to-event outcomes through omics-based discovery of biomarkers. More than for traditional, typically one-dimensional studies, thorough planning and design are crucial for success. Due to the fact that the required sample size for the discovery set has to be assessed with respect to not only a single variable but a set of variables, the terms of power and test error have to be adapted to fit the characteristics of a multiple testing scenario. Jung (JCSSB, 2013) proposed a revised version of the well-established formula of Schoenfeld to achieve a pre-defined average power while ensuring control of the false discovery rate. The methods of Dobbin and Simon (Biostatistics, 2007) and Dobbin and Song (Biostatistics, 2013) pursue a different goal: the sample size is specified such that the (multivariable) predictor's performance is close to the optimal one that could be achieved with an infinite training set. Performance is measured either by the probability of correct classification in a landmark analysis or by the predictive accuracy of the prediction model, e.g. a multivariable Cox model. However, sparsity is not a matter of concern for the last two approaches. A particular problem when dealing with high-dimensional omics data is that candidate biomarkers are strongly correlated and distribution assumptions are often questionable. Multivariate normality and uncorrelatedness, both partly assumed by the above-mentioned approaches, are strong simplifications that do not reflect the truth. One possibility to overcome this problem is to use a real data set and
manipulate it in such a way that the parameters of interest are known with certainty. This is the concept of a so-called Plasmode data set. An overview of methods of varying complexity for sample size calculation when dealing with high-dimensional omics data will be given and their advantages and pitfalls will be discussed.

CS09.4 Evaluation of performance of machine learning methods for genomic prediction using three maize breeding datasets
Ogutu J.O., Piepho H.-P.
University of Hohenheim, Biostatistics Unit, Institute of Crop Science, Stuttgart, Germany
Genomic prediction (or selection) is becoming widely recognized as an efficient and cost-effective component of molecular plant and animal breeding schemes. We evaluate the relative predictive performances of 18 machine-learning methods for genomic prediction in plant breeding using three empirical maize data sets. The specific methods we consider are ridge regression and ridge regression best linear unbiased prediction (BLUP); lasso-type methods (lasso, adaptive lasso, elastic net, adaptive elastic net, bridge regression); regression tree based methods (random forests, boosted regression trees); support vector regression machines; concave (nonconvex) methods (Minimax Concave Penalty (MCP) and Smoothly Clipped Absolute Deviation (SCAD)); and grouped methods (group lasso, sparse group lasso, overlapping group lasso, group bridge, group MCP, group SCAD, and group exponential methods). We apply each of these methods to maize breeding data sets for 2010, 2011 and 2012 produced by KWS for the Synbreed project. The number of all genotyped lines was 073, 857 and 38 for the 2010, 2011 and 2012 data sets, respectively. Each line was genotyped for 3227 SNP markers. A subset of the markers with non-zero variance was split into groups of sizes 1, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100. Groups were defined by systematically grouping consecutive and spatially adjacent markers, separately for each of the 10 chromosomes. 5-fold cross-validation is used to assess the predictive ability of each method on each of the three distinct maize data sets. The relative predictive performances of these methods are also compared against their performances on animal breeding data sets with known true breeding values, simulated for the QTLMAS workshops for 2010, 2011 and 2012.

CS10 Computational Statistics Thu :30-8:00 Lecture Hall KR 7

CS10.1 The betaboost package - a boosting framework to estimate and select models for bounded outcomes like Health Related Quality of Life data
Mayr A.1, Weinhold L.2, Hofner B.3, Tietze S.4, Gefeller O.1, Schmid M.2
1Friedrich-Alexander-Universität Erlangen-Nürnberg, Department of Medical Informatics, Biometry and Epidemiology, Erlangen-Nuremberg, Germany, 2University Hospital Bonn, Department of Medical Biometry, Informatics and Epidemiology, Bonn, Germany, 3Paul-Ehrlich-Institut, Section Biostatistics, Langen, Germany, 4Friedrich-Alexander-Universität Erlangen-Nürnberg, Department of Nephrology and Hypertension, Erlangen-Nuremberg, Germany
The analysis of bounded outcomes like the ones resulting from commonly used Health Related Quality of Life scales is a widely discussed issue. Furthermore, modern epidemiological studies often collect vast amounts of data, leading to large numbers of potential explanatory variables.
The methodological challenges in this context are two-fold: (i) the bounded outcome makes traditional statistical modelling approaches problematic, and (ii) the classical statistical inference methods become unfeasible in the presence of high-dimensional data. With the betaboost package, we present statistical software tackling both issues by incorporating flexible approaches for beta regression in a model-based boosting framework. In our software, two different variants of beta regression are implemented: while classical beta regression focuses on modelling the expected value via the mean parameter, an extended version additionally models the precision parameter based on covariates in the spirit of distributional regression. Our software incorporates a boosting algorithm which is originally a machine learning approach but was later adapted to estimate statistical models. An inherent advantage of these statistical boosting algorithms is that they (i) can deal with high-dimensional data, (ii) are able to simultaneously select the most influential predictors from a potentially large amount of candidate variables, (iii) still yield statistical models that are interpretable in the same way as if they were estimated via classical approaches and (iv) allow the incorporation of different types of predictor effects (e.g., linear, non-linear, spatial). With the betaboost package, we provide a powerful tool for the analysis of bounded outcomes while incorporating automated data-driven variable selection.

CS10.2 kerndeepstacknet: an R package for tuning kernel deep stacking networks
Welchowski T., Schmid M.
University Hospital Bonn, Department of Medical Biometry, Informatics and Epidemiology, Bonn, Germany
Kernel deep stacking networks (KDSNs) are a novel method for supervised learning in biomedical research. Belonging to the class of deep learning methods, KDSNs use multiple layers of non-linear transformations to derive abstractions of the input variables. This architecture can efficiently represent complex nonlinear dependencies in the joint distribution of the inputs and the response variable. While training of deep artificial neural networks usually involves the optimization of a non-convex problem, often implying local optima and slow convergence, KDSNs are characterized by an efficient fitting procedure that is based on a series of kernel ridge regression models with closed-form solutions. This talk will address the tuning of KDSNs, which is a challenging task due to the multiple hyper-parameters that have to be specified before network fitting. Specifically, we propose a data-driven tuning strategy for KDSNs that is based on model-based optimization (MBO). The proposed tuning approach explores the hyper-parameter space via a meta-model that uses a performance criterion such as the area under the curve or the root mean squared error as outcome variable. Simulation studies show that the MBO approach is substantially faster than traditional grid search strategies. Analysis of real data sets demonstrates that MBO-tuned KDSNs are competitive with other state-of-the-art machine learning techniques in terms of prediction accuracy. We also extend the KDSN framework by new tools for variable selection and dropout. The fitting and tuning procedures are implemented in the R package kerndeepstacknet.

CS10.3 The revival of the Gini importance?
Wright M.N.1, Nembrini S.2
1Universität zu Lübeck, Lübeck, Germany, 2Università degli Studi di Brescia, Brescia, Italy
Random forests are fast, flexible and represent a robust approach to analyze high-dimensional data. A key advantage over alternative machine learning algorithms are variable importance measures, which can be used to identify relevant features or perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are interpretable and fast to compute. However, they are biased in favor of variables with many possible split points and high entropy. Also, they are always non-negative and consequently not centered around zero, making statistical testing difficult. We propose a bias-corrected version of the impurity importance. For this, a permuted version of each variable is created. A standard impurity importance is computed for every variable, including the permutations. Then, the corrected impurity importance is computed as the difference between the importance of a variable and its permuted version. The proposed method is implemented in the R package ranger in an efficient manner, avoiding explicit storage of permuted variables. In simulation studies, we show that the proposed importance measure is unbiased in terms of the number of possible split points and entropy. Further, it is centered around zero, allowing for the statistical testing of variables. These tests keep the significance level and have comparable power to tests based on the permutation importance. Finally, we apply the proposed measure to real data from gene expression and genome-wide association studies.
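A corrected impurity importance of this kind is available in ranger; the following toy example assumes a recent package release (argument names may differ in older versions) and uses made-up data:

```r
## Toy example of the bias-corrected impurity importance in ranger.
library(ranger)

set.seed(4)
n  <- 500
x1 <- rnorm(n)                             # the only informative predictor
noise <- matrix(rnorm(n * 10), n, 10)      # pure noise predictors
colnames(noise) <- paste0("noise", 1:10)
y  <- factor(rbinom(n, 1, plogis(0.8 * x1)))
dat <- data.frame(y = y, x1 = x1, noise)

rf <- ranger(y ~ ., data = dat, num.trees = 500,
             importance = "impurity_corrected")
rf$variable.importance   # roughly centered around zero for noise variables

# because the corrected importance can be negative, simple p-values are
# possible; may warn about few negative values in such a small example
importance_pvalues(rf, method = "janitza")
```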
CS10.4 A comprehensive non-parametric approach for the analysis of diagnostic accuracy trials and its implementation in R
Rooney D.1, Zapf A.2
1German Aerospace Center (DLR), Institute of Aerospace Medicine, Cologne, Germany, 2University Medical Center Göttingen, Department of Medical Statistics, Göttingen, Germany
Background: The goal of diagnostic methods is to differentiate a population by the presence or absence of a condition. This creates a twofold problem, since increased sensitivity counteracts specificity and vice versa. Due to this antagonism, different measures are employed to evaluate diagnostic accuracy trials. While early trials often focus on the area under the receiver operating characteristic (ROC) curve, the AUC, as an integral measure, confirmatory trials usually aim at meeting prescribed thresholds of sensitivity and specificity. Study designs often need to account for additional factors, such as human rating in medical imaging.
Methods: Sensitivity and specificity can be represented as specific ROC curves. In consequence, the asymptotic distribution of the AUC can be used to derive equally reliable point and interval estimators for all measures. Furthermore, it enables unified decision making by applying a non-parametric analysis of variance model. It facilitates two-factorial designs, and the covariance matrix can be constructed to allow a nested arrangement of factors as well as experimental units. Multiple observations can cluster within one unit and the target condition may be inconsistently present within each cluster. The model can be adjusted for the influence of an unlimited number of covariates using a regression-based methodology.
Results: The model was implemented in an R package to offer a user-friendly platform for the unified analysis of AUC, sensitivity and specificity in a broad range of trial designs. Simulations indicate that adequate consideration of trial characteristics leads to a better coverage probability and bias reduction.

CS10.5 Boosting joint models for longitudinal and time-to-event data - a selection and allocation algorithm
Waldmann E., Mayr A.
Friedrich-Alexander-Universität Erlangen-Nürnberg, Department of Medical Informatics, Biometry and Epidemiology, Erlangen, Germany
Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years, as they are a helpful technique to approach a data structure common in clinical studies where longitudinal outcomes are recorded alongside event times. Those two processes are often linked, and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by independent modelling. Commonly, joint models are estimated in likelihood-based expectation maximization or Bayesian approaches using frameworks where variable selection is problematic and which do not immediately work for high-dimensional data. In this contribution, we propose a boosting algorithm tackling these challenges by being able to simultaneously estimate predictors for joint models and automatically select the most influential variables and allocate them to the appropriate sub-predictor, even in high-dimensional data situations. We apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.

CS11 Dose Finding and Drug Combination Studies Tue :30-8:00 Lecture Hall HS 5

CS11.1 A Bayesian approach to modelling drug interactions
Cremaschi A.1,2, Frigessi A.1, Zucknick M.1, Taskén K.2
1Universitetet i Oslo, Biostatistics, Oslo, Norway, 2Universitetet i Oslo, NCMM, Oslo, Norway
The combination of two drugs administered simultaneously may lead to significantly different effects when compared with the responses obtained from monotherapy studies on the same kind of experiment. This can be representative of synergistic or antagonistic behaviour. A primary issue with this evaluation is to establish a reference value representing the condition where the compounds do not interact, also called the zero-interaction level. To this aim, several heuristic approaches have been proposed that define such a baseline level and then compare it with the response obtained from the combination. However, these approaches rely on different assumptions for the dose-response curve, and may provide conflicting outcomes in several situations. In order to overcome these issues, we propose a Bayesian regression framework for modelling the response surface when two drugs are combined, and apply it to a dataset from cancer therapy. Posterior computations are obtained via MCMC algorithms, providing estimates of the zero-interaction level. Interestingly, the Bayesian framework allows for density estimation and prediction for those dose combinations that have not been tested. Additionally, a comparison with existing methodologies is provided.
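For orientation only (the abstract does not state which zero-interaction reference the authors adopt), one widely used choice is Loewe additivity: a combination of doses $(d_1, d_2)$ producing effect $E$ is regarded as non-interacting if

$$
\frac{d_1}{D_1(E)} + \frac{d_2}{D_2(E)} = 1,
$$

where $D_i(E)$ is the dose of drug $i$ alone that produces the same effect $E$. Combinations achieving $E$ with a left-hand side below 1 are conventionally interpreted as synergistic, and above 1 as antagonistic.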

CS11.2 Using a Bayesian nonparametric model to shorten the trial duration of phase II oncology trials for drug combinations
Yada S., Hamada C.
Tokyo University of Science, Faculty of Engineering, Katsushika-ku, Japan
Phase II clinical trials for anticancer drugs can help confirm the effectiveness of therapeutic drugs and determine whether to move forward to late phase trials. Tumor response, often used as an efficacy endpoint for solid tumors, is measured at each time point specified in the protocol. The best overall response is defined as the best response recorded from the start until the end of treatment. In the existing method, in which efficacy is modeled as a binary endpoint, the dose combination allocated to the next cohort is selected using the estimated efficacy probability based on the best overall response. Thus, it is not possible to select the dose combination allocated to the next cohort until the follow-up period has been completed and the best overall responses have been determined in the current cohort. As a result, the duration of the trial becomes long, depending on the sample size and the assessment points of the trial. In this presentation, we propose a method applying the analysis of missing data to shorten the duration of the trial. The dose combination allocated to the next cohort is determined at a point where the follow-up period in the current cohort has not yet been completed. The unobserved efficacy data at that point are treated as missing data. A Bayesian nonparametric model is used to account for the missing data mechanism. The dose combination for the next cohort is selected using the estimated efficacy probability after imputation for the missing data. We report the results of simulation studies comparing our proposed method with the existing method.

CS11.3 A flexible non-parametric dose-finding design for Phase II clinical trials
Mozgunov P., Jaki T.
Lancaster University, Department of Mathematics and Statistics, Lancaster, United Kingdom
In trials of small populations it has recently been advocated that designs maximising the expected number of successes (ENS) should be used to increase the proportion of the population that receives the superior treatment. In particular, the optimal multi-arm bandit (MAB) and dynamic programming approaches have received considerable attention. These approaches do not use any parametric assumption between treatments. However, the original designs result in low statistical power if the difference between the recommended dose and control is to be tested. As a result, some rule-based modifications were proposed. Using recent developments in the theory of weighted information measures, we develop a new fully adaptive dose-finding design which does not require any parametric assumption and leads to an accurate recommendation with high probability. Moreover, the introduced sensitivity parameter determines the trade-off between the ENS and the statistical power. It is shown that the proposed design performs similarly to the optimal MAB approach (both in terms of ENS and statistical power) for small values of the sensitivity parameter. For greater values, it performs as well as fixed randomization in terms of statistical power, but with greater ENS. In contrast to the currently studied methods, the proposed design can target any percentile and not only the most efficacious dose, which is applicable in many Phase II clinical trials.
Moreover, the proposed design is computationally easy and intuitively clear.

CS11.4 Finding the optimal dose in clinical trials using MCP-Mod: hands-on experience based on the implementation in a Phase II study
Möst L.
Metronomia Clinical Research GmbH, München, Germany
Identifying the optimal dose is a key goal during drug development and its importance cannot be stressed enough. Selecting a too high dose might result in an increased risk of adverse events or other safety concerns, which can be avoided using a lower dose. Selecting a too low dose might result in insufficient efficacy of the drug. In the past, two main approaches were applied for identifying the optimal dose in dose-finding studies: multiple comparison procedures, where the dose is considered as a qualitative factor, or model-based procedures, where a functional dose-response relationship in terms of a parametric model is assumed and the dose is considered as a quantitative factor. Bretz et al. (2005) combined both approaches by introducing the MCP-Mod procedure, which marked an important methodological milestone in the advancement of dose-response analyses. The MCP-Mod procedure allows the specification of several candidate dose-response models. Each candidate model is used to test for a dose-response signal using multiple comparison techniques while preserving the overall type I error. The most appropriate model to describe the dose-response relationship is then selected, or several models are used to estimate a dose-response relationship by averaging the results from the models with a significant dose-response signal. We applied the MCP-Mod procedure for finding the optimal dose in a Phase II study for the treatment of seasonal allergy. This talk gives a brief summary of the MCP-Mod methodology. The application of the method is then illustrated with emphasis on practical and methodological challenges that arose during the planning, conduct and analysis of the mentioned study. Model interpretations, results and conclusions are included. Advantages of the MCP-Mod procedure in comparison to standard analyses which are often performed in dose-finding studies are discussed. From a practical perspective, prior discussions with clinicians, lessons learned, and computational challenges are also covered.
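As an illustration of the general MCP-Mod workflow (not the study described above), the DoseFinding package in R implements the procedure; the doses, candidate models and data below are made up, and argument names may differ slightly between package versions:

```r
## Illustrative MCP-Mod sketch with the DoseFinding package; doses,
## candidate shapes and data are hypothetical.
library(DoseFinding)

doses <- c(0, 0.5, 1, 2, 4)

# candidate dose-response shapes, specified before seeing the data
cand <- Mods(emax = 0.5, linear = NULL, sigEmax = c(1, 3),
             doses = doses, placEff = 0, maxEff = 1)

# simulated trial data with a normally distributed endpoint
set.seed(5)
dat <- data.frame(dose = rep(doses, each = 20))
dat$resp <- 0.8 * dat$dose / (dat$dose + 0.5) + rnorm(nrow(dat), sd = 1)

# MCP step (multiple contrast test controlling the overall type I error)
# and Mod step (model selection/averaging and target-dose estimation for
# a clinically relevant effect Delta)
fit <- MCPMod(dose, resp, data = dat, models = cand,
              alpha = 0.025, Delta = 0.4)
fit
```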

CS12 Optimizing Drug Development Programs Tue :30-8:00 Lecture Hall KR 9

CS12.1 Visualisation of the impact of decision criteria and programme design on the quality of a drug development portfolio
Harbron C.
Roche, Biostatistics, Welwyn Garden City, United Kingdom
Background: Early clinical development proceeds in a sequential fashion, with a number of studies and go/no-go decision points between each study. The basis of the criteria used to make these decisions has an impact on both the proportion of truly effective agents that are correctly permitted to progress to further investigation and the number of ineffective agents that are incorrectly also progressed, incurring wasted costs. Increasingly, alternatives to the traditional three-phase clinical development programme are also being considered, including adaptive designs, skipping phase 2 and single-arm studies, which also impact the overall likelihood of an agent being progressed to being studied in later phases.
Methods: The properties of these study and program designs and decision criteria are typically considered on a single study or single agent basis. In this talk we propose a graphical approach for describing the impact of different designs and decision criteria on a portfolio of novel agents, allowing the impact of different development strategies on a portfolio to be studied.
Results: The impact on the quality of a development portfolio, that is, the distribution of effect sizes of those agents being progressed, is visualised. The potential impacts of using more or less aggressive development strategies as well as approaches such as surrogate endpoints and single-arm studies are considered. These approaches allow development teams to make more informed decisions on their choices of development program.

CS12.2 Integrated planning of phase II/III programs with several phase III trials: optimal sample size allocation and go/no-go decision rules
Preussler S., Kieser M., Kirchner M.
University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany
The conduct of phase II and III programs is costly, time consuming and, due to high failure rates in late development stages, risky. As the go/no-go decision and the sample size chosen for phase III are based on the results observed in phase II, there is a strong connection between phase II and III trials. An integrated planning of phase II and III is therefore reasonable. The performance of phase II/III programs crucially depends on the allocation of the resources to phase II and III in terms of sample size and on the rule applied to decide whether to stop or to proceed to phase III. Recently, a utility-based approach was proposed, where optimal planning of phase II/III programs is achieved by taking fixed and variable costs of the drug development program and potential gains after a successful launch into account. However, this method is restricted to programs with a single phase III trial, while regulatory authorities generally require statistical significance in two or more phase III trials. We present a generalization of this procedure to programs where two or more phase III trials are performed. Optimal phase II sample sizes and go/no-go decision rules are provided for time-to-event outcomes and scenarios where at least one or two phase III trials need to be successful. We investigate the consequences of the biased treatment effect estimate induced by the go/no-go decision and the effects of different strengths of correlation between the phase III trials, which are due to the common information emerging from the phase II trial, and give an outlook on the scenarios where multiple endpoints are included in the trials. The proposed method is illustrated by application to different settings typically met in oncology drug development.

CS12.3 Assessing optimal designs of phase II/III programs for clinical trials with multiple endpoints in a utility-based framework
Kirchner M.1, Kieser M.1, Dölger E.1, Götte H.2
1University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany, 2Merck KGaA, Darmstadt, Germany
The drug development decision of proceeding with phase III is generally an important one, as it requires large investment. As the failure rate in the late stage of development is historically high, there is a strong need for a structured framework for quantitative decision making.
Program-wise planning of phase II and III based on maximizing expected utility has been proposed to optimize two design aspects of phase II: sample size and the choice of decision boundaries. This approach is restricted to a single time-to-event endpoint. We generalize this procedure to the setting of clinical trials with multiple endpoints and (asymptotically) normally distributed test statistics. Optimal phase II sample sizes and go/no-go decision rules are provided both for the "all-or-none" and the "at-least-one" win criterion, with application to drug development programs in the fields of Alzheimer's disease and oncology. Besides assuming fixed true effects, prior distributions are employed. As the method is computationally extensive, we propose a sampling approach to calculate expected go-probabilities, expected phase III sample sizes and probabilities of success to determine expected utility.

CS12.4 Beyond p-values: a phase II dual-criterion design with statistical significance and clinical relevance
Scheuer N.1, Roychoudhury S.2, Neuenschwander B.1
1Novartis Pharma AG, Basel, Switzerland, 2Novartis Pharmaceutical Company, East Hanover, United States
Background: Proof-of-concept (PoC) trials play a critical role in the clinical development of an experimental therapy. Typically, these trials offer the first read-out of efficacy and lead to one of the following decisions: consider further development (GO), stop development (NO-GO), or seek further information. To achieve that goal, statistical significance is assessed but usually fails to produce efficient decisions in the absence of clinically relevant effect estimates. To palliate this, we propose a dual-criterion design which formally combines a statistical and a clinical criterion.
Methods: The dual-criterion design requires two inputs: a null hypothesis (i.e., no effect) and a decision value (i.e., minimum effect estimate). Unlike the standard design, with statistical significance as the sole success criterion, the decision value is explicit while the study power is implicit. Sample size determination in a dual-criterion design requires special attention with regard to operating characteristics (i.e., error rates) and implied study outcomes.
Results: We successfully applied the dual-criterion design in oncology Phase II trials with binary and time-to-event endpoints. The evaluation covered a characterization of the decision criteria, sample size, as well as data scenarios and operating characteristics. Yet, despite their apparent simplicity, those designs can be conceptually challenging, especially in terms of implementation and communication between the statistician and the trial design team.
Conclusions: When properly understood and well executed, a dual-criterion design based on statistical significance and clinically relevant effect size improves evidence-based, quantitative decision-making in early phase PoC trials.
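A toy sketch of how such a dual-criterion GO/NO-GO rule could be coded for a binary endpoint; the thresholds, the three-way decision mapping and the data are hypothetical and not those of the designs described above:

```r
## Toy dual-criterion rule for a response-rate endpoint (hypothetical
## thresholds and data).
dual_criterion <- function(x, n, p0 = 0.15, decision_value = 0.30,
                           alpha = 0.10) {
  test <- binom.test(x, n, p = p0, alternative = "greater")
  est  <- x / n
  significant <- test$p.value < alpha    # statistical criterion vs. null effect
  relevant    <- est >= decision_value   # clinical criterion on the estimate
  if (significant && relevant) {
    "GO"
  } else if (!significant && !relevant) {
    "NO-GO"
  } else {
    "seek further information"
  }
}

dual_criterion(x = 11, n = 40)   # 11/40 responders observed
```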
Perhaps if we could simulate the different trial options and their consequences for subsequent trials, the probability of getting approval and likely net revenue, we could optimize our trial design decisions in an objective way. In this presentation we demonstrate some software that allows us to do exactly this, and share some early lessons learned along the way.
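To make the dual-criterion idea from CS2.4 above concrete, here is a minimal R sketch of a GO/NO-GO rule for a single-arm binary endpoint: GO requires both statistical significance against the null rate and a point estimate at least as large as a clinically motivated decision value. The null rate, decision value, sample size and alpha below are illustrative assumptions, not values from the abstract.

```r
# Dual-criterion GO/NO-GO sketch for a single-arm binary endpoint (illustrative only)
dual_criterion <- function(x, n, p0 = 0.15, dv = 0.30, alpha = 0.05) {
  est  <- x / n
  pval <- binom.test(x, n, p = p0, alternative = "greater")$p.value
  if (pval < alpha && est >= dv) "GO" else "NO-GO"
}

# Operating characteristics by simulation: GO probability as a function of the true rate
oc <- function(p_true, n = 40, nsim = 10000) {
  x <- rbinom(nsim, n, p_true)
  mean(vapply(x, dual_criterion, character(1), n = n) == "GO")
}
sapply(c(0.15, 0.25, 0.35, 0.45), oc)
```

Varying the decision value and sample size in such a sketch is one simple way to explore the implied operating characteristics the abstract refers to.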

CS3 Innovative Clinical Trial Designs Wed :30-6:00 Lecture Hall KR 9

CS3.1 Recently developed designs of late-phase clinical trials: opportunities and challenges
Alsop J. Numerus, Wokingham, United Kingdom
Clinical trial designs often fail to deliver data that jointly satisfy the needs of both regulatory and reimbursement authorities. Sponsors usually prioritise the collection of phase III data from explanatory trials over and above the collection of real-world evidence from more pragmatic trials. Moreover, comparing efficacy data obtained in a pre-approval setting with effectiveness data in a post-approval setting remains a challenge. Various new study designs have recently been developed which aim to address these challenges. Designs such as the cohort multiple randomised controlled trial, the integrated efficacy-to-effectiveness trial, and the mixed randomised trial will be outlined and their various pros and cons discussed. Examples of where and how these trials have been implemented will be presented, along with their ethical and practical challenges.

CS3.2 Journeying through the development of an adaptive designs reporting guidance: findings from the Delphi process
Dimairo M., Todd S. 2, Julious S., Jaki T. 3, Wason J. 4, Hind D., Mander A. 4, Weir C. 5, Koenig F. 6, Altman D. 7, Nicholl J., Hamasaki T. 8, Proschan M. 9, Scott J. 0, Walton M., Ando Y. 2, Biggs K., Pallmann P. 3, Coates E.
University of Sheffield, ScHARR, Sheffield, United Kingdom, 2 University of Reading, Reading, United Kingdom, 3 Lancaster University, Department of Mathematics & Statistics, Lancaster, United Kingdom, 4 University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom, 5 University of Edinburgh, Edinburgh, United Kingdom, 6 Medical University of Vienna, Vienna, Austria, 7 University of Oxford, Centre for Statistics in Medicine, Oxford, United Kingdom, 8 National Cerebral and Cardiovascular Centre, Osaka, Japan, 9 National Institute of Allergy and Infectious Diseases, Bethesda, United States, 0 Division of Biostatistics in the Center for Biologics Evaluation and Research, FDA, Silver Spring, United States, Janssen Pharmaceuticals, USA, United States, 2 Pharmaceuticals and Medical Devices Agency, Tokyo, Japan
Introduction: The need to evaluate new health interventions using efficient trial designs has increased in recent years. Adaptive designs (ADs) are one way to enhance study design efficiency. ADs offer the opportunity to use accruing data within an ongoing trial to modify, or adapt, aspects of that trial while preserving trial integrity and validity. Although ADs appear to offer many advantages, they are not routinely applied due to a number of obstacles that include: lack of practical knowledge; limited access to case studies to learn from; and concerns about credibility and the potential introduction of bias. Adequate, transparent reporting is one of the leading facilitators to address these obstacles. There is no existing reporting guidance for ADs. As a result, deficiencies in their reporting may influence their credibility and limit their ability to inform future related research. We aim to address reporting deficiencies to mitigate some of the obstacles to the use of ADs.
Methods: A multidisciplinary international consortium of key stakeholders in clinical trials research was formed to lead the development of a consensus-driven reporting guidance for trials that use ADs, in the form of a CONSORT extension. As part of the ACE (Adaptive designs CONSORT Extension) project, we are surveying international stakeholders on their perception of the importance of potential reporting items during rounds of the Delphi process, followed by a consensus meeting. Results: We talk about the aims of the ACE project and the reporting guidance development process, share results from the Delphi process rounds and lessons learned, and describe the future direction of the project. Discussion: We hope the CONSORT guidance will mitigate some of the obstacles to the use of ADs by enhancing their credibility and helping to improve their reproducibility and replicability through better and more transparent reporting. Moreover, it should help researchers design better adaptive trials.

CS3.3 Criteria for futility and efficacy evaluation in interim analyses and final evaluation of phase II trials
Kopp-Schneider A., Wiesenfarth M., Witt R. 2, Witt O. 2 German Cancer Research Center, Biostatistics, Heidelberg, Germany, 2 German Cancer Research Center, Clinical Cooperation Unit Pediatric Oncology, Heidelberg, Germany
A phase II trial is typically a small-scale study to determine whether an experimental treatment should continue further clinical evaluation. In this setting, interim analyses are commonly performed to allow for early stopping for futility and/or efficacy. The use of the Bayesian posterior probability as a decision rule for early stopping and for the final analysis has been suggested, especially in the context of biomarker-targeted therapies with small numbers of patients. In comparison to traditional hypothesis testing-based approaches, the advantage is the flexibility with respect to the number and timing of interim analyses. Further, the final number of patients included in the study does not have to be fixed in advance. The INFORM2 phase I/II trial series addresses individualized therapy for relapsed malignancies in childhood using next-generation diagnostics. The trials are one-arm trials with a dichotomous endpoint and will include interim futility and/or efficacy evaluations. Sample size is restricted by recruitment rate and duration, and hence identical evaluation criteria for interim and final analyses will be used. We will show a workflow for planning of the trials and discuss the choice of the Bayesian model and the prior distributions for decision making. An R package will be presented to evaluate the trials' operating characteristics by analytical calculations and Monte Carlo simulations.

CS3.4 Patient-oriented randomization: A new clinical trial design
Schulz C., Timm J. 2 PAREXEL International GmbH, Berlin, Germany, 2 University of Bremen, Competence Center for Clinical Trials Bremen, Bremen, Germany
A new randomization design for clinical trials, called the "patient-oriented randomization design", was developed to counter problems of "classical" randomized controlled trials comparing strategies consisting of different treatments in each strategy in the presence of heterogeneity in patient-drug interactions. The discrepancies between daily clinical perception and the results of randomized controlled trials lead to the conviction that the methodological approach of "classical" randomized controlled trials, such as the block randomization

design, is inappropriate in this set-up. The patient-oriented approach of the "CUtLASS" design reflects everyday clinical practice by allowing for a patient-oriented choice of one treatment from each strategy. The allocation to a strategy is random. However, the results are highly dependent on the physicians' preferences. The goal of the patient-oriented randomization design described here is to take an intermediate path between randomized controlled trials and the "CUtLASS" design. The idea of this new design is to randomize two treatment pairs, each consisting of one treatment from each strategy, in a first step and subsequently to involve the investigators in deciding for the pair most appropriate to the patient's needs. Finally, the allocation to one treatment of the chosen pair will be random. The consideration concentrates on the implementation, such as performed in the "NeSSy" study [1-2], and on the properties of patient-oriented randomization designs, which depend mainly on the number of treatments in each strategy. Furthermore, consequences of the patient-oriented decisions of the physicians are considered, and the statistical model to prove the benefits of this innovative design is presented. References: [1] Schulz C et al. Patient-orientated randomisation: A new trial design applied in the Neuroleptic Strategy Study. Clin Trials 2016; 13: [2] Gründer G et al.; The NeSSy Study Group. Effects of first- versus second-generation antipsychotics on quality of life in schizophrenia: a double-blind randomised study. Lancet Psychiatry 2016; 3:

CS3.5 Quantitative approaches to design and analysis of experimental medicine clinical trials
Hicks K., Archer G., Christie J., Miller S., Rigat F. GlaxoSmithKline, Clinical Statistics, Stevenage, United Kingdom
Experimental Medicine (EM) clinical trials typically feature a relatively small number of patients and, because the objective of EM is to learn about the biological mechanisms of disease and their interaction with pharmacotherapy, a relatively large number of endpoints. EM trials then, even more than "classical" clinical trials, require careful thinking about their design and interpretation, to avoid the pitfall of post hoc reasoning. Power statements and "null hypothesis significance testing" are of low utility; we need techniques to assess the fitness of many competing mechanistic models. The Bayesian paradigm is a natural framework for this multivariate set-up; we demonstrate how to use multivariate Bayesian methods to design EM studies (and to estimate their probability of success), to assess the utility of the evidence from such studies, and to support "Go/No-Go" decision-making based on EM study read-outs.

CS4 Novel Approaches to Address Missing Data Wed :30-8:00 Lecture Hall KR 7

CS4.1 Optimal design when outcome values are not missing at random
Lee K.M., Mitra R. 2, Biedermann S. 2 University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom, 2 University of Southampton, Mathematical Sciences, Southampton, United Kingdom
The presence of missing values complicates statistical analyses. In the design of experiments, missing values are particularly problematic when constructing optimal designs, as it is not known which values will be missing at the design stage. When data are missing at random (MAR), it is possible to incorporate this information into the optimality criterion that is used to find designs.
However, when data are not missing at random (NMAR), this framework can lead to inefficient designs. We investigate and address the specific challenges that not-missing-at-random values present when finding optimal designs for linear regression models. We show that the optimality criteria will depend on model parameters that traditionally do not affect the design, such as the regression coefficients and the residual variance. We also develop a framework that improves the efficiency of designs over those found assuming values are missing at random.

CS4.2 A note on posterior predictive checks to assess model fit for incomplete data
Chatterjee A., Xu D. 2, Daniels M. 2 Merck & Co., Rahway, United States, 2 The University of Texas at Austin, Austin, United States
We examine two posterior predictive distribution based approaches to assess model fit for incomplete longitudinal data. The first approach assesses fit based on replicated complete data, as advocated in Gelman et al. (2005). The second approach assesses fit based on replicated observed data. Differences between the two approaches are discussed and an analytic example is presented for illustration and understanding. Both checks are applied to data from a longitudinal clinical trial. The proposed checks can easily be implemented in standard software like (Win)BUGS/JAGS/Stan.

CS4.3 Analysis of longitudinal studies with a degenerate drop-out mechanism: a case study considering a preclinical cancer trial conducted in mice
Forman J., Picchini U. 2 University of Copenhagen, Section of Biostatistics, Department of Public Health, Copenhagen, Denmark, 2 University of Lund, Center for Mathematical Sciences, Lund, Sweden
Degenerate drop-out mechanisms occur in longitudinal studies in which subjects are removed from the study for ethical reasons when the outcome passes above or below a prespecified level. Missing data from such a mechanism are missing at random, but statistical analysis is nevertheless problematic, since any kind of implicit or explicit imputation is a model-based extrapolation which can never be validated from the data. In addition, standard linear models for longitudinal data are not appropriate, since they target the difference in means between two truncated distributions. In this talk I consider a preclinical cancer trial where tumors are grown in mice until reaching a critical size. I will present the results of two different analyses, one based on a biomathematical model for tumor growth and the other on a non-parametric composite endpoint analysis, and discuss their mutual benefits and shortcomings.
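As a complement to CS4.2, the R snippet below sketches the basic mechanics of a posterior predictive check for a toy complete-data normal model: draw from the posterior, generate replicated data sets, and compare a discrepancy statistic on replicated versus observed data. It is a generic illustration under simple conjugate assumptions, not the authors' observed-data replication procedure for incomplete longitudinal data.

```r
set.seed(1)
y <- rgamma(50, shape = 2, rate = 1)   # observed data, deliberately skewed
n <- length(y); ybar <- mean(y); s2 <- var(y)

# Posterior draws for a normal model with the usual noninformative prior:
# sigma^2 | y ~ (n-1) s^2 / chi^2_{n-1},  mu | sigma^2, y ~ N(ybar, sigma^2 / n)
B      <- 2000
sigma2 <- (n - 1) * s2 / rchisq(B, df = n - 1)
mu     <- rnorm(B, ybar, sqrt(sigma2 / n))

# Replicated data sets and a discrepancy statistic (here: sample skewness)
skew  <- function(x) mean((x - mean(x))^3) / sd(x)^3
T_rep <- vapply(seq_len(B), function(b) skew(rnorm(n, mu[b], sqrt(sigma2[b]))), numeric(1))
ppp   <- mean(T_rep >= skew(y))   # posterior predictive p-value
ppp   # a small value flags misfit of the normal model to the skewed data
```

The same loop structure carries over to models fitted in (Win)BUGS, JAGS or Stan: keep the posterior draws and generate one replicated data set per draw.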

CS5 Modelling in Early Drug Development Wed :30-8:00 Lecture Hall KR 9

CS5.1 Integrative modelling of experimental medicine clinical data
Rigat F., Best N. GlaxoSmithKline, Quantitative Science - Statistical Innovation, Stevenage, United Kingdom
The widening gap between investment in and output of novel medicines is mostly imputable to the difficult translation between the preclinical and clinical stages of R&D (Wehling, M. [2008]). Experimental medicine (EM) studies are a novel breed of early clinical trials focussed on breaking this critical bottleneck by learning how treatment changes the specific biomarkers driving clinical outcomes. This talk will demonstrate how Bayesian methods play a pivotal role in the design and analysis of EM trials, by integrating concepts and prior data from quantitative pharmacology and epidemiology into estimable statistical models. A first example will focus on Bayesian inference for multi-compartment pharmacokinetic models using functional uniform priors (Bornkamp, B. [2012]) and prediction of the ensuing pharmacodynamic effects. A second example will focus on predicting changes in a categorical clinical endpoint using an informative prior based on first-in-human EM study biomarker data. The talk will conclude with a perspective on the broader role of Bayesian modelling in the thriving field of model-based drug development.

CS5.2 Adjustment of endogenous concentrations in pharmacokinetic modeling
Bauer A., Wolfsegger M.J. Shire, Pharmacometrics & Pre-Clinical Biostatistics, Vienna, Austria
Estimating pharmacokinetic parameters in the presence of an endogenous concentration is not straightforward, as cross-reactivity in the analytical methodology prevents differentiation between endogenous and dose-related exogenous concentrations. Previous simulation studies showed the importance of an adequate model to account for baseline data in pharmacodynamic responses. However, awareness of this issue in pharmacokinetic analyses is often lacking. Endogenous concentrations, when assumed to show no circadian rhythm, are frequently handled by subtracting the baseline (pre-dose) concentration from post-dose concentrations. This procedure ignores analytical assay variability and results in post-dose differences which may have higher variability than would occur in patients without an intrinsic concentration. A constant endogenous baseline can be modeled as a turnover rate using a system of differential equations, which is a burden to many common software packages and may be difficult for non-experts to interpret. Instead of modeling the change in concentration over time using differential equations, this paper presents a simple and clear solution for how pharmacokinetic models can be fitted directly to the concentration-versus-time data while taking the endogenous baseline adequately into account. The proposed model overcomes the limitations of the approaches mentioned above, while its generality allows accounting for different routes of administration, numbers of compartments, and types of baselines (constant or non-constant).

CS5.3 Standards for projects involving modelling and simulation: a proposal and a survey
O'Kelly M., PSI Modelling and Simulation Special Interest Group QuintilesIMS, Advisory Analytics, Dublin, Ireland
Recent proposals for best practice in modelling and simulation in the U.S. and the European Union will be described.
Elements that are essential for effective modelling and simulation will be proposed. Questions addressed will include how best practice fits with the inherently iterative nature of modelling and simulation projects; whether a single best practice could apply to the wide range of projects in pharmaceutical research that use modelling and simulation, from translational research to health economics; and the benefits that could accrue from wider use of best practice in modelling and simulation. An informal survey of publications featuring modelling and simulation will highlight current practice and will raise questions as to how practice might be improved - or are we doing just fine as we are?

CS5.4 Modelling approaches in dose-finding clinical trials: simulation-based study comparing predictive performances of model averaging and model selection
Buatois S.,2, Ueckert S. 3, Frey N. 2, Retout S. 2, Mentré F. IAME, UMR 37, INSERM, Paris, France, 2 Roche Pharma Research and Early Development, Clinical Pharmacology, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland, 3 Uppsala Universitet, Department of Pharmaceutical Biosciences, Uppsala, Sweden
Objectives: In dose-finding clinical trials, modeling-based approaches require selection of the model (MS) that best describes the data. However, MS ignores model uncertainty, which could impair predictive performance. To overcome this limit, model averaging (MA) might be used; it has recently been applied to nonlinear mixed-effects models (NLMEM). MA allows taking into account the uncertainty across all candidate models by weighting them as a function of an information criterion (IC). The objective of this work is to compare the predictive performance of MA and MS based on a predefined set of NLMEMs with the same disease progression model and different dose-effect relationships. Methods: Clinical trial simulations were based on a simplified version of a disease model which characterizes the time course of visual acuity (VA) in age-related macular degeneration patients. For each trial, the parameters of four candidate models (Emax, sigmoid Emax, log-linear and linear) were estimated using importance sampling in NONMEM 7.3, and several ICs were investigated to select a model (MS) or compute weights (MA). The estimation of the minimal effective dose (MED) and the Kullback-Leibler divergence (D_KL) between the true and the predicted distributions of the VA change from baseline were used as performance criteria to compare MS and MA. Results: The overall predictive performance for the MED was better for MA than for MS (up to 10% reduction of the RMSE). When looking at the entire dose-response profile, the mean D_KL was reduced (up to 50%) when using MA compared to MS. Finally, regardless of the modelling approach, AIC outperformed the other ICs. Conclusions: By estimating weights on a predefined set of NLMEMs, MA adequately described the data and showed better predictive performance than MS, increasing the likelihood of accurately characterizing the optimal dose.
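The snippet below gives a minimal R illustration of the model-averaging idea in CS5.4: fit several candidate dose-response models, convert AIC differences into Akaike weights, and average the predicted curves. It uses simple least-squares fits to simulated data rather than the nonlinear mixed-effects models of the abstract, so it is a sketch of the weighting mechanics only; the candidate set and simulated values are invented.

```r
set.seed(42)
dose <- rep(c(0, 10, 30, 100, 300), each = 20)
resp <- 5 + 10 * dose / (50 + dose) + rnorm(length(dose), sd = 2)  # Emax-type truth
dat  <- data.frame(dose, resp)

# Candidate dose-response models (illustrative set)
fits <- list(
  linear    = lm(resp ~ dose, data = dat),
  loglinear = lm(resp ~ log(dose + 1), data = dat),
  emax      = nls(resp ~ e0 + emax * dose / (ed50 + dose), data = dat,
                  start = list(e0 = 5, emax = 10, ed50 = 50))
)

# Akaike weights: w_i proportional to exp(-0.5 * deltaAIC_i)
aic <- sapply(fits, AIC)
w   <- exp(-0.5 * (aic - min(aic)))
w   <- w / sum(w)

# Model-selected vs model-averaged predictions on a dose grid
grid    <- data.frame(dose = seq(0, 300, by = 5))
preds   <- sapply(fits, predict, newdata = grid)
ms_pred <- preds[, which.min(aic)]     # model selection: best-AIC model only
ma_pred <- as.vector(preds %*% w)      # model averaging: AIC-weighted mixture
round(w, 3)
```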

CS5.5 Modelling the controlled-release data of a transdermal patch
Erhardt E., Jacobs T. 2, Gasparini M. Politecnico di Torino, Turin, Italy, 2 Janssen Pharmaceutica NV, Beerse, Belgium
Objectives: To obtain a model for in-vivo pharmacokinetic (PK) data of a transdermal patch by combining a compartment model for infusion data with a surrogate for the in-vitro input function. Methods: The parameters of the compartmental model were first estimated with non-linear least-squares estimation. These results served as initial values in the following step, the extension to a non-linear mixed-effects model [1]. The mixed-model estimations were conducted in a frequentist way in R and MONOLIX, and the fit of the resulting models was compared to the individual data. The chosen model was combined with the approximated input function, as obtained from in-vitro release data, in a system of ordinary differential equations (ODEs). The ODE parameters were estimated in a Bayesian framework using the software Stan. Results: It was shown that a three-compartment model with linear elimination describes the plasma concentration data of the population adequately. Combining this model with a Weibull input function leads to concentration-time profiles comparable to the in-vivo controlled-release observations. Conclusions: The developed combined in-vitro in-vivo model provides a satisfactory estimation of the PK population data of a transdermal patch. The Bayesian framework allows a natural integration of knowledge from one model into the other. This is an extension of the current IVIVC methodology, where a frequentist one-stage [2] or two-stage [3] approach is the standard. References: [1] Pinheiro J, Bates D (2000). "Mixed-Effects Models in S and S-PLUS." Statistics and Computing. Springer New York. [2] Jacobs T, Rossenu S, Dunne A, Molenberghs G, Straetemans R, Bijnens L. "Combined models for data from in vitro-in vivo correlation experiments." J Biopharm Stat. 2008; 8:97-2. [3] Rossenu S, Gaynor C, Vermeulen A, Cleton A, Dunne A. "A nonlinear mixed effects IVIVC model for multi-release drug delivery systems." Journal of Pharmacokinetics and Pharmacodynamics 2008; 35.4:

CS7 Multiplicity in Clinical Trials Wed :30-:00 Lecture Hall KR 8

CS7.2 Correlated endpoints in clinical trials: simulation, modelling and extreme correlations
Leonov S., Qaqish B. 2 ICON Clinical Research, Innovation Center, North Wales, United States, 2 University of North Carolina, Gillings School of Global Public Health, Chapel Hill, United States
Modelling of correlated random variables with pre-specified marginal distributions plays an important role in the simulation of stochastic processes in a variety of fields, of which clinical trials with multiple endpoints provide an important example. Efficient algorithms exist to address the problem of generating multivariate distributions with given marginals and a given correlation structure. For model fitting, as well as for simulation, it is important to know the feasible range of pairwise correlations, which can be much smaller than the interval [-1; +1]. We provide closed-form expressions for extreme correlations for several classes of bivariate distributions that involve both discrete and continuous endpoints, as well as an algorithm for the construction of such distributions in the discrete case. Examples of invalid ranges of correlations reported in the statistics literature are also provided.
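To illustrate the feasible-correlation issue discussed in CS7.2, the R sketch below approximates the attainable correlation range for two given marginals via the Fréchet-Hoeffding bounds: the maximum (minimum) Pearson correlation is attained by the comonotone (countermonotone) coupling of the two quantile functions. The marginals chosen here, a Bernoulli and a lognormal, are arbitrary examples; the closed-form expressions of the abstract are not reproduced.

```r
# Extreme attainable Pearson correlations for given marginals, by Monte Carlo:
# couple both variables to the same uniform U (comonotone) for the maximum,
# and to U and 1 - U (countermonotone) for the minimum.
extreme_corr <- function(qF, qG, nsim = 1e6) {
  u <- runif(nsim)
  c(min = cor(qF(u), qG(1 - u)), max = cor(qF(u), qG(u)))
}

# Example: a Bernoulli(0.2) endpoint vs a skewed continuous (lognormal) endpoint
extreme_corr(function(u) qbinom(u, size = 1, prob = 0.2),
             function(u) qlnorm(u, meanlog = 0, sdlog = 1))
# The feasible range is noticeably narrower than [-1; +1].
```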
CS7.3 Optimal rejection regions for multi-arm clinical trials
Niewczas J., Burman C.-F. 2, Hampson L. 3, Posch M., König F. Medical University of Vienna, Section for Medical Statistics, Vienna, Austria, 2 AstraZeneca R&D, Mölndal, Sweden, 3 AstraZeneca R&D, Cambridge, United Kingdom
In phase II and phase III clinical trials it is common to compare several treatment groups to a control. Popular adjustments for multiplicity include, e.g., the Bonferroni and Dunnett tests. For these methods, marginal tests are used, splitting the alpha accordingly. Instead, we propose a test statistic for the intersection null hypothesis (i.e., to test whether there is any treatment effect) that is a weighted combination of the marginal test statistics of the individual treatment-control comparisons. We derive optimal rejection regions for the test of the intersection hypothesis by applying the Neyman-Pearson lemma for a clinical trial comparing two treatment arms to a common control. To allow for individual treatment-control comparisons, this is further embedded into a full closed test by applying a truncated version of the optimal intersection test. Bayesian decision-theoretic approaches based on discrete priors are used to derive optimal weights and rejection regions in the context of frequentist testing. We show that if the difference between the assumed effects in the two treatment groups becomes larger, the optimal test coincides with a classical hierarchical test in the framework of a closed test. We compare the proposed tests with more conventional ones, such as the weighted Dunnett test, in terms of power (e.g., disjunctive power or assurance) and efficiency. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No

CS7.4 Improving the precision of oncology trial analysis using progression-free survival as an endpoint
Lin C.-J., Wason J. University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom
In many oncology trials, patients are followed up until progression or death, and the time at which this happens is used as the efficacy endpoint. This is known as progression-free survival (PFS). Typical analyses consider tumour progression as a binary event, but in fact it is defined by a certain change in tumour size. This additional information on continuous tumour shrinkage at multiple times is discarded. We propose a method to make use of this information to improve the precision of analyses using PFS. We use joint modelling of the continuous tumour measurement, death and progression for other reasons (such as new tumour lesions) to construct survival curves. We present how to compute confidence intervals for quantities of interest, such as the median or mean PFS. We assess the properties of the proposed method using simulated data and data from a real phase II cancer trial. We also showcase an R Shiny app implementing the proposed method.
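A rough R sketch of the kind of comparison described in CS7.3 follows: for two treatment arms sharing a control (test-statistic correlation 0.5), it contrasts a weighted-sum intersection test with a Dunnett-type maximum test using the mvtnorm package. The weights, effect sizes and alpha are illustrative assumptions, and the sketch does not implement the authors' Neyman-Pearson-optimal regions or the closed-testing step.

```r
library(mvtnorm)

rho   <- 0.5                         # correlation of Z1, Z2 induced by the shared control
S     <- matrix(c(1, rho, rho, 1), 2, 2)
alpha <- 0.025

# (a) Weighted-sum intersection test: reject H_12 if w1*Z1 + w2*Z2 > c_sum
w     <- c(0.7, 0.3)                                   # illustrative weights
c_sum <- qnorm(1 - alpha) * sqrt(drop(t(w) %*% S %*% w))

# (b) Dunnett-type maximum test: reject H_12 if max(Z1, Z2) > c_max
c_max <- qmvnorm(1 - alpha, corr = S, tail = "lower.tail")$quantile

# Probability of rejecting the intersection hypothesis under assumed effects (Z scale)
delta <- c(2.5, 1.0)
Z     <- rmvnorm(1e5, mean = delta, sigma = S)
c(sum_test = mean(Z %*% w >= c_sum),
  max_test = mean(apply(Z, 1, max) >= c_max))
```

Shifting weight toward the arm with the larger assumed effect mimics the behaviour noted in the abstract, where the optimal test approaches a hierarchical test as the assumed effects diverge.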

CS7.5 On robust two-way MANOVA tests with applications to compositional data
Spangl B. BOKU - University of Natural Resources and Life Sciences, Vienna, Austria
We propose robust tests as an alternative to the classical Wilks' lambda test in two-way MANOVA with interactions. Based on the work of Todorov and Filzmoser (2010) and Van Aelst and Willems (2011), we extend the proposed test statistics for one-way MANOVA to the two-way case. Monte Carlo simulations are used to investigate the power of the new tests, as well as their robustness against outliers. Finally, we illustrate the use of these robust test statistics on a real data example.

CS8 Clinical Trial Design Wed :30-3:00 Lecture Hall KR 7

CS8.1 Sample size estimation in multi-centre trials
Harden M., Friede T. Universitätsmedizin Göttingen, Institut für Medizinische Statistik, Göttingen, Germany
Multi-centre trials play an important role in modern evidence-based medicine. The advantages of collecting data from more than one location are numerous, including accelerated recruitment and increased generalisability of results. The planning stage is influenced by many variables, such as the number of centres, centre sizes, treatment allocation and variability of observations, some of which might be difficult to estimate before realisation of the trial. A fixed- or random-effects model can be applied to account for potential clustering in the data. In the case of rare diseases, many small centres will contribute patients to the study and a random-effects model is favourable to account for the heterogeneity between centres. So far, sample size estimation for random-effects models is limited to balanced study designs or another fixed patient allocation, which is very unlikely to be observed in trials, even if randomisation techniques are used. Since sample size estimation can be based on false initial assumptions, we propose a sample size estimation procedure, including blinded sample size re-estimation, for multi-centre trials comparing two treatment groups for a normally distributed endpoint, allowing for unequal sample sizes. It is assumed that block randomisation with a fixed block length is used, stratified by study site. The resulting sample size formula is analysed based on a simulation study. Unequal centre sizes do not reduce statistical power, but unbalanced allocation caused by incomplete block randomisation does.

CS8.2 Optimizing the design of a ring-prophylactic cluster-randomized clinical trial
Menten J., D'Hollander N. 2, Vandebosch A. Janssen Research & Development, Beerse, Belgium, 2 Universiteit Gent, Gent, Belgium
In infectious diseases, control measures can be used around a primary case to avoid further spread of infection. Ring vaccination was, for example, used in the eradication of smallpox. This approach is also used as the basis of clinical trial designs for the development of interventions to prevent communicable diseases, such as Ebola vaccination. Contacts of an index case are given the prophylactic agent or control in a cluster-randomized manner. The effect of the intervention is determined by the number of secondary infections in intervention versus control clusters. This approach reduces the required sample size by focusing the intervention on those most at risk. In addition, the indirect effects of the intervention, through breaking the transmission chain, can be assessed.
Cluster randomization may also have practical advantages, may be more acceptable to communities, and can avoid contamination bias. On the other hand, clustering of outcomes reduces statistical power compared to individually randomized studies. An important determinant of the efficiency of this design is the size of the clusters around the index cases. In large clusters, subjects may have limited contact with the index case and be at low risk of infection. If clusters are limited to immediate contacts, many clusters may be needed for a sufficient sample size. We explored the optimal design of a ring-prophylactic study in the development of interventions to prevent dengue infection. In a simulation study, we determined the optimal cluster size, balancing the cost of adding subjects within clusters versus the cost of initiating new clusters. We assessed adaptive designs with blinded sample size re-estimation, estimating the intraclass correlation coefficient from a subset of clusters.

CS8.3 Pragmatic randomized clinical trials - a survey on designs used in practice
Elsäßer A., Gamerman V. 2 Boehringer Ingelheim Pharma GmbH & Co KG., Ingelheim am Rhein, Germany, 2 Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, United States
Phase III clinical trials are mainly planned to show efficacy and safety of a new treatment in a homogeneous patient population in a controlled setting. On the other hand, there is a need for estimating the treatment effect in a more heterogeneous population in a usual-care setting. Real-world data (RWD) is often equated with observational trial data. However, real-world evidence and randomisation are two fully compatible concepts, as highlighted recently by the FDA [1]. Randomisation is key to ensuring comparable groups in trials with two or more treatment arms. Pragmatic randomised clinical trials (PrCTs) are one way to estimate a treatment's effectiveness. The term pragmatic trial is widely used, but a clear definition is still missing. Authors refer to PrCTs as trials using, e.g., electronic health records, cluster-randomised trials, simple trials with lean report forms, or trials involving personal devices or data from social media. There is a broad range of features that can be involved in a PrCT, but no trial can be fully pragmatic [2]. Statisticians need to make themselves familiar with the challenges of RWD and give key input in the design and analysis of pragmatic trials [3]. We will give an overview of different aspects of PrCTs and present a survey that we conducted to evaluate different PrCT designs in practice, based on a EudraCT trial database search, in order to propose a definition of PrCTs from the statistical perspective. [1] Sherman RE et al.: Real-World Evidence - What Is It and What Can It Tell Us? N ENGL J MED 2016; 375(23): [2] Ford I and Norrie J: Pragmatic Trials. N ENGL J MED 2016; 375: [3] Califf RM: Pragmatic clinical trials: Emerging challenges and new roles for statisticians. Clinical Trials 2016; 13(5):
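Below is a small R sketch of the cluster-size trade-off discussed in CS8.2: for a given intraclass correlation, the design effect 1 + (m - 1) * ICC inflates the number of subjects needed, so fewer, larger clusters save start-up costs but buy less information per subject. The cost figures, ICC and target effective sample size are invented for illustration; the abstract's dengue-specific simulation model is not reproduced.

```r
# Total cost of reaching a fixed effective sample size with clusters of size m,
# assuming a constant ICC, a start-up cost per cluster and a cost per subject.
cluster_cost <- function(m, icc = 0.05, n_eff = 400,
                         cost_cluster = 2000, cost_subject = 100) {
  deff       <- 1 + (m - 1) * icc          # design effect for cluster randomisation
  n_subjects <- n_eff * deff               # subjects needed to match an individually
  n_clusters <- ceiling(n_subjects / m)    # randomised trial of size n_eff
  n_clusters * cost_cluster + n_subjects * cost_subject
}

m_grid <- 2:60
costs  <- sapply(m_grid, cluster_cost)
m_grid[which.min(costs)]   # cost-optimal cluster size under these assumptions
```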

CS8.4 Blinded sample size re-estimation in bioequivalence studies: two case studies in HIV drug development
Vanveggel S., De Ridder F., Crauwels H. 2, Adkison K. 3 Janssen Pharmaceutical Companies of Johnson & Johnson, Statistics and Decision Sciences, Beerse, Belgium, 2 Janssen Pharmaceutical Companies of Johnson & Johnson, Global Clinical Pharmacology, Beerse, Belgium, 3 ViiV Healthcare, Clinical Pharmacology, Research Triangle Park, United States
Bioequivalence studies in drug development are conducted to establish equivalence of drug exposure between two drug products, e.g. a fixed-dose combination of approved drugs versus the single agents, within boundaries that take into account the physiologically intrinsic between-occasion variability. The study sample size, and the associated power, for bioequivalence studies in a typical 2x2 cross-over design depend on the assumed geometric mean ratio (GMR) and within-subject variability (CVw%) of the pharmacokinetic (PK) parameter(s) of interest (AUC, Cmax). Estimates for these values are usually derived from previous studies, e.g. relative bioavailability studies. However, either or both of these values may be uncertain. We present a way to mitigate the risk when dealing with uncertainty about the variability by using an adaptive design, i.e. a sample size re-estimation (SSR) procedure that ignores the treatment effect (blinded), as proposed by Golkowski et al. (2014). Through simulations, the potential benefit over a fixed sample size design will be shown. The impact of this procedure on the type I error as well as power will also be discussed. This blinded SSR procedure was implemented in two recent bioequivalence studies in HIV drug development, each for a fixed-dose combination product, results of which will be presented: while for one study no increase was warranted, interim data in the other study resulted in a sample size increase of ~37%. Finally, a post-hoc comparison versus an alternative two-stage design will be presented.

CS8.5 A Bayesian adaptive design for clinical trials in rare diseases
Williamson S.F., Jacko P. 2, Villar S. 3, Jaki T. Lancaster University, Mathematics and Statistics, Lancaster, United Kingdom, 2 Lancaster University, Management Science, Lancaster, United Kingdom, 3 MRC Biostatistics Unit, Cambridge, United Kingdom
Development of treatments for rare diseases is challenging due to the limited number of patients available for participation. Learning about treatment effectiveness with a view to treating patients in the larger outside population, as in the traditional fixed randomised design, may not be a plausible goal. An alternative goal is to treat the patients within the trial as effectively as possible. Using the framework of finite-horizon Markov decision processes and dynamic programming (DP), a novel randomised response-adaptive design is proposed which maximises the total number of patient successes in the trial. Several performance measures of the proposed design are evaluated and compared to alternative designs through extensive simulation studies using a recently published trial as motivation. For simplicity, a two-armed trial with binary endpoints and immediate responses is considered. However, further evaluations illustrate how the design behaves when patient responses are delayed, and modifications are made to improve its performance in such trials.
Simulation results for the proposed design show that: (i) the percentage of patients allocated to the superior treatment is much higher than in the traditional fixed randomised design; (ii) relative to the optimal DP design, the power is largely improved upon; and (iii) the corresponding treatment effect estimator exhibits only a very small bias and mean squared error. Furthermore, this design is fully randomised, which is an advantage from a practical point of view because it protects the trial against various sources of bias. Overall, the proposed design strikes a very good balance in the power and patient benefit trade-off, which greatly increases the prospects of a Bayesian bandit-based design being implemented in practice, particularly for trials involving rare diseases and small populations.

CS9 Observational Studies Wed :30-3:00 Lecture Hall KR 8

CS9.1 Causal networks of dietary behaviour in 2-6 year old European children
Foraita R., Didelez V., Börnhorst C., Reisch L. 2, Gwozdz W. 2, Lissner L. 3, Krogh V. 4, Siani A. 5, Veidebaum T. 6, Tornaritis M. 7, Page A. 8, Moreno L. 9, Molnar D. 0, Pigeot I., on behalf of the I.Family consortium
Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany, 2 Copenhagen Business School, Department of Intercultural Communication and Management, Copenhagen, Denmark, 3 University of Gothenburg, Section for Epidemiology and Social Medicine, Gothenburg, Sweden, 4 Fondazione IRCSS Istituto Nazionale dei Tumori, Milan, Italy, 5 National Research Council, Institute of Food Sciences, Unit of Epidemiology & Population Genetics, Avellino, Italy, 6 National Institute for Health Development, Tallinn, Estonia, 7 Research and Education Institute of Child Health, Strovolos, Cyprus, 8 University of Bristol, Centre for Exercise, Nutrition & Health Sciences, Bristol, United Kingdom, 9 University of Zaragoza, Zaragoza, Spain, 0 University of Pécs, Department of Paediatrics, Pécs, Hungary
Childhood obesity is a serious global health problem that might be preventable (among other methods) by improved dietary behaviour on the part of children and adolescents. However, dietary behaviour is influenced by many factors, including the family environment, early-life factors, taste preferences, media consumption and physical activity. Hence, the aim of the study is to investigate the association network of dietary behaviour and its potential determinants from childhood to adolescence and to reveal important variables across these networks. A total of 10,794 children from eight European countries of the IDEFICS-I.Family cohort, followed for 5-7 years, with participation in at least two of three surveys and not exceeding more than 50% missing values, were included in the analysis. The sample was stratified into three "birth cohorts" according to the child's age at baseline. Imputation was performed to account for missing data. Causal search algorithms are applied to estimate the relationships among dietary intake and eating habits, early-life factors, anthropometric characteristics, physical activity, media consumption, sleep duration, well-being, the HOMA index, pubertal status, family factors and consumer prices. We present the resulting causal graphs and discuss the usefulness and limitations of causal search algorithms in practice.
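For intuition about Bayesian response-adaptive allocation of the kind motivating CS8.5, the R sketch below implements plain Thompson sampling for a two-armed trial with a binary endpoint. This is a simpler allocation rule than the constrained-optimal dynamic programming design of the abstract, and all trial parameters here are invented.

```r
set.seed(7)
p_true <- c(0.3, 0.5)       # unknown true success probabilities (illustrative)
n_max  <- 75                # trial size
succ   <- fail <- c(0, 0)   # Beta(1, 1) priors on each arm
arm    <- outcome <- integer(n_max)

for (i in seq_len(n_max)) {
  # Thompson sampling: draw from each arm's posterior, allocate to the larger draw
  draw         <- rbeta(2, 1 + succ, 1 + fail)
  arm[i]       <- which.max(draw)
  outcome[i]   <- rbinom(1, 1, p_true[arm[i]])
  succ[arm[i]] <- succ[arm[i]] + outcome[i]
  fail[arm[i]] <- fail[arm[i]] + (1 - outcome[i])
}

table(arm)      # allocation drifts toward the better arm
mean(outcome)   # in-trial success rate, the quantity such designs try to maximise
```

Because the allocation remains randomised, the scheme retains some protection against allocation bias, which is one of the practical advantages highlighted in the abstract.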

CS9.2 Effects of spatial variation on the association of depressive symptoms and green vegetation in the Heinz Nixdorf Recall (HNR) study
Djeudeu D., Ickstadt K., Moebus S. 2 TU Dortmund, Faculty of Statistics, Dortmund, Germany, 2 Institute of Medical Informatics, Biometry and Epidemiology, Center of Urban Epidemiology (CUE), Essen, Germany
We analyse longitudinal data from 4,708 participants (45-75 years) of the ongoing HNR study, collected in three adjacent cities (Bochum, Essen, Mülheim/Ruhr). Participants' spatial locations (residential addresses and districts) are available. Greenness, measured by the Normalized Difference Vegetation Index (NDVI) in a 100 m buffer around the residence, may be negatively associated with depressive symptoms measured by CES-D scores, as suggested by some studies, mainly of cross-sectional design. To gain more knowledge about the association between depressive symptoms and greenness, and to take the hierarchical structure of the data into account, we model the longitudinal and spatial effects and assess the impact on the estimation of covariate effects, especially of greenness. We fit a two-level hierarchical linear model (mixed-effects model) to describe the relationship between depressive symptoms and a set of selected covariates. In this model, the individuals are the higher level of the hierarchy, with the repeated measurements nested within them. The results suggest that greener residential environments may decrease depressive symptoms in the population. We then use a hierarchical linear model with the 108 districts as an additional third and highest level, with each individual nested in one of them. The effects of greenness, as well as those of some covariates, vary across districts, and the association of depressive symptoms with greenness becomes weaker. The estimated district effects remain spatially correlated: the multilevel approaches account only for between-area correlation, neglecting spatial connections between areas. We aim to extend our multilevel model to introduce spatial interaction effects.

CS9.3 A simulation study on implementing marginal structural models in an observational study with switching medication based on a biomarker
Kim H., Cable G. PAREXEL International, Billerica, United States
Assessing treatment effectiveness in longitudinal observational data can be complex because treatments are not randomly assigned and patients can switch treatment depending on changes in confounders. Hence, there can be confounding of the effect of treatment by a time-varying variable which is affected by previous exposure and can also influence subsequent treatment changes. Precision medicine relies on validated biomarkers to better classify patients by their probable response to treatment. Biomarkers may be time-varying dependent confounders that are affected by prior treatment in the evaluation. However, measurement errors and missing data for confounders are unavoidable with such data, and the impact of switching medications based on the biomarker has received less attention. Marginal structural model estimation is often employed to obtain coefficients to create weights for each observation so that treatment exposure is not temporally confounded. We conducted simulation studies to explore bias in the estimation under various scenarios.
Putting the model misspecification problem aside, bias is severe when multiple treatment switches are allowed along with measurement error and missingness in the covariates. A customized approach and study design are required to assess causal treatment effects when covariate-dependent treatment switching is built into the design.

CS9.4 Goodness of fit test for linear regression
Blagus R., Stare J. University of Ljubljana, Institute for Biostatistics and Medical Informatics, Ljubljana, Slovenia
Linear regression is a fundamental statistical tool for analyzing experimental and observational data with a continuous outcome. Model misspecification seriously affects the validity and efficiency of regression analysis; therefore model checking plays an important role. We review the existing goodness-of-fit tests which are based on the cumulative sum (cumsum) of the model's residuals and propose a novel test. The tests based on cumsum processes formalize and objectify the graphical procedures where the residuals are plotted versus the fitted values and one looks for systematic patterns. The problem when using cumsum processes based on residuals is that they converge to a zero-mean Gaussian process with a very complex covariance structure; hence the theoretical results available for Brownian motion or Brownian bridge processes cannot be applied. We show that using permutations to obtain the p-value can provide a good alternative to the other available procedures, like, for example, the wild bootstrap proposed by Stute et al. (1998) or the simulation approach proposed by Su and Wei (1991), which are only asymptotically valid. We show, using an extensive Monte Carlo simulation study, that our proposed test attains the correct size with as few as 10 subjects, while being as powerful as the other available alternatives, which are very liberal with such a small sample size. The results are also illustrated on some real data examples.

CS9.5 Confidence interval for the odds ratio in case of joint misclassification of exposure and disease
Reiczigel J. University of Veterinary Medicine Budapest, Department of Biomathematics and Informatics, Budapest, Hungary
In case-control studies both the exposure and the disease may be subject to misclassification, which may seriously affect the estimate of the odds ratio (OR). Some authors have proposed adjustments of the point estimate of the OR, but no confidence interval procedure has been published yet. Here, a profile likelihood CI is proposed for the true OR and its properties are examined by simulation. The method assumes that the misclassification probabilities are known, and allows different sensitivity and specificity for the exposure and the outcome. Misclassification in exposure and outcome are assumed to be independent. The procedure can also be applied for testing the dependence of two diseases and for quantifying the increase or decrease of the risk of a certain disease given that another disease is present. Preliminary simulation results suggest that the profile likelihood CI performs acceptably and maintains the nominal confidence level fairly well. This research was supported by the Hungarian National Research Fund (grant number OTKA K0857).
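To give a feel for the cumsum-of-residuals diagnostics behind CS9.4, here is a compact R sketch: order the residuals by fitted value, take the maximum absolute cumulative sum as the test statistic, and use random permutations of the residuals to obtain a reference distribution. The permutation scheme shown is a simple generic choice, not necessarily the exact procedure proposed by the authors.

```r
set.seed(3)
n <- 100
x <- runif(n, -2, 2)
y <- 1 + x + 0.8 * x^2 + rnorm(n)         # quadratic truth, so a linear fit is misspecified

fit   <- lm(y ~ x)                         # working (misspecified) model
r     <- resid(fit)[order(fitted(fit))]    # residuals ordered by fitted value
stat  <- function(e) max(abs(cumsum(e))) / sqrt(length(e))
T_obs <- stat(r)

# Permutation reference distribution: shuffle the residuals over the ordering
T_perm <- replicate(2000, stat(sample(r)))
mean(T_perm >= T_obs)                      # small p-value suggests lack of fit
```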

CS20 Diagnostic Trials and Meta-Analysis Wed :30-3:00 Lecture Hall KR 9

CS20.1 A simple hierarchical exchangeable model for kappa
Gasparini M. Politecnico di Torino, Mathematical Sciences, Torino, Italy
A very natural model of rater agreement is based on infinite sequences of exchangeable ratings and independent subjects. The result is a simple hierarchical exchangeable model (not necessarily Bayesian) which has not been sufficiently explored in the rater agreement literature. For binary ratings, the population kappa parameter can then be defined properly as the (Pearson) correlation coefficient between the ratings of the same subject by any two raters. For qualitative ratings into more than two classes, kappa itself can be taken as a measure of qualitative correlation between ratings. The sampling counterparts are the kappa statistics as usually defined. First- and second-order asymptotic distributions can be obtained and used to solve very concrete inferential problems (convergence, confidence intervals) and to address several open issues about the interpretation of kappa raised in the past few years by some authors.

CS20.2 Overall unscaled indices for quantifying agreement among multiple raters
Jang J.H., Manatunga A., Long Q. 2 Emory University, Biostatistics and Bioinformatics, Atlanta, United States, 2 University of Pennsylvania, Philadelphia, United States
The need to quantify agreement or reproducibility exists in medical studies. Several unscaled agreement indices, such as the total deviation index (TDI) (Lin 2000) and the coverage probability (CP) (Lin et al. 2002), are widely recognized for two reasons: (1) they are intuitive in the sense that interpretations are tied to the original measurement unit; (2) practitioners can readily determine whether the agreement is satisfactory by directly comparing the value of the index to a prespecified tolerable coverage probability or distance. However, these indices are only defined in the context of comparing two raters. In this presentation, we introduce a series of overall unscaled indices that can be used to evaluate agreement among multiple raters. We provide the definitions of the overall indices and propose inference procedures in which bootstrap methods are used for the estimation of standard errors. We assess the performance of the proposed approaches by simulation studies. Finally, we demonstrate the application of our methods to a renal study.

CS20.3 Hierarchical group testing with multiplex assays in heterogeneous populations
Bilder C., Tebbs J. 2, McMahan C. 3 University of Nebraska-Lincoln, Department of Statistics, Lincoln, United States, 2 University of South Carolina, Department of Statistics, Columbia, United States, 3 Clemson University, Department of Mathematical Sciences, Clemson, United States
Testing individuals for infectious diseases is important for disease surveillance and for ensuring the safety of blood donations. When faced with questions on how to test as many individuals as possible and still operate within budget limits, public health officials often use group testing (pooled testing) with multiplex assays (multiple-disease tests). The testing process works by amalgamating specimens from individuals (e.g., blood, urine, or saliva) into groups and then applying a multiplex assay to each group.
For low disease prevalence settings, the majority of these groups will test negative for all diseases, thus greatly reducing the number of tests needed in comparison to individual testing with single-disease assays. For those groups that test positive for at least one disease, algorithms have been developed to retest sub-groups and/or individuals in order to distinguish the positive individuals from those who are negative. The purpose of this presentation is to provide a first-of-its-kind algorithm that incorporates individual risk information into the retesting process for multiplex assays. Through simulation and application, we show that this new algorithm reduces the number of tests needed in comparison to procedures that do not include individual risk information, while also maintaining sensitivity and specificity levels.

CS20.4 Guidance for deriving and presenting percentage study weights in meta-analysis of test accuracy studies
Burke D.L., Ensor J., Snell K.I.E., van der Windt D., Riley R.D. Keele University, Research Institute for Primary Care & Health Sciences, Keele, United Kingdom
Percentage study weights in meta-analysis reveal the contribution of each study toward the overall summary results and are especially important when some studies are considered outliers or at high risk of bias. In test accuracy reviews, such as bivariate meta-analyses of sensitivity and specificity, percentage study weights are usually expressed relative to the study sample size or to the standard error of study-specific estimates of logit sensitivity and logit specificity. In this presentation, we explain why these approaches give incorrect study weights and adopt an alternative method based on a decomposition of Fisher's information matrix. This method also generalises to bivariate meta-regression, so that percentage study weights can be derived for estimates of study-level modifiers of test accuracy, as well as for the overall summary sensitivity and specificity results. We illustrate the method with two meta-analyses examining test accuracy: one of ear temperature for the diagnosis of fever in children, and the other of positron emission tomography for the diagnosis of Alzheimer's disease. These highlight that percentage study weights based solely on sample size can be hugely inaccurate and should no longer be used. We suggest that the proposed percentage weights should be presented routinely on forest and ROC plots for sensitivity and specificity, to provide transparency of the contribution of each study in test accuracy meta-analyses.

CS20.5 Improved random-effects meta-analysis of limits of agreement
Vock M., Mittlböck M. 2 University of Bern, Institute of Mathematical Statistics and Actuarial Science, Bern, Switzerland, 2 Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Vienna, Austria
A simple type of method comparison study is based on the analysis of within-subject differences of two measurement methods. Often, the results of a single study are presented as limits of agreement. Williamson et al. (Statistics in Medicine, 2002) introduced three procedures for calculating limits of agreement based on several such method comparison studies. The procedure that has been used most often in applications consists in the random-effects meta-analysis according to DerSimonian and Laird being applied to the mean differences and the standard deviations of differences from the different studies.
When the intention is to use the resulting overall limits of agreement as a range of differences between the two measurement methods

that have to be expected in future applications, we show that the existing procedure does not duly take into account the between-study variance of the random-effects model. We propose a modification of the procedure that provides more realistic overall limits of agreement for the intended interpretation, and we investigate and discuss its properties.

CS23 Complex Regression Models Tue :30-3:00 Lecture Hall KR 7

CS23.1 Bayesian variable selection for structured high-dimensional covariates with survival outcome
Madjar K., Zucknick M. 2, Rahnenführer J. TU Dortmund University, Dortmund, Germany, 2 University of Oslo, Oslo, Norway
In cancer research the identification of prognostic biomarkers and the prediction of a patient's risk play an important role. When a patient cohort is heterogeneous due to known subgroups, such as clinical factors (e.g. tumor stages or histologies) or different studies, this information should be taken into account in the model building process. We assume that there are common predictors with a similar effect on the survival outcome across subgroups, subgroup-specific predictors that are only associated with the outcome in a certain subgroup, and predictors with effects in opposite directions. Our approach favors the selection of variables with a joint effect but also allows the detection of subgroup effects. The relationship among the predictors is not known a priori and is inferred through a variable selection prior. We apply our approach in the context of the Cox proportional hazards model with gene expression data as predictors and a time-to-event endpoint.

CS23.2 Using groups of variables for prediction
Hummel M., Hielscher T., Kopp-Schneider A. German Cancer Research Center (DKFZ), Division of Biostatistics, Heidelberg, Germany
In high-dimensional settings dimension reduction is an essential issue. We propose to consider grouping of variables as a way of dimension reduction, where groups can be defined either by domain knowledge (e.g. gene regulatory pathways) or data-driven (e.g. by clustering). Analyzing groups instead of individual variables potentially provides more meaning and better interpretability and might increase the sensitivity for selecting smaller but consistent effects. Further, correlation among variables complicates analyses, and therefore regarding aggregates of correlated features as analysis units can be beneficial from a statistical point of view. In this talk we focus on using variable groups for prediction and classification. For each group, a "prototype" could be chosen among the group members as a predictor for the outcome of interest. Alternatively, the members of a group can be summarized, e.g. by averaging or by projecting onto principal component directions, and the resulting "meta variable" would serve as the predictor. We evaluate existing and new approaches for creating group representatives in the light of prediction performance. The methods are compared via simulation studies among each other and with the widely used univariable approach consisting of pre-screening and building prediction models on the selected features. An important aspect in the comparisons is whether or not a method is "supervised", in the sense that it makes use of the outcome for selection or creation of the predictors. In real datasets prediction performance is estimated by resampling techniques.
Further, they help to address the potential gain in interpretability when using group-based strategies.

CS23.3 A marginalized two-part beta regression model for microbiome compositional data
Liu L., Chai H. 2,3 Northwestern University, Preventive Medicine, Chicago, United States, 2 Northwestern University, Chicago, United States, 3 Shandong University, Jinan, China
Human microbial communities are associated with many human diseases. One goal of human microbial studies is to detect abundance differences across clinical conditions and treatment options. However, the microbiome compositional data (denoted by relative abundance) are highly skewed, bounded in [0, 1), and often with many zeros. A two-part model is commonly used to separate zeros and positive values explicitly by two submodels: a logistic model for the probability of microbes being present in Part I, and a Beta regression model for the relative abundance conditional on the presence of the microbe in Part II. However, the regression coefficients in Part II cannot provide a marginal (unconditional) interpretation of covariate effects on the microbial abundance, which is of great interest in many applications. In this paper, we propose a marginalized two-part Beta regression model which captures the zero-inflation and skewness of microbiome data and also allows investigators to examine covariate effects on the unconditional (marginal) mean. We demonstrate its practical performance using simulation studies and apply the model to an Inflammatory Bowel Disease (IBD) study. We find significant treatment effects on the marginal mean of relative abundance of three genera, while the treatment effects are not significant in either Part I or Part II of the traditional two-part models.

CS23.4 Genetic mapping of multivariate phenotypes in the presence of missing data
Ghosh S. Indian Statistical Institute, Human Genetics Unit, Kolkata, India
Clinical end-point traits are often characterized by quantitative and/or qualitative precursors, and it has been argued that it may be statistically a more powerful strategy to analyze a multivariate phenotype comprising these precursor traits to decipher the genetic architecture of the underlying complex end-point trait. We (Majumdar et al., 2015) recently developed a Binomial Regression framework that models the conditional distribution of the allelic count at a SNP given a vector of phenotypes. The model does not require a priori assumptions on the probability distributions of the phenotypes. Moreover, it provides the flexibility of incorporating both quantitative and qualitative phenotypes simultaneously. However, it may often arise in practice that data may not be available on all phenotypes for a particular individual. In this study, we explore methodologies to estimate missing phenotypes conditioned on the available ones and carry out the Binomial Regression based test for association on the "complete" data. We partition the vector of phenotypes into three subsets: continuous, count and categorical phenotypes. For each missing continuous phenotype, the trait value is estimated using a conditional normal model. For each missing count phenotype, the trait value is estimated using a conditional Poisson model. For each missing categorical phenotype, the risk of the phenotype status is estimated using a conditional logistic model.
We carry out simulations under a wide spectrum of multivariate phenotype models and assess the effect of the proposed imputation strategy on the power of the association test vis-à-vis the ideal situation with no missing data.
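Purely as an illustration of the kind of conditional-model imputation outlined in CS23.4 above (this is not the authors' code; the data and variable names are simulated and invented), the following R sketch imputes a missing continuous, count and binary phenotype from conditional normal, Poisson and logistic models fitted to the observed cases:

  set.seed(1)
  n   <- 500
  dat <- data.frame(x = rnorm(n))                         # fully observed phenotype
  dat$y_cont  <- 1 + 0.5 * dat$x + rnorm(n)               # continuous phenotype
  dat$y_count <- rpois(n, exp(0.2 + 0.3 * dat$x))         # count phenotype
  dat$y_bin   <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x)) # binary phenotype
  for (v in c("y_cont", "y_count", "y_bin"))              # make ~10% of each missing
    dat[[v]][sample(n, 50)] <- NA

  impute <- function(formula, family, data) {
    fit  <- glm(formula, family = family, data = data)    # fitted on observed cases only
    yvar <- all.vars(formula)[1]
    miss <- is.na(data[[yvar]])
    data[[yvar]][miss] <- predict(fit, newdata = data[miss, , drop = FALSE], type = "response")
    data
  }
  dat <- impute(y_cont  ~ x, gaussian, dat)   # conditional normal model
  dat <- impute(y_count ~ x, poisson,  dat)   # conditional Poisson model
  dat <- impute(y_bin   ~ x, binomial, dat)   # conditional logistic model (imputes the risk)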

CS23.5 Modeling of mean-covariance structures in marginal structural models
Qu C., Pan J. University of Manchester, Manchester, United Kingdom
In epidemiological studies, marginal structural models (MSMs) are used for properly estimating the causal effect of a time-dependent treatment, especially when confounders are present. Estimating the mean structure in the marginal structural model framework has been studied for a long time, but there has been little research conducted on modelling of variance or covariance structures. Following the generalised estimating equations (GEE) approach of Zeger and Liang (1986), Hernan, Brumback and Robins (2000) suggested a selected covariance structure such as compound symmetry, AR(1), etc. However, questions arise whether the assumed covariance structure is indeed correct and what the consequences might be otherwise. In this research, we propose to use the inverse probability weighted generalized estimating equations (WGEE) approach to model the mean and covariance structures simultaneously. These models allow for appropriate adjustment for confounding. The proposed WGEE approach yields unbiased estimators for both the mean and covariance parameters for longitudinal data with confounders. We demonstrate the use of the proposed approach in simulation studies and a real data analysis.

CS24 Clinical Trial Designs for the Development of Targeted Therapies Tue :30-8:00 Lecture Hall HS 4

CS24.1 A novel adaptive enrichment clinical trial design for predictive biomarkers
Parashar D., Stallard N. University of Warwick, Statistics and Epidemiology Unit, Coventry, United Kingdom
There has been a surge in designing clinical trials based on the assumption that a biomarker is predictive of treatment response. Patients are stratified by their biomarker signature, and one tests the null hypothesis of no treatment effect in either the full population or the targeted subgroup. However, in order to verify the predictability of a biomarker, it is essential that the hypothesis be tested in the non-targeted subgroup too. In a Phase IIB oncology trial with a progression-free survival (PFS) endpoint, the data obtained can inform the Phase III design whether to restrict recruitment to just the targeted subgroup or not. We propose a two-stage randomised Phase II population enrichment trial design, with PFS as the primary endpoint, comparing an experimental drug with a control treatment. We adaptively test the null hypotheses of hazard ratios in both the targeted as well as the non-targeted subgroups, with strong control of the familywise error rate. It is assumed that the hazard ratio of the targeted subgroup is much less than that of the non-targeted one, since the drug is expected to be more beneficial for the biomarker-positive subpopulation. Simulations for an example trial in non-small cell lung cancer show that the probability of recommending an enriched Phase III trial increases significantly with the hazard ratio in the non-targeted subgroup, and illustrate the efficiency achieved. Our adaptive design, testing first in the non-targeted subgroup followed by testing in the targeted subgroup for a randomised controlled trial, constitutes part of the proof of a biomarker's predictability.

CS24.2 A conditional error function approach for adaptive enrichment designs with multiple nested subgroups accounting for uncertainty in variance estimation
Placzek M., Friede T.
University Medical Center Göttingen, Department of Medical Statistics, Göttingen, Germany
Adaptive enrichment designs offer an efficient way to perform a subgroup analysis while controlling the type I error rate. Frequently used testing strategies include, e.g., the combination test approach (Brannath et al., 2009; Jenkins et al., 2011) or the conditional error function approach (Friede et al., 2012; Stallard et al., 2014). Here we focus on the latter and present some extensions to the CEF approach. Instead of one subgroup only, we allow for multiple nested subgroups. For normally distributed endpoints we drop the assumption of known variances across the populations and derive methods for calculating the conditional error and testing via the joint distribution of standardized test statistics. Since we estimate the variances we can no longer use simple normal distributions and hence apply some ideas of Posch et al. (2004). We show exact results where possible and analyze approximations by presenting simulation results. The proposed methods are motivated and illustrated by an example.

CS24.3 Bias-adjusted estimation of treatment effect in adaptive enrichment designs
Benner L., Kunzmann K., Kieser M. University of Heidelberg, Institute of Medical Biometry and Informatics, Heidelberg, Germany
In recent years there has been increased interest in personalized medicine. Adaptive enrichment designs provide a useful option to deal with uncertainties regarding the treatment effect in different patient populations. For example, in oncology trials it is frequently expected that the treatment may be more efficient in a specific subgroup as compared to the full patient population. We consider a two-stage adaptive enrichment design. Thereby, patients from the full population are enrolled in the first stage. In an interim analysis, the most promising population (full population or prespecified subgroup) is selected based on the effects observed in the first stage. In a subsequent second stage, only patients from the selected population are enrolled. Finally, the treatment effect is estimated based on data observed in stages one and two. Since the selection of the target population depends on the observed effects in the interim analysis, the commonly used maximum likelihood estimator is biased. For the situation of a normally distributed outcome, we present five alternative estimators which were originally proposed in the field of treatment selection and are here transferred to the setting of enrichment designs. In a simulation study, we investigate their performance regarding bias and root mean squared error for various effect sizes, interim analysis timings, and different selection rules.

CS24.4 Robustness of testing procedures for confirmatory subgroup analysis based on a continuous biomarker
Graf A., Wassmer G., Friede T. 2, Gera R. 2, Posch M. Medical University of Vienna, Vienna, Austria, 2 University Medical Center Göttingen, Göttingen, Germany
With the advent of personalized medicine, clinical trials studying treatment effects in sub-populations have attracted more and more attention. The objectives of such studies are, besides demonstrating a treatment effect in the overall population, to identify subgroups, based on biomarkers, where the treatment has a positive effect. To give an example, there is a large discussion whether biomarkers have an influence on the outcome of treatments in patients with depression. Although a number of treatment options for such patients are available, no single treatment is universally effective. Continuous biomarkers are typically dichotomized based on thresholds to define biomarker-low and biomarker-high subgroups. Since the true dependence structure of the outcome on the biomarker is unknown, several thresholds are investigated. The nested structure of the resulting subgroup test statistics is similar to the structure of the sequence of cumulative test statistics in group sequential trials. Hence it might be appropriate to use critical boundaries from group sequential designs. However, due to additional potential prognostic effects of a biomarker it is not clear whether such boundaries guarantee control of the family-wise error rate. We investigate the robustness of these testing procedures and propose hypothesis tests that control the family-wise error rate under minimal assumptions.

CS24.5 Penalized likelihood ratio test for generalized linear models with an unknown biomarker cutpoint in clinical trials
Gavanji P., Chen B. 2, Jiang W. Queen's University, Mathematics and Statistics, Kingston, Canada, 2 Queen's University, Queen's Cancer Research Institute, Kingston, Canada
In clinical trials, the main objective is investigating the treatment effects on patients. However, many molecularly targeted drugs or treatments tend to benefit a subset of patients more, identified by a certain biomarker. The cut-point value defining patient subsets is often unknown. We are interested in testing the biomarker main effect and treatment-biomarker interaction effect in a generalized linear model to see if the new treatment benefits all patients in the same way or not. For an unknown biomarker cut-point, the generalized linear model can be viewed as a mixture of regression models, for which the regularity conditions for traditional likelihood methods are not satisfied. To overcome these challenges, we first approximate the indicator function, defining the biomarker subsets, by a smooth continuous function. We then introduce a penalized likelihood method to overcome irregularities. Unlike typical penalized likelihood methods, which consider a fixed penalty term with a tuning parameter, we consider a new idea of using a random penalty term. Adding a random penalty term and proposing a new set of regularity conditions help us to study the properties and limiting distributions of the maximum penalized likelihood estimates of the parameters. We further prove that the penalized likelihood ratio test statistic has an asymptotic chi-square distribution with 3 degrees of freedom under the null hypothesis. Through extensive simulation studies, we find that the proposed test procedure works well for hypothesis testing. The proposed method is applied to a clinical trial of prostate cancer with serum prostatic acid phosphatase (AP) as a biomarker.

CS25 Meta-Analysis Thu :30-8:00 Lecture Hall KR 8

CS25.1 Some general points on the I² measure for heterogeneity used in meta-analysis
Böhning D. University of Southampton, Southampton, United Kingdom
Meta-analysis has developed to be a most important tool in evaluation research. Heterogeneity is an issue that is present in almost any meta-analysis.
But heterogeneity is not all alike, as its magnitude differs across meta-analyses. In this respect, Higgins' I² measure of heterogeneity has emerged to be one of the most used and, potentially, one of the most useful measures, as it provides a quantification of the amount of heterogeneity involved in a given meta-analysis. I² is conventionally interpreted, in the sense of a variance component analysis, as the proportion of total variance due to heterogeneity. However, this interpretation is not entirely justified, as the second part involved in defining the total variation, usually denoted as s², is not an average of the study-specific variances but in fact some other function of the study-specific variances. We show that s² is asymptotically identical to the harmonic mean of the study-specific variances and, for any number of studies, is at least as large as the harmonic mean, with equality if all study-specific variances agree. This justifies, from our point of view, the interpretation of explained variance, at least for meta-analyses with a larger number of component studies or small variation in study-specific variances. (A small numerical illustration of this relation is given after abstract CS25.3 below.)

CS25.3 The influence of mortality time-points on pooled effect estimates in critical care meta-analyses
Roth D., Koenig F. 2, Herkner H. Medical University of Vienna, Department of Emergency Medicine, Vienna, Austria, 2 Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Vienna, Austria
BACKGROUND: There is an on-going debate among meta-analysis methodologists and statisticians whether it is appropriate to pool mortality estimates from clinical trials that used mortality outcomes ascertained at different time-points. If the relative effects vary over time, which might especially be the case in critical care, standard pooling of studies with different follow-up times within one meta-analysis would not be justifiable.
OBJECTIVES: Describe the current practice of dealing with different mortality time-points and analyze the influence of different time points on pooled effect estimates in actual Cochrane critical care meta-analyses.
METHODS: The Cochrane Database of Systematic Reviews was searched for critical care reviews. Meta-analyses were recalculated using all described strategies, and the influence of such strategies on the deviation of pooled effect estimates compared to a "use last time-point available" approach was analyzed using meta-regression and multilevel mixed-effects linear regression.
RESULTS: 835 reviews were evaluated; 80 meta-analyses of 298 studies, representing 107,605 patients, were included. 49 (61%) reviews did not state any strategy, 9 (11%) used separate analyses for each time-point, 9 (11%) used the last available, 6 (8%) used a closest-to-defined time-point, 3 (4%) performed separate analyses for last and predefined, 2 (3%) mixed some, 1 (1%) computed predefined time-points from study data, and 1 (1%) pooled all but performed a sensitivity analysis. Among 388 recalculated meta-analyses no influence of the strategies "pool short-, middle-, long-term", "use closest to defined" and "separate" on effect estimates was found compared to "use last available".
CONCLUSIONS: Reviews use a large variety of strategies to deal with different mortality time-points; however, more than 50% do not even report any strategy for this problem. In summary, we found no influence of different strategies on effect estimates in critical care Cochrane reviews.
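The following R sketch numerically illustrates the relation discussed in CS25.1 above. It uses the standard Higgins-Thompson form of s² and compares it with the harmonic mean of the study-specific variances; the within-study variances and the between-study variance are arbitrary invented values, not data from any meta-analysis:

  v <- c(0.04, 0.09, 0.10, 0.25, 0.36)      # within-study variances (invented)
  w <- 1 / v                                # inverse-variance weights
  k <- length(v)

  s2        <- (k - 1) * sum(w) / (sum(w)^2 - sum(w^2))  # "typical" within-study variance s^2
  harm_mean <- k / sum(1 / v)                             # harmonic mean of the variances
  c(s2 = s2, harmonic_mean = harm_mean)                   # here s2 is larger than the harmonic mean

  tau2 <- 0.05                              # some between-study variance
  I2   <- tau2 / (tau2 + s2)                # I^2 as the proportion of total variance due to heterogeneity
  I2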

CS25.4 Simulation-based power calculations for planning a two-stage individual participant data meta-analysis in trials with continuous outcomes
Ensor J., Burke D.L., Snell K.I.E., Hemming K. 2, Riley R.D. Keele University, Staffordshire, United Kingdom, 2 University of Birmingham, Birmingham, United Kingdom
Aims: Individual Participant Data (IPD) meta-analyses are often time-consuming and costly. Thus, researchers and funders should consider the statistical power of planned IPD projects. This is non-trivial and depends on numerous factors. We propose simulation-based power calculations using a two-stage IPD meta-analysis framework, and illustrate the approach using a planned IPD meta-analysis of trials of interventions to reduce weight gain in pregnancy.
Methods: The simulation approach has four steps: (i) specify an underlying statistical model to generate trials in the meta-analysis; (ii) use readily available information (e.g. from publications) and prior knowledge (e.g. number of studies promising IPD) to specify model parameter values; (iii) simulate an IPD meta-analysis dataset from the model, and apply a two-stage IPD meta-analysis to obtain the summary estimate of interest; (iv) repeat the previous step many times and estimate the power to detect a genuine effect.
Results: IPD was promised from 4 trials (83 patients) to examine a treatment-BMI interaction. Using our simulation-based approach, a two-stage IPD meta-analysis has < 60% power to detect a genuine reduction of 1 kg weight gain for a 10-unit increase in BMI. IPD from ten additional trials improves power to over 80%, assuming between-trial heterogeneity is negligible. Incorrect dichotomisation of BMI reduces power by over 20%, equivalent to throwing away IPD from ten trials.
Conclusions: Simulation-based power calculations help quantify the power of an IPD meta-analysis project, and thus help establish if and how IPD meta-analyses should be initiated. Power calculations should be used routinely when planning and funding IPD meta-analysis projects.
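To make the four simulation steps of CS25.4 concrete, here is a minimal R sketch of a two-stage IPD meta-analysis power calculation for a treatment-covariate interaction. All parameter values are invented for illustration and are not those of the pregnancy weight-gain example; the second stage uses a simple common-effect inverse-variance pooling:

  set.seed(42)
  power_ipd <- function(n_trials = 14, n_per_trial = 100, beta_int = -0.1,
                        sd_y = 4, n_sim = 500) {
    hits <- replicate(n_sim, {
      est <- vapply(seq_len(n_trials), function(i) {
        trt   <- rbinom(n_per_trial, 1, 0.5)              # 1:1 randomised treatment
        bmi_c <- rnorm(n_per_trial, 0, 5)                 # centred baseline covariate
        y     <- 10 - 1 * trt + 0.1 * bmi_c +
                 beta_int * trt * bmi_c + rnorm(n_per_trial, 0, sd_y)
        coef(summary(lm(y ~ trt * bmi_c)))["trt:bmi_c", 1:2]  # stage 1: estimate and SE per trial
      }, numeric(2))
      w  <- 1 / est[2, ]^2                                # stage 2: inverse-variance weights
      b  <- sum(w * est[1, ]) / sum(w)                    # pooled interaction estimate
      se <- sqrt(1 / sum(w))
      abs(b / se) > qnorm(0.975)                          # two-sided 5% level test
    })
    mean(hits)                                            # estimated power
  }
  power_ipd()                                             # power with 14 trials of 100 patients each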

EMA Symposium

E01 ICH E9(R1): Estimands A New Framework for Clinical Trials Fri :45-4:5 Lecture Hall HS 1
Chair(s): Christine Fletcher, Ann-Kristin Leuchs Panelist(s): Hans Ulrich Burger, Geert Molenberghs, Steven Teerenstra Discussant(s): Mouna Akacha
After nearly 20 years an addendum is being written for ICH E9 on the topic of estimands and sensitivity analyses. Estimands define treatment effects of interest, taking into account that patients deviate from a clinical trial protocol, for example by taking a rescue medication or switching to a different treatment. Current practices in how treatment effects are estimated result in estimands that are not reflective of the trial objectives. A new framework is being introduced that will align clinical trial objectives with the estimand(s) of interest, which will inform the trial design and methods for estimating treatment effects, including appropriate and aligned sensitivity analyses. This session will provide the latest thinking from the ICH E9 working group on the addendum. Regulators and industry colleagues will share case studies of how the new framework will improve clinical trial design, conduct and analysis. A panel session will provide the opportunity for the audience to ask questions and allow for further discussions.

E01.3 ICH E9(R1): Estimands -- A new framework for clinical trials
Fletcher C. Amgen Ltd, Global Biostatistical Science, Cambridge, United Kingdom
In recent years, different perspectives have emerged between regulators and industry in terms of which estimand, that is, the treatment effect to be estimated, is of primary interest in a clinical trial. The choices made in the study design and planned statistical analyses of how to deal with intercurrent events, such as non-adherence, missing data, use of rescue medication, and deaths occurring in the study, impact what treatment effect is actually being estimated in the clinical trial. In addition, sensitivity analyses conducted to support the conclusions from a clinical trial can sometimes be misaligned with the estimand of interest, leading to difficulties in the interpretation of the trial results. The new addendum to ICH E9 on estimands and sensitivity analyses introduces a new framework for trial design, conduct, analysis and interpretation for clinical trials. In the new framework the first step is to ensure there is a clearly defined clinical trial objective. The trial objective will then lead to defining the estimand of interest, which will influence the choice of trial design. The estimand of interest will lead to defining appropriate methods for analysis, and to defining sensitivity analyses that are also aligned with the estimand of interest. This presentation will review key aspects of the new addendum, including examples illustrating how to use the new framework in designing clinical trials.

E01.2 Estimands in oncology
Teerenstra S.,2 Medicines Evaluation Board, Utrecht, Netherlands, 2 Radboud University Medical Center, Radboud Institute for Health Sciences, Department for Health Evidence, Group Biostatistics, Nijmegen, Netherlands
In the framework of estimands, general types of estimands have been proposed, for example, a treatment policy estimand and an 'if all (would have) adhered' estimand. We will prepare the audience to think about these in the context of oncology trials for marketing authorisation.
To this end, we will try to find designs and estimators for such estimands and discuss how reasonable the assumptions are that are needed to make these work. We will share some personal views on (what strengthens) the role of such estimands.

E02 Panel Session: Multi-Regional Clinical Trials, ICH E17 and Subgroup Analyses Thu :30-8:00 Lecture Hall HS 1
Chair(s): William Wang Panelist(s): Vibeke Bjerregaard, Aaron Dane, Armin Koch, Claudia Schmoor
Drug development has rapidly been globalized. Multi-regional clinical trials (MRCTs) for regulatory submission have widely been conducted in the ICH and non-ICH regions. In order to harmonize points to consider in planning/designing MRCTs and to minimize conflicting opinions, an ICH working group was established in late 2014 to create an international guideline for MRCTs (ICH E17). This guideline is intended to describe general principles for the planning and design of MRCTs with the aim of encouraging the use of MRCTs in global regulatory submissions. The draft ICH E17 was issued for public comments in 2Q 2016. This session will start with brief presentations by ICH working group members on 1) the background/objective/scope of the guidance, 2) the basic principles and key statistical considerations, and 3) public comments and preliminary responses. Then we will invite panelists and the audience to discuss key issues in the draft guidance. One of the key discussion points will be the subgroup issues in multi-regional clinical trials, linking ICH E17 and the EMA subgroup guidance.

E02.1 Comments on regulatory guidance for planning and conduct of subgroup analyses in clinical trials
Schmoor C. University of Freiburg, Faculty of Medicine and Medical Center, Clinical Trials Unit, Freiburg, Germany
Statistical aspects of subgroup analyses (SGA) in clinical trials are currently discussed in three draft guidelines: the EMA Guideline on the investigation of subgroups in confirmatory clinical trials, the ICH E17 'General principles for planning and design of multi-regional clinical trials (MRCT)', and the EMA 'Guideline on multiplicity issues in clinical trials'. The intention of the EMA subgroup guideline is to focus on the important subgroups, to reduce the risk of performing abundant analyses, and thus to reduce the risk of erroneous conclusions. Considerations on expected heterogeneity of treatment effects and on factors defining subgroups for separate analyses should be part of every trial protocol. The guideline describes three levels of SGA: 1) confirmatory SGA in case of high expectation of heterogeneity, 2) 'exploratory key' SGA in case of plausibility or evidence of heterogeneity, 3) 'truly exploratory' SGA in case of plausibility of homogeneous effects. Level 1) SGA are not discussed in detail in the EMA subgroup guideline; they are covered in the EMA multiplicity guideline. The EMA subgroup guideline concentrates on SGA for the investigation of the internal consistency of trial results. ICH E17 promotes MRCTs with the aim of reducing the number of separate clinical trials in different regions and of avoiding unnecessary duplication of studies. This seems to suggest submissions based on only one well-planned and well-conducted MRCT, with the opportunity and necessity to investigate consistency of results within this trial. This also entails the planning and conduct of appropriate SGA as an integral part of the main statistical analyses. Comments provided by the two largest biostatistics societies in Germany - the German Region of the International Biometric Society (IBS-DR) and the German Society for Medical Informatics, Biometry and Epidemiology (GMDS) - on the guidance given in these regulatory documents for planning and conduct of SGA will be presented.

E03 Qualifying Novel Methodology Fri :30-2:00 Lecture Hall HS 2
Chair(s): Hans Ulrich Burger, Stephan Lehr Panelist(s): Björn Bornkamp
New methodology is an essential contribution of biostatisticians to the future development of clinical research. With the increased complexity of today's clinical development programs, the increased pressure to reduce cost and an overall changing environment (digitalized health records, data sharing, ...), new methodologies are likely even more important today than in the past. New methodology, however, comes in at least two steps. The first step concerns the generation of new ideas, and the second step deals with how to reach buy-in from all other stakeholders (academia, health authorities and payer organizations) so that the new methodology can also be applied. The second step can usually no longer be done by one individual or one company; rather, it requires a large cross-industry effort. This session will provide a number of examples of what especially the second step may look like, what needs to be done, and what kind of procedures or tactics could be followed to make a great idea widely applicable and acceptable to all parties in the end. We will also look at this from the regulatory perspective and from that of others in industry not involved first hand.

E03.1 The EMA experience with the qualification of novel methodologies
Manolis E.
EMA, London, United Kingdom
The European Medicines Agency offers scientific advice to support the qualification of innovative development methods for a specific intended use in the context of research and development into pharmaceuticals. The advice is given by the Committee for Medicinal Products for Human Use (CHMP) on the basis of recommendations by the Scientific Advice Working Party (SAWP). This qualification process leads to a CHMP qualification opinion or CHMP qualification advice. The talk will present the experience, trends and opportunities from the qualification program at EMA.

E03.2 Experiences made with the qualification process of MCP-Mod, an outside experience
Bedding A. Roche Products Ltd, Biostatistics, Welwyn Garden City, United Kingdom
Following the qualification of MCP-Mod as an efficient method for model-based dose-finding, there has been an increased interest in the adoption of model-based methods. This presentation will explore how this qualification has impacted dose finding within Roche Products and will also give experience of other interactions with regulatory authorities.

E03.3 Needs for new methodology in Alzheimer's disease
Model F., Liu-Seifert H. 2 Roche, Biostatistics, Basel, Switzerland, 2 Eli Lilly, Alzheimer's Disease Global Development Platform, Indianapolis, United States
Alzheimer's disease (AD) represents an unmet medical need with unprecedented urgency. The WHO estimates that 47.5 million people worldwide are diagnosed with dementia and that there are 7.7 million new cases every year. AD is the most common form of dementia, accounting for 60% - 70% of cases. One of the key challenges in AD drug development is that disease progression is slow and difficult to measure, resulting in resource-intensive clinical trials that take years and have a high risk of failure. As statisticians, we can help to address some of these key issues around clinical trial design and measurement of disease progression in order to bring effective treatments to patients and their families. The Scientific Working Group for the advancement of AD treatments was formed to tackle some of the AD-specific challenges in statistical methodology. It is led by statisticians from the industry, academic, and government sectors. The focus of the working group is research collaboration to develop analytical approaches in relevant areas of AD, with the primary goals of advancing the understanding of the disease and enabling the next generation of breakthrough treatments. Work streams include handling of missing data and relevant estimands in pivotal AD trials, endpoints for trials in early AD populations, trial design and analytical methods to establish disease modification, as well as biomarkers and their utilities. In this presentation we will outline some of the statistical challenges in AD clinical trials and how the AD Scientific Working Group strives to support the field in the development of novel clinical trial designs and analysis methodologies.

E04 Matrix Trials Thu :30-8:00 Lecture Hall HS 2
Organizer(s): Cong Chen, Anja Schiel Chair(s): Anja Schiel
In this session we will explore methodological and practical challenges arising from new matrix trial designs. We will hear how these new trial designs can enable running trials in difficult disease areas and populations, offering new opportunities but at the same time creating methodological problems that need to be addressed to allow harnessing the full potential of these innovative designs.

E04.1 Specifying the statistical properties of efficient basket trial designs
Cunanan K., Iasonos A., Shen R., Begg C., Gonen M. Memorial Sloan Kettering Cancer Center, New York, United States
In the era of targeted medicine, creative yet complex oncology clinical trial designs have emerged in response to the need to rapidly evaluate targeted agents in multiple contexts. One such class of designs has been termed "basket trials", whereby treatment allocation is biomarker-driven rather than disease-driven. In these trials, investigators are essentially screening for specific subpopulations with a given somatic mutation that respond to a drug or combination of drugs. Investigators may be inclined to expect broad efficacy across all baskets at the outset of a trial. Accounting for the correlated efficacy across baskets, using methods such as aggregation, hierarchical or mixture modeling in an adaptive design, can reduce the trial size and duration and improve power to identify individual subpopulations where the drug works, as compared to implementing independent parallel designs in each basket. However, given the design complexity and sophisticated statistical modeling, it is essential that the statistical properties of such novel designs be presented using simple measures that clinical investigators can easily interpret. In this talk, I will present important statistical properties and results from a simulation study of different adaptive designs for basket trials.

E04.2 Efficient drug development through platform trials
Amit O. GlaxoSmithKline, Collegeville, United States
In 2010 the I-SPY2 trial was launched as one of the first platform trials. I-SPY2 is an innovative phase 2 breast cancer trial representing a collaborative approach across many pharmaceutical companies. The benefits of this collaborative approach, which significantly reduces the cost, time and number of subjects required to bring new drug therapies to patients efficiently, are getting the attention of researchers in many different disease areas. A platform trial is a standing clinical trial involving multiple agents and sponsors, enrolling subjects over an extended period of time. Platform trials are currently being considered across a broad range of diseases. Platform trials have the potential to drive tremendous efficiencies in the development of new therapeutics. Efficiencies are derived from many sources, including the use of a shared control arm and adaptive randomization. In rare or difficult-to-enroll populations, platform trials offer the opportunity to steer important new therapies into a standing clinical trials infrastructure, potentially expediting the availability of badly needed new therapies. Efficiency can also be gained through the use of statistical modeling to formally leverage both historical information and concurrent information in related populations.
The various sources of efficiency will be described in more detail, and examples will be provided to demonstrate their relative contribution to a more economical paradigm for clinical trials.

E04.3 Control of the type I error rate in matrix trials
Pétavy F. European Medicines Agency, London, United Kingdom
Matrix trials have recently come under the spotlight as an operational tool to investigate several agent and target combinations in parallel. They are of particular interest to regulatory authorities as potentially enabling a more efficient development and therefore bringing innovative medicines more quickly to the market or to a larger number of patients. Characteristics of these types of trials need to be explored in order to assess their advantages and limitations. A potential issue in these trials with multiple target-treatment pairs is multiplicity. The impact of multiple comparisons is one of the most common concerns highlighted by statisticians during regulatory assessment, since it can lead to false positive conclusions and in turn debatable regulatory decisions. To that end, the control of the type I error rate in specific types of matrix trials will be investigated in several hypothetical scenarios.

E05 Bridging the Gap: the Use of Extrapolation in Drug Development and Health Technology Assessment Fri :45-4:5 Lecture Hall HS 2
Chair(s): Lisa Victoria Hampson
There is a growing interest in extrapolation as a means of harnessing existing relevant information to increase our understanding of the benefits and risks of new medicines, particularly in difficult-to-study patient populations, as reflected by the recent concept and reflection papers on extrapolation issued by the European Medicines Agency. With this in mind, there is a need for statisticians to have tools to synthesise existing information and explore its relevance to the research question of interest. This session will explore statistical methods relevant for synthesizing and extrapolating from existing information at different stages of the drug development process, ranging from early and late phase clinical trials to health technology assessment. The session will comprise three invited presentations which will address some of the technical and practical issues of extrapolation. Each presentation will leave plenty of time for discussion to debate the challenges of interpreting results supported by extrapolations, particularly when there is uncertainty about underpinning assumptions.

E05.1 Exchangeability approaches in early development
Bailey S., Neuenschwander B. 2, Wandel S. 2 Novartis Institutes for Biomedical Research, Biostatistics and Pharmacometrics, Cambridge, United States, 2 Novartis Pharma AG, Basel, Switzerland
While extrapolation plays an important role in assessing consistency of disease course and treatment effect between populations, alternate approaches allow us to use ongoing clinical study data to assess similarity in patient groups within a single protocol. These approaches can potentially accelerate development of new treatments across multiple diseases or subgroups of interest. As diseases are categorized into ever smaller subgroups, the early assessment of (non)exchangeability in both safety and activity is critical to identifying clusters of subpopulations that may be subsequently studied together and, in contrast, those that require separate future development plans. We provide case studies of the exchangeability-nonexchangeability (EXNEX) approach applied to early study designs, including a mixed-population dose-escalation study in solid tumors, hematologic malignancies and gliomas, a phase II multi-indication study with different primary endpoint measures, and extensions into paediatric and multi-regional studies. References: 1. Neuenschwander, B., Wandel, S., Roychoudhury, S., and Bailey, S. (2016) Robust exchangeability designs for early phase clinical trials with multiple strata. Pharmaceut. Statist., 15.

E05.2 Incorporating historical control data into the design and analysis of a clinical trial when the outcome data are normally distributed
Bennett M., White S.R., Mander A.P. MRC Biostatistics Unit, Cambridge, United Kingdom
A standard two-arm randomised controlled trial usually compares an intervention to a control treatment, and patients are randomised equally to each treatment. Historical data are often used to design new trials, informing the sample size calculation, but are not used in the analysis. When the historical and current control data agree, incorporating the historical data into the analysis could improve efficiency and increase the precision of parameter estimates. However, when the historical and current data are inconsistent, there is a potential for biased treatment effect estimates, inflated type I error and reduced power. When the outcome data are normally distributed, two parameters are required to adequately summarise each of the historical and current control data: the sample mean and variance. A difference in the mean or variance could indicate that the historical and current data represent different populations and the historical data should be discounted in the current trial analysis. We summarise and compare two approaches proposed in the literature for incorporating historical data, power priors and robust mixture priors. We introduce two novel weights to assess agreement based on the joint posterior distribution of the current and historical control means and variances: a probability weight based on tail area probabilities, and a weight based on the equivalence of the mean and variance. We present a Bayesian design where the historical data are treated as additional information to increase the power of the current trial while maintaining the current study sample size. The operating characteristics for this design using the proposed weights are compared to the power prior and robust mixture prior approaches.
An example illustrates that all methods discount the historical data when there is disagreement with the current controls, but the methods differ in the rate at which the historical data are discounted and in their flexibility and implementation. (A schematic numerical sketch of the power-prior idea appears after abstract E05.4 below.)

E05.3 Parametric estimation of time-to-event data
Schiel A. Norwegian Medicines Agency, Statistics, Oslo, Norway
Due to the ever-increasing pressure to get promising drugs to patients early, trials are unfortunately designed with limited follow-up and, in addition, might be terminated early. Time-to-event data have been a backbone of trials in the past, yet their usefulness is lost if only very limited numbers of events are observed and data remain immature. Regulators and HTA agencies both have recognized that this leads to increasing uncertainty around the estimates, and the consequences this has for regulatory and reimbursement decision making. While regulators have generally relied on conventional, well-established analyses of time-to-event data, in the field of HTA methods such as parametric estimation have gained increasing popularity due to the specific requirements of pharmacoeconomic modelling. The presentation will address the assumptions those estimations are based on, the caveats of their use and their potential to contribute to better informed decisions.

E05.4 Using historical data to inform extrapolation decisions in children
Wadsworth I., Hampson L.V.,2, Jaki T., Sills G.J. 3 Lancaster University, Mathematics and Statistics, Lancaster, United Kingdom, 2 AstraZeneca, Cambridge, United Kingdom, 3 University of Liverpool, Molecular and Clinical Pharmacology, Liverpool, United Kingdom
When developing a new medicine for children, the potential to extrapolate from adult efficacy data is well recognised. However, significant assumptions about the similarity of adults and children are needed for extrapolations to be biologically plausible. One such assumption is that pharmacokinetic-pharmacodynamic (PK-PD) relationships are similar in these different age groups. In this presentation we consider how 'source' data available from historical trials completed in adults or adolescents treated with a test drug can be used to quantify prior uncertainty about whether PK-PD relationships are similar in adults and younger children. A Bayesian multivariate meta-analytic model is used to synthesise the PK-PD data available from the historical trials which recruited adults and adolescents. The model adjusts for the biases that may arise since these existing data are not perfectly relevant to the comparison of interest, and we propose a strategy for eliciting expert prior opinion on the size of these external biases. From the fitted bias-adjusted meta-analytic model we derive prior distributions which quantify our uncertainty about the similarity of PK-PD relationships in adults and younger children. These prior distributions can then be used to calculate the probability of similar PK-PD relationships in adults and younger children which, in turn, may be used to inform decisions as to whether complete extrapolation of efficacy data from adults to children is currently justified, or whether additional data in children are needed to reduce uncertainty. Properties of the proposed methods are assessed using simulation, and their application to epilepsy drug development is considered.
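As a schematic companion to E05.2 above (see the pointer there), the following R sketch shows how a power prior with weight a0 discounts historical control data for a normal mean. The known-variance simplification, the weights and the simulated data are ours, not the authors', and the sketch does not implement their novel agreement weights:

  post_power_prior <- function(y_cur, y_hist, sigma2, a0, m0 = 0, v0 = 1e6) {
    # power prior: historical likelihood raised to the power a0 in [0, 1],
    # combined with a vague N(m0, v0) initial prior; sigma2 treated as known
    prec <- 1 / v0 + a0 * length(y_hist) / sigma2 + length(y_cur) / sigma2
    mu   <- (m0 / v0 + a0 * sum(y_hist) / sigma2 + sum(y_cur) / sigma2) / prec
    c(post_mean = mu, post_sd = sqrt(1 / prec))
  }

  set.seed(3)
  y_hist <- rnorm(100, mean = 0.0, sd = 1)   # historical controls
  y_cur  <- rnorm( 40, mean = 0.4, sd = 1)   # current controls, somewhat discordant
  rbind(no_borrowing   = post_power_prior(y_cur, y_hist, sigma2 = 1, a0 = 0),
        half_weight    = post_power_prior(y_cur, y_hist, sigma2 = 1, a0 = 0.5),
        full_borrowing = post_power_prior(y_cur, y_hist, sigma2 = 1, a0 = 1))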

E06 Panel Session on the Interpretation of Subgroup Analysis in Confirmatory Clinical Trials for Regulatory Decision Making Fri :30-0:00 Lecture Hall HS 1
Chair(s): Christoph Muysers Panelist(s): Aaron Dane, Alex Dmitrienko, Armin Koch, Julien Tanniou, David Wright
The session will provide an introductory overview of various subgroup analysis approaches. Furthermore, the current status of the regulatory guideline on subgroup analysis will be highlighted, including the discussions which took place while the EMA guideline was available as a draft. Further guidance, e.g. on multi-regional clinical trials or on enrichment, became available in addition to more advanced and specialized subgroup analysis methods. The session will summarize these aspects, focusing on the impact on subgroup analysis for regulatory decision making. The session will conclude with a panel discussion reflecting on different aspects of subgroup analysis. The session will comprise introductory talks by the panelists and a panel discussion.

E07 Opening EMA Symposium - Multiplicity Issues in Clinical Trials: Recent Update of the European Guideline Thu :30-6:00 Lecture Hall HS 1
Chair(s): Andreas Brandt, Frank Bretz
The need for multiplicity adjustment in confirmatory clinical trials serving as a basis for regulatory approval is a generally accepted standard. The respective EMA guidance 'Points to consider on multiplicity issues in clinical trials' has been effective since 2002. A revision of this guideline accounting for new developments in this field was released for public consultation in 2017. This revision will be discussed from different perspectives. After a presentation of the draft guideline and its key issues, an industry view and a view from the IBS working group 'Adaptive designs and multiple testing procedures' on the revision will be given. As the new guidance also covers multiplicity and estimation, approaches for constructing informative confidence intervals consistent with multiple testing procedures will also be presented.

E07.1 Informative simultaneous confidence intervals
Schmidt S., Brannath W. University of Bremen, Competence Center for Clinical Trials, Bremen, Germany
In multiple testing, interest generally lies not only in showing superiority or non-inferiority of one or more treatments, but also in quantifying the treatment effects. Therefore, simultaneous confidence intervals (SCIs) are considered that control the overall coverage probability. Of course, the SCI should be consistent with the multiple testing procedure in the sense that a hypothesis is rejected if and only if its intersection with the confidence interval is empty. This is not evident for the more powerful stepwise procedures like the Bonferroni-Holm or hierarchical procedures. Existing attempts have the drawback that they are often not informative for rejected hypotheses, i.e., the confidence interval equals the whole alternative hypothesis and therefore gives no information on the effect. In my talk, I will present my research with Werner Brannath on informative simultaneous confidence intervals. These are procedures that modify a large class of multiple tests such that the SCIs are always informative for all hypotheses, while losing only little power compared to the original tests.
I will present informative modifications of the Bonferroni-Holm and other step-down procedures, as well as of the class of graphical procedures such as the fixed-sequence test and the fallback procedure.

E07.2 Industry viewpoint on the European multiplicity guidance
Offen W. AbbVie, Data and Statistical Sciences, North Chicago, United States
I will be representing the views of industry. Prior to this session I will be holding some conversations with leading multiplicity experts from industry, and thus my views will not be only my own personal views. Finally, I will point out any differences between the EU Guidance and the FDA Guidance released late in 2016.

E07.3 Updated regulatory considerations on multiplicity issues in confirmatory clinical trials
Benda N. BfArM, Bonn, Germany
Recently, a new guideline on multiplicity issues in clinical trials has been issued by EMA, as well as a Guidance for Industry on multiple endpoints by FDA. Since the publication of the EMA Points to Consider (PtC) on Multiplicity Issues in Clinical Trials in 2002, methodological advances have been made in more complex multiplicity settings relating to multiple sources of multiplicity, such as different dose groups or treatment regimens, interim analyses, multiple endpoints, and different subgroups. Regulatory requirements have been refined with respect to the hypothesis framework in confirmatory clinical trials. The presentation will outline the regulatory principles related to multiplicity issues in drug approval and discuss changes in the new guideline document, the role of secondary endpoints and subgroup analyses, potential consistency and interpretational problems, multiplicity issues in estimation, and how multiplicity procedures could be used to optimally support the risk-benefit assessment.

E07.4 New guidances on multiplicity - can they tame the beast?
Posch M., Bauer P., König F. Medical University of Vienna, Center for Medical Statistics, Informatics, and Intelligent Systems, Section for Medical Statistics, Vienna, Austria
Multiplicity issues are an intrinsic feature of clinical trials and are an even greater challenge in regulatory decision making, which is based on evidence from multiple trials as well as other sources of information. Recently, EMA and US FDA published regulatory guidance documents focusing on multiplicity issues within a single trial. Both are founded on the concept of strong control of the familywise error rate but also explore the challenges that arise where hypothesis testing reaches its limits, as in the assessment of the components of co-primary endpoints or the evaluation of safety endpoints. In this talk, we critically discuss the current guidance documents and highlight several issues related to multiplicity where best practices have not been established yet. These include, e.g., (i) the bias of estimates after selection (e.g. when one of several doses or treatments is selected) that may distort the evaluation of the benefit-risk balance and that has been widely ignored so far; (ii) multiplicity control in novel trial designs such as umbrella or basket trials, where it is controversially discussed for which family of null hypotheses type I error control is required; or (iii) Bayesian and decision-theoretic approaches to address multiplicity issues, which have attracted increasing interest in the light of the recent discussions on the interpretation of p-values and the limitations of frequentist hypothesis testing in empirical research. This work has been supported by the EU Project FP7 HEALTH Asterix.

E07.5 Some notes on conditional bias in connection with MCPs
Kunz M. Bayer AG, Pharmaceutical Statistics, Berlin, Germany
Drug regulatory agencies like EMA or FDA require the pharmaceutical industry to conduct confirmatory clinical trials as the basis of the drug label, which contains a characterization of the drug. The goal of these clinical trials usually is to compare the investigational product with an adequate control to allow a conclusion on the presence of a treatment effect and to subsequently quantify this treatment effect. In many situations unbiased estimators for treatment effects are available. However, as usually only significant trials make it into the label, it may be of interest for a patient to know how this conditional treatment effect compares with the true treatment effect. One can show that the resulting conditional bias is of an acceptable magnitude if the power for a specific endpoint is 80% or higher. This is often the case for the primary endpoint of a trial, but not necessarily for additional primary or secondary endpoints, which do not all need to be significant to yield a successful clinical trial. Multiple comparison procedures (MCPs) that control the familywise error rate have been developed over the last decades and are, e.g., discussed in regulatory guidance documents. However, in the setting of multiple statistical hypotheses the question of conditional bias, in analogy to the one-hypothesis setting, has not yet gained attention. We are investigating this conditional bias for some MCPs often applied in clinical trials.

E08 Panel Session: Data Monitoring Committees Scope, Expectations and Challenges for DMC Clinicians and Statisticians Fri :30-2:00 Lecture Hall HS 1
Chair(s): Amit Bhattacharyya Panelist(s): Adam Crisp, Paul Gallo, Lisa LaVange, Geert Molenberghs, Frank Pétavy, Jonathan Seltzer
Data Monitoring Committees (DMC) are an integral part of clinical drug development. Their use has evolved along with changing study designs and regulatory expectations, which has implications associated with statistical and ethical issues. Although there is guidance from the different regulatory agencies, there are opportunities to bring more consistency to address practical issues of establishing and operating a DMC.
For example, there is wide variability in the understanding of DMC scope, regulatory requirements, expectations and the EU regulatory network, independence of DMCs, DMC focus vs CSR reporting mindset, stopping for beneficial impact or harm and its implications, challenges with adaptive design implications, statistical considerations towards futility, etc. This proposed panel of clinical and statistical experts from academia, industry and regulatory agencies will share their experience and knowledge to address the scope, expectations, relevance and challenges of DMCs in order to find a common understanding of best practices. This session will also be a good platform to hear thoughts from the regulatory agencies related to the revisions in FDA guidance and the EU regulatory network.

E08.1 Panel session: data monitoring committees - scope, expectations and challenges for DMC clinicians and statisticians
Bhattacharyya A. ACI Clinical, Biometrics, Bala Cynwyd, United States
Data Monitoring Committees (DMC) are an integral part of clinical drug development. Their use has evolved along with changing study designs and regulatory expectations, which has implications associated with statistical and ethical issues. Although there is guidance from the different regulatory agencies, there are opportunities to bring more consistency to address practical issues of establishing and operating a DMC. For example, there is wide variability in the understanding of DMC scope, regulatory requirements, expectations and the EU regulatory network, independence of DMCs, DMC focus vs CSR reporting mindset, stopping for beneficial impact or harm and its implications, challenges with adaptive design implications, statistical considerations towards futility, etc. This proposed panel of clinical and statistical experts from academia, industry and regulatory agencies will share their experience and knowledge to address the scope, expectations, relevance and challenges of DMCs in order to find a common understanding of best practices. This session will also be a good platform to hear thoughts from the regulatory agencies related to the revisions in FDA guidance and the EU regulatory network.
Panelists:
Jonathon Seltzer, MD, President, ACI Clinical, US, JSeltzer@aciclinical.com
Paul Gallo, PhD, Senior Biometrical Fellow, Statistics Methodology Group, Novartis, US, paul.gallo@novartis.com
Adam Crisp, PhD, Director, Metabolic & Cardiovascular Therapeutic Area Statistics Head, GSK, UK, adam.x.crisp@gsk.com
Geert Molenberghs, PhD, Director, I-BioStat & Professor, Hasselt University & KU Leuven, Belgium, geert.molenberghs@uhasselt.be
Lisa LaVange, PhD, Director, CDER Office of Biostatistics, FDA, US, Lisa.LaVange@fda.hhs.gov
Frank Pétavy, Biostatistician, Human Medicines Evaluation, EMA, UK, Frank.Petavy@ema.europa.eu

E09 Biosimilar Drug Development: Statistical and Regulatory Issues Fri :30-0:00 Lecture Hall HS 2
Organizer(s): Byron Jones, Franz König Chair(s): Byron Jones Discussant(s): Andrea Laslop
This session will highlight and discuss the current methodological and regulatory issues in the development of biosimilars. Given that the speakers come from academia, industry and the regulatory agencies, the issues will be covered from a range of perspectives, leading to an interesting and informative session including discussion.

E09.1 Totality of evidence for biosimilars: what's the evidence of totality?
Laslop A. Austrian Medicines & Medical Devices Agency, Scientific Office, Vienna, Austria
The concept underlying the development of biosimilar medicines is a stepwise approach to demonstrate comparability with the originator. Starting at the analytical and non-clinical level, biosimilarity is established in an extensive series of physicochemical, biological and in vitro functional attributes. With highly advanced and precise techniques available, this may be seen as the most sensitive part of evidence generation in the whole comparability exercise. The ensuing comparison at the clinical level aims at confirming biosimilarity in vivo, but due to lower sensitivity cannot rectify earlier failed results at the qualitative and functional level. Typically, a comparative phase I study, mainly in healthy volunteers, is conducted first in order to ascertain similar pharmacokinetics between biosimilar and reference product, before proceeding to phase III where equivalence in efficacy is demonstrated. The PK study also yields comparative pharmacodynamic data, which in some cases provide pivotal evidence for biosimilar efficacy, subject to the condition that a PD surrogate marker exists. Apart from identifying similar PK and efficacy, the clinical comparability program serves to collect safety data pre-approval in order to exclude substantial differences in the profile of adverse events, particularly in immunogenicity. The decision to accept a product as being biosimilar according to European regulatory standards is generally drawn on the basis of the totality of evidence derived from a comprehensive data package as described above. Once biosimilarity has been concluded in one indication, extrapolation to other indications is expected, but needs sound scientific justification. Considering the low discriminatory potential of the clinical phase III trial, its usefulness may be questioned, especially when PK/PD results together with the quality and nonclinical data establish robust evidence of biosimilarity. Likewise, waiving of the phase III study could enable the development of biosimilar orphan drugs by mitigating feasibility constraints due to the small patient population available.

E09.2 Longitudinal assessment of the impact of multiple switches between a biosimilar and its reference product on efficacy parameters
Mielke J., Woehling H. 2, Jones B. Novartis Pharma AG, Basel, Switzerland, 2 Sandoz Pharmaceuticals, Holzkirchen, Germany
Patients, physicians and health care providers in Europe have more than ten years of experience with biosimilars. However, there are still debates about whether switching between a biosimilar and its reference product influences the efficacy of the treatment.
In this paper we address this uncertainty by developing a formal statistical test that can be used for showing that switching has no negative impact on the efficacy of biosimilars. For that, we first introduce a linear mixed effects model that is used for defining the null hypothesis (switching influences the efficacy) and the alternative hypothesis (switching has no influence on the efficacy). Using this as the foundation of our work, we propose several approaches for testing for changes in the efficacy of the treatment due to switching and discuss the properties of these tests in an extensive simulation study. It is shown that all these methods have advantages and disadvantages and the decision regarding which method is preferred depends on the expectation of a switching assessment. E09.3 Statistical methods for the comparative assessment of quality attributes - challenging the status quo and stimulating further development Lang T., Lehr S. Austrian Medicines & Medical Devices Agency, Vienna, Austria The comparison of quality attributes potentially having an impact on clinical outcome is an integral part in many drug development programs. Examples extend from manufacturing changes, dissolution profile testing in generic drug development to the demonstration of similarity of biological medicinal products in the biosimilarity setting. As regards the latter, similarity on the quality level is the first important milestone to be achieved in a stepwise approach. The comparison of quality attributes is likely the most sensitive part of the whole biosimilarity exercise and is frequently suggested to inform on the amount of additional evidence to be generated at later development stages. Thus, a priori definitions of statistical methods to control the risk of a false similarity conclusion on the quality level become more important, the more these conclusion are expected to carry pivotal evidence in the whole comparison task within a specific biosimilar drug development. There are however numerous limitations hampering statistical inference, among those the usually small number of independent manufacturing batches, the large number of attributes on different measurement scales, the difficulty to define the maximum discrepancy in terms of a quality attribute that does not translate into a relevant impact on clinical outcome, and large assay variability. The presentation will give the European regulator's view on current practices in biosimilarity assessment on the quality level. It will sketch the framework of regulatory decision making on the basis of critical quality attributes, will illustrate some limitations of currently used approaches and, maybe most importantly, aims to stimulate further methodological advancement. The basis of the talk will be the Reflection paper on statistical methodology for the comparative assessment of quality attributes in drug development as currently published on the EMA homepage, open for comments during the ongoing public consultation phase. 0

113 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed Award Sessions AS IBS-DR Young Career Awards Wed :30-:00 Lecture Hall HS 4 Chair(s): Tim Friede Promoting scientific young talents is one of the major aims of the German Region of the International Biometric Society (IBS-DR). The German Region thus offers the following two awards: the Gustav Adolf Lienert Award for early career scientists the Bernd Streitberg Award for undergraduate and graduate students. The prizes are awarded for outstanding biometric achievements. The IBS-DR is proud to award three Gustav Adolf Lienert Awards and three Bernd Streitberg Awards this year. The award winners will summarize their work in a series of short presentations. AS. A causal analysis of ventilator-associated pneumonia in intensive care Bühler A., Bluhmki T., Timsit J.-F. 2, Beyersmann J. Ulm University, Institute of Statistics, Ulm, Germany, 2 IAME-U37 University Paris Diderot, Paris, France Hospital-acquired infections increase morbidity of patients and hospital mortality. Because such infections require additional treatment, excess length of hospital stay is commonly used both as a measure of disease burden and in cost-benefit analysis for infection control. Conceptually, associational measures of excess stay are complicated by the fact that hospital-acquired infections are time-dependent exposures. In this work, we use a nested structural accelerated failure time model to investigate the infection impact on length of stay. The approach is illustrated in a real data set on ventilator-associated pneumonia in intensive care patients. We also offer a discussion whether hospital-acquired infections fall into the common causal framework of mimicking randomized treatment decisions. AS.2 Modeling recurrent events in time-to-event analysis using an example of recurrence-free survival of bladder cancer patients Weber F. TU Dortmund University, Faculty of Statistics, Dortmund, Germany In classical time-to-event analysis, e.g. when using the Cox proportional hazards model, every subject can experience at most one event. Several models have been proposed to allow for recurrent events, i.e. multiple events of the same type. In this contribution, three such models - the Andersen-Gill (AG) model, the Wei-Lin-Weissfeld (WLW) model, and the Prentice-Williams-Peterson (PWP) model - are presented and compared with regard to their appropriateness in an observational study investigating relapses of bladder cancer in non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC) patients from three German hospitals. The PWP model is chosen to be the most appropriate one. Apart from basic demographic and two non-genetic clinical covariates, two genetic covariates are included: the deletion of both copies of the glutathione S-transferase µ (GSTM) gene (a binary covariate) and the N-acetyltransferase 2 (NAT2) genotype, classified as "rapid", "slow", or "ultra-slow" (referring to the metabolic capacity that is typically associated with this genotype). These two genetic covariates have been proven to be associated with bladder cancer risk, overall survival, and time to first recurrence. In order to account for the correlation of observations belonging to the same subject (analogously to a frailty model) and to include random effects for the covariate "hospital", the PWP model is extended to a "mixed PWP model". 
After fitting the model, the GSTM deletion and slow NAT2 genotypes suggest a slight non-significant increase in risk of recurrence. The only ultra-slow NAT2 genotype *6A/*6A shows a significant increase in risk of recurrence. A subgroup analysis for never- and ever-smokers indicates no substantial differences between these two groups concerning the effect of a GSTM deletion or the NAT2 genotype. AS.3 Analysing unmeasured baseline covariates in studies with delayed entry using a joint model Stegherr R., Bluhmki T., Bramlage P. 2, Beyersmann J. Institute of Statistics Ulm University, Ulm, Germany, 2 Institute of Pharmacology and Preventive Medicine, Mahlow, Germany The natural choice for `time zero (baseline) in a randomized clinical trial is study entry, in particular, covariate information is available at study entry. In other observational studies study entry may happen after the time origin leading to left-truncated data. One specific example is the analysis of diabetes register-based data, where a relevant timescale is `time-since-first-antidiabetic-medication. Since such data is collected in calendar time, some patients enter the study upon their first medication, but others have a known date of therapy initiation before start of data collection. A relevant baseline covariate would be glycated haemoglobin (HbAc) levels in diabetes patients. The challenge is that such data is typically measured upon study entry and, hence, not at baseline for the left-truncated patients, but, e.g., HbAc will have changed in the random and patient-specific time interval between start of medication and study entry. The problem has been summarized in a letter by Keiding and Knuiman, Statistics in Medicine, Vol.9, (990). We propose a joint model to investigate the impact of a baseline covariate, possibly unmeasured due to delayed entry, on the time to the event of interest. This contrasts with the standard use of joint models where the aim is to analyse the effect of the current value of the covariate on the hazard of an event. Our approach shows proper performance in a simulation study and was applied to data from a German diabetes register to evaluate the effect of HbAc at therapy initiation on the risk of treatment failure.

114 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed AS.4 Multivariate functional principal component analysis for data observed on different dimensional domains Happ C., Greven S. LMU Munich, Department of Statistics, Munich, Germany Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, e.g. functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loève Theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable for sparse functional data or data with measurement error. A flexible R implementation is available on CRAN (R-package MFPCA). One main focus of the talk will be on the neuroimaging application concerning Alzheimer's disease. The goal here is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. AS.5 Meta-analysis for the comparison of two diagnostic tests to a common gold standard: a generalized linear mixed model approach Hoyer A., Kuss O. German Diabetes Center, Institute for Biometry and Epidemiology, Duesseldorf, Germany Meta-analysis of diagnostic studies is still an ongoing field of statistical research. Especially, several researchers have called for methods to compare different diagnostic tests to a common gold standard. Restricting to two diagnostic tests, in these meta-analyses the parameters of interest are the differences of sensitivities and specificities (with their corresponding confidence intervals) between the two diagnostic tests while accounting for the various associations between the diagnostic tests and across single studies. We propose a statistical model with a quadrivariate response, consisting of the sensitivities and specificities of both diagnostic tests, as a sensible approach to this task. Using a quadrivariate generalized linear mixed model (GLMM) naturally generalizes the common bivariate model of meta-analysis for a single diagnostic test. In case information on several thresholds of the diagnostic tests are available, the quadrivariate GLMM can be further generalized to yield a comparison of full ROC curves. We illustrate our model by an example where two screening tools for the diagnosis of type 2 diabetes are compared. AS.6 Comparing a stratified treatment strategy with the standard treatment in randomized clinical trials Sun H.,2, Bretz F. 3, Gerke O. 
4, Vach W.,5 Insttitute of Medical Biometry and Statistics, Medical Center, University of Freiburg, Freiburg, Germany, 2 Grünenthal GmbH, Biostatistics, Aachen, Germany, 3 Novartis Phama AG, Basel, Switzerland, 4 Nuclear Medicine, Odense University Hospital, Odense, Denmark, 5 Department of Orthopaedics & Traumatology, University Hospital Basel, Basel, Switzerland The increasing emergence of predictive markers for different treatments in the same patient population allows us to define stratified treatment strategies based on multiple markers. We compare a standard treatment with a new stratified treatment strategy which divides the study population into subgroups receiving different treatments in randomized clinical trials. Because the new strategy may not be beneficial in all subgroups, the objective should be trying to demonstrate a treatment effect for a subset of subgroups, instead of each single subgroup. We consider an intermediate approach that establishes a treatment effect in a subset of patients built by joining several subgroups. The approach is based on the simple idea of selecting the subset with minimal p-value when testing the subset-specific treatment effects. We present a framework to compare this approach with other approaches to select subsets by introducing three performance measures, i.e., the probability to identify one significant subset, the expected average change in the outcome when patients will be treated by recommended treatment and the fraction of patients with an incorrect treatment recommendation. The framework give us first insights into this simple idea and an effect of preselecting subsets of a certain size. The results of a comprehensive simulation study are presented and the relative merits of the various approaches are discussed. Keywords: stratified treatment strategy, subset analysis, subgroup analysis. AS2 ROeS Arthur Linder Award Wed :05-3:30 Lecture Hall HS 3 Chair(s): Hans Ulrich Burger, Martin Posch 2

115 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed Young Statisticians Sessions YSS Young Statisticians Session Thu :30-3:00 Lecture Hall KR 8 Chair(s): Andrea Berghold, Anke Hüls YSS. Predicting treatment response in personalized cancer therapy - method comparison and a neural network approach using prior information Weber D.,2, Zucknick M. 2 Heidelberg University Hospital, Institute of Medical Biometry and Informatics, Heidelberg, Germany, 2 University of Oslo, Department of Biostatistics, Institute of Basic Medical Sciences, Oslo, Norway Melanoma is the most dangerous type of skin cancer. The prediction of drug responses according to mutation profiles is essential in identifying promising treatment approaches for melanoma. Therefore, there is increasing interest in strategies and methods treating this type of data while at the same time reaching good prediction performance. We applied tree-based models, neural networks, logic regression, and generalized linear models with an elastic net penalty in their original form and combined them with different ensembling approaches. This research points out the results of prediction performance, furthermore, it deals with the variable importance of the individual genes and its connection to processes that lead to cancer. A main result is that no method consistently outperforms all competitors across all experiments. Instead, the different cancer compounds seem to favor different drug response prediction models. Moreover, we reach good prediction performance by using independent data sets for training and testing our models in the case of drugs with similar response distribution. Furthermore, we use information about genes that are linked to the drug target through pathway sharing as prior information in neural networks. We discuss whether this information can increase the prediction performance and the possibility to give a better biological interpretation of neural network models. A novel finding is that those neural networks may indicate which target genes make the greatest contribution to the drug response. Therefore, models to predict the drug responses according to the mutation profile are an achievement, which could be a step towards personalized cancer therapy. YSS.2 Open machine learning Seibold H., Casalicchio G. 2, Bossek J. 3, Lang M. 4, Kirchhoff D. 5, Kerschke P. 3, Hofner B. 6, van Rijn J.N. 7, Vanschoren J. 8, Bischl B. 2 University of Zurich, Zurich, Switzerland, 2 LMU München, Munich, Germany, 3 University of Münster, Münster, Germany, 4 TU Dortmund, Dortmund, Germany, 5 FH Dortmund, Dortmund, Germany, 6 Paul-Ehrlich-Institut, Langen, Germany, 7 University of Freiburg, Freiburg, Germany, 8 Eindhoven University of Technology, Eindhoven, Netherlands Conducting research openly and reproducibly is becoming the gold standard in academic research. Practicing open and reproducible research, however, is hard. OpenML.org (Open Machine Learning) is an online platform that aims at making the part of research involving data and analyses easier. It automatically connects data sets, research tasks, algorithms, analyses and results and allows users to access all components including meta information through a REST API in a machine readable and standardized format. Everyone can see, work with and expand other people's work in a fully reproducible way. For example one researcher can upload a data set and specify a corresponding prediction task. 
Interested people can then run prediction models of their choice using the program of their choice, which will possibly result in a more sophisticated prediction than what the researcher could have done alone. For developers of statistical software OpenML provides an easy way of testing algorithms and of conducting benchmarking studies. I will present the platform OpenML, the R package connecting to OpenML and a use case in which I assess the performance of a new algorithm on many different data sets. This use case highlights the straightforward usage of OpenML and shows the power of openness in connection with reproducibility. YSS.3 A systematic review of cluster-randomized trials regarding complex interventions in general practices - or: why are these trials not effective? Pregartner G., Berghold A. Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria Objective: Given that complex interventions are time-consuming and expensive to both plan and conduct, we performed a systematic review to find out how often this effort pays off and if assumptions made for sample size estimation are unrealistic. Methods: We searched the Central Register of Controlled Trials, MEDLINE and EMBASE for trials in a general practice setting with the practice as unit of randomization and a follow-up period of at least one year that compared a complex intervention to routine care and presented any patient-relevant primary outcome. We extracted information on the primary outcome(s), various quality parameters and the sample size calculation. Results: Of the 29 studies we included in the review, only four (4%) found a significant improvement with the complex intervention after we adjusted for multiple comparisons. This made us question methodological rigor and adequacy of study planning. Therefore, we compared the expected to the obtained values for intra-cluster correlation and effect of the intervention. Of the 22 studies that reported sample size calculations, 20 (9%) considered clustering. We found that intra-cluster correlation coefficients were chosen conservatively in the planning stage for 8/3 outcomes (0 studies) for which a comparison between observed and assumed values was possible. However, for all but one of the 20 outcomes (7 studies) for which an expected treatment effect was provided this expectation was overly optimistic with a median relative reduction of 68%. Conclusion: Even though clustering was adequately considered in most studies, investigators seem overly optimistic with their expectation of the interventions. YSS.4 Reduction of dimensionality by sparse subspace clustering Wilczyński S., Bogdan M., Sobczyk P. 2, Josse J. 3 University of Wrocław, Faculty of Mathematics and Computer Science, Wrocław, Poland, 2 Wroclaw University of Science and 3

116 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed Technology, Faculty of Pure and Applied Mathematics, Wrocław, Poland, 3 Ecole Polytechnique, Applied Math Department, Palaiseau, France In many scientific problems such as identification of genetic pathways based on gene expression data, one of the tasks is to find a lower dimensional subspace representing a collection of points from high-dimensional space. One of the simplest methods to achieve this is to use Principal Component Analysis. However, this procedure is useful only if we assume that all points from the data come from the same lower dimensional subspace. In fact, in lots of cases more general model is needed, which assumes that variables come from a mixture model and our high-dimensional space is a union of low dimensional subspaces. The task of finding these subspaces is called subspace clustering. In my bachelor's thesis I investigate a new method of solving this problem. It is based on k-means algorithm, where clusters represent subspaces and the center of one cluster is a set of principal components. To estimate the number of clusters, modified version of Bayesian Information Criterion (BIC) is used, which takes into account a prior distribution on a number of clusters. In each of the iterative step of the algorithm, the number of principal components in a single cluster is estimated using PESEL method and the distance between data point and cluster is measured by BIC. The basic version of this algorithm is implemented in R package 'Varclust'. In my bachelor's thesis I am working on improvements of this package, aimed at enhancing its effectiveness and computational complexity. I also investigate its statistical properties, including consistency when the number of variables increases while the number of clusters and their dimensions remain constant. I will present results of the simulation studies comparing "Varclust" with other methods of subspace clustering, based on penalization in L norm. I will also compare these methods using publicly available gene expression data. YSS2 Young Statisticians Session 2 Thu :30-6:00 Lecture Hall KR 8 Chair(s): Andrea Berghold, Anke Hüls YSS2. Probabilistic longterm forecasts for model comparison and outbreak detection in infectious disease epidemiology Bracher J., Held L. University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zürich, Switzerland Two main purposes of statistical modeling in infectious disease epidemiology are forecasting and outbreak detection. A common data type are time series of weekly case counts from routine surveillance systems, often available stratified by geographical unit and age group. A well-developed framework for multivariate modeling of such data is the so-called hhh4 model (Meyer and Held, Biostatistics 206 kxw05) which is implemented in the R package "surveillance". We present some theoretical results on predictive and stationary distributions within this model class and demonstrate how they can be of practical use. Firstly, analytic formulas for multivariate path forecasts allow to apply predictive model choice criteria without the need for potentially instable simulation techniques. Secondly, longterm predictions and stationary distributions offer an interesting way to embed results from retrospective modeling into existing frameworks for outbreak detection (like the commonly used algorithm by Noufaily et al, Statist. Med. 203, ). 
Advantages of our approach include direct modeling of past outbreaks (rather than downweighting schemes) and straightforward generalization to multivariate outbreak detection. We exemplify our methods using data on norovirus in Berlin, Germany, available from the Robert Koch Institute. YSS2.2 Implementation of variable selection and multiple imputation in cluster analysis: an application to the German asthma registry Scheiner C., Binder H., Korn S. 2, Buhl R. 2, Jahn A. Institute for Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, Mainz, Germany, 2 Pulmonary Department, University Medical Center Mainz, Mainz, Germany This work was motivated by the German Severe Asthma Registry, where cross-sectional and longitudinal data of currently 628 patients from more than 25 centres are collected. Cluster analyses are to be performed on the cross-sectional data to identify different phenotypes of the disease. Next to the need of variable selection procedure, multiple imputation is required to prevent for loosing up to 85% of the patient data in a complete case analysis. We implement, extend and apply a method for combining variable selection with multiple imputation in K-means-cluster analysis that has been suggested by Basagana et al []. A decision criterion comparing the inner-cluster and between-cluster distances called ''CritCF'' is proposed for backward-selection of variables and for defining the final number of clusters. The asthma registry also comprises many categorical variables. Therefore, we further modify that approach by replacing the K-means-clustering algorithm with a K-medoids-algorithm. Thus distances other than the Euclidean are allowed. The comparison between imputed and observed data indicates reasonable imputation results. Although, the selection of variables strongly differs by the applied cluster algorithm (K-means and K-medoids), each of the algorithm identifies two phenotypes characterized by the severity of asthma symptoms. Overall, when comparing results with published cluster analyses for asthma disease, much less variables seem to be important for clustering than used in most publications. References: [] Basagana X, Barrera-Gomez J, Benet M, Anto JM, Garcia-Aymerich J. A Framework for Multiple Imputation in Cluster Analysis. In: American Journal of Epidemiologie 203; 77: YSS2.3 Biomarker analysis with Kalman filtering Rakovics M. Eötvös Loránd University, Department of Statistics, Budapest, Hungary The World Anti-Doping Agency is currently running the Athlete Biological Passport (ABP) programme to combat certain types of doping practices that cannot be screened for with analytical techniques. The ABP monitors selected biological variables over time and through statistical analysis, it intends to identify target athletes for specific testing. The current model used is a Bayesian network (BN) that aims to flag irregular results based on the previously collected data form the athlete. As is the case with all models, the ABP BN has certain weaknesses that can be exploited by dopers. To tackle one of these issues, focusing on the ABP's haematological module, I have upgraded the current model by adding a filtering step to the evaluation process. Based on a modified dynamic age-structure model of 4

117 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed haematopoiesis, a type of Kalman filter could be set up to produce a better estimation of the BN model parameters. As an added result, the upgraded model can be used to test the ABP's detection capabilities by the simulation of various blood doping regimes. YSS2.4 Comparing two partitions of non-equal sets of units Cugmas M., Ferligoj A. University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia Rand (97) proposed what has since become a well-known index for comparing two partitions obtained on the same set of units. The index takes a value on the interval between 0 and, where a higher value indicates more similar partitions. Sometimes, e.g. when the units are observed in two time periods, the splitting and merging of clusters should be considered differently, according to the operationalization of the stability of clusters. The Rand Index is symmetric in the sense that both the splitting and merging of clusters lower the value of the index. In such a non-symmetric case, one of the Wallace indexes (Wallace 983) can be used. Further, there are several cases when one wants to compare two partitions obtained on different sets of units, where the intersection of these sets of units is a non-empty set of units. In this instance, the new units and units which leave the clusters from the first partition can be considered as a factor lowering the value of the index. Therefore, a modified Rand index is presented. Because the splitting and merging of clusters have to be considered differently in some situations, an asymmetric modified Wallace Index is also proposed. For all presented indices, the correction for chance is described, which allows different values of a selected index to be compared. Poster Session P Poster & Wine Session Wed :00-20:00 Foyer P.02 Influence of informative priors in linear mixed models for the longitudinal analysis of quality of life data Koch R., Schumacher A. 2 University of Muenster, Institute of Biostatistics and Clinical Research, Münster, Germany, 2 University of Muenster, Department of Medicine A / Hematology and Oncology, Münster, Germany The course of clinical outcomes like the quality of life (QoL) over time is of increasing interest in clinical trials. Hence, longitudinal study designs that assess QoL are applied. Often results from former cross-sectional studies are available for specific single time points. This prior knowledge can be used in a Bayesian model approach for the analysis of the longitudinal data in a new trial. In a clinical example, the QoL of oncological patients undergoing allogeneic stem cell transplantation is measured at three time points: at beginning (T) and end of inpatient treatment (T2), and six months after discharge (T3). The EORTC QLQ-C30 is used to measure the QoL. Measurements of QoL in a similar patient collective at a time point comparable to T3 are available from a former crosssectional study. First, frequentist linear mixed models will be fitted using an unstructured residual covariance matrix with patient as subject to estimate the longitudinal change in the QoL. Missing values are regarded as missing at random. Further, Bayesian linear mixed models will be fitted using non-informative as well as informative priors based on the former study results. These results will be compared with the frequentist approach. 
If more historical trials contribute information to one model parameter, the meta-analytic predictive (MAP) prior or a mixture of conjugate priors can be used. In this case, the MAP approach cannot be applied because only one former trial exists. Therefore, the influence of the weighting factor for the historical informative prior knowledge will be analyzed. Additionally, the effect of the informative prior for one model parameter on the other model parameters, e.g. change between time points, will be investigated. P.03 The analysis of CNV regions in five different breeds of bulls Frąszczak M., Mielczarek M., Szyda J., Czerwik N., Minozzi G. 2, Schwarzenbacher H. 3, Diaz C. 4, Egger-Danner C. 3, Williams J. 2, Woolliams J. 5, Varona L. 4, Solberg T. 6, Rossoni A. 7, Seefried F. 8, Vicario D. 9, Giannico R. 2, Nicolazzi E. 2 Wroclaw University of Environmental and Life Sciences, Institute of Genetics, Wroclaw, Poland, 2 Fondazione Parco Tecnologico Padano, Lodi, Italy, 3 ZuchtData EDV-Dienstleistungen GmbH, Vienna, Austria, 4 Universidad de Zaragoza, Zagaroza, Spain, 5 Roslin BioCentre, Roslin, United Kingdom, 6 Norwegian University of Life Sciences, As, Norway, 7 Italian Brown Cattle Breeders' Association, Bussoleng, Italy, 8 Swiss Brown Cattle Breeders Federation, Zug, Switzerland, 9 Italian Simmental Cattle Breeders Association, Undine, Italy Copy number variations (CNVs), one of the most important sources of genetic diversity, are defined as gains (duplications) or losses (deletions) of long DNA fragments. In our study we focused on CNVs identified by two different algorithms such as a read depth (RD) and split read (SR). The material consist of whole genome DNA sequences determined for 32 bulls representing five breeds (Brown Swiss, Fleckvieh, Guernsey, Simmental, Norwegian Red). Alignment to the UMD3. reference genome was carried out using the BWA-MEM software and then CNVnator (RD) and Pindel (SR) software were used for CNV detection. CNVs identified by the CNVnator were treated as a baseline datasetwhich was then validated by identifying CNVs which overlapped by at least 70% with those detected by the SR approach. Genomic regions with CNVs detected for at least one bull were considered. If two or more CNVs overlapped at least in 50%, they were classified as the same CNV region. To determine how different animals are similar the Jaccard similarity coefficients, for each pair of bulls, were calculated. Based on the distance matrix the multidimensional scaling was performed. Nonparametric tests were applied to check whether individuals within a given group have more common CNV regions than others. Additionally, the CNV regions characteristic for particular breeds were determined. The total number of deletions identified by the CNVnator software was , while the number of duplications was After 5

118 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed validation the dataset of CNVs was reduced to 9766 deletions and duplications, what means retaining only 0.9% (2%) of previously detected deletions (duplications). The data were obtained within the project 7FP Gene2Farm while the processing of the raw data was performed at Poznan Supercomputing and Networking Center. P.04 Parallelisation in biostatistical analyses using Yin-Yang sampling Posekany A., Frühwirth-Schnatter S. 2 Danube University Krems, Department für Klinische Neurowissenschaften und Präventionsmedizin, Krems, Austria, 2 WU University of Economics and Business, Institute for Statistics and Mathematics, Vienna, Austria In recent years, Bayesian methods have become more widely applied in biomedical research and analyses. Due to their ability to integrate new findings with existing knowledge and using previous information from the literature or other studies the required number of patients for testing a treatment can be decreased compared to classical methods and Bayesian meothods can still provide reasonable results when far too few patients were included in the study compared to classical statistical inference. However, a drawback of Bayesian models for large studies analysing 0000 or patients from medical registries is its computational intensity requiring computer clusters with large memory and comptational hardware for parallelisation. For the purpose of performing such large calculations with only a standard desktop computer, we developed a methodology which allows to split the data into subsets, perform independent Bayesian inference on each subset and finally merge the results obtained from these analyses in order to arrive at a final merged result which is comparable to what one would have learned from inference on the complete data set. We termed the method Yin-Yang sampling which corrects for applying the same prior for each subset instead of using the prior information just once for the full data set. By appling yin-yang sampling steps sequentially or treewise, we recover the full sample s posterior from any given number of subsamples posteriors as long as reasonably large subsets are available containing enough information for sound inference results. For demonstration, an inference with logistic regression on a data set from the Austrian Stroke registry containing patients is shown. This provides a scenario where the full sample s inference is impossible on a desktop computer due to lack of memory, while subsample computation plus the Yin Yang merging algorithm require only minutes for computation. P.05 Lasso with splines in high-dimensional regression - a comparison with boosting Berres M., Büsch C.A. University of Applied Sciences Koblenz, Mathematics and Technology, Remagen, Germany While the lasso usually fits linear functions to a dataset, boosting allows fitting linear as well as non-linear relationships between predictors and response. Both methods can deal with high-dimensional data (p > n). The lasso procedure can be extended by replacing a predictor with a set of basic splines of the predictor. It can, hence, also model non-linear relationships in a similar manner as boosting does in fitting P-splines (Schmidt and Hothorn, 2008). Boosting with P-splines fits many parameters for each predictor (typically 20) and applies additional penalization. 
Basic splines, in contrast, have less parameters (4 or 5) and may allow interpretation of the sign or even the value of each parameter. The grouped lasso (Meier et al., 2008) can and should be applied, where the basic splines of each predictor form a group. We apply both methods in simulated high-dimensional data and compare them with respect to the mean squared error, the oracle property and the functional estimates. P.06 Statistical analysis of transportable NIR-spectroradiometer data Patus E., Bolgár B. 2, Kovacs T. 3, Drexler D. 3, Jung A. 4, Dinya E. Semmelweis University, Budapest, Hungary, 2 Budapest University of Technology and Economics, Budapest, Hungary, 3 Hungarian Research Institute of Organic Agriculture, Budapest, Hungary, 4 Szent Istvan University, Budapest, Hungary Due to the increasing awareness for healthy eating and a rapidly growing demand for superfoods, identification of oat cultivars with high levels of phytonutrients has also become increasingly important. Near-infrared (NIR) spectroscopy offers a cheaper alternative to standard wet chemistry analyses, however, the reliability of these methods is still not well understood. This study aims to assess the applicability of non-linear regression methods in predicting phytonutrient contents from "out of the lab" NIR measurements. 00 oat samples were obtained from different European countries and the seeds were measured for nutrient content using wet chemistry alongside transportable NIR spectroscopy. In particular, Trolox, TPC, beta-glucan and protein contents were measured. NIR spectra were obtained with ASD FieldSpec 4 Wide-Res mobile spectroradiometer device. We compare the predictive performance of Support Vector Regression, Gaussian Process Regression and Partial Least Squares Regression, preceded by Principle Component Analysis and Independent Component Analysis as preprocessing steps. We utilize root mean square error (RMSE), standard deviation ratio (SDR), residual predictive deviation-ration (RPD) and area under the ROC curve (AUC) to assess predictive performance. We conducted the analysis with R statistical software. We demonstrate that non-linear regression methods are promising tools for investigating the phytonutrient contents in oat samples and outline further research directions. P.07 Statistical methodology of design and analysis of quality control process for traditional chinese medicine Wu Y.-J., Lai Y.-H. 2, Huang B.-H. 3, Cheng L.-H., Hsiao C.-F. 2,4 Chung Yuan Christian University, Applied Mathematics, Taoyuan City, Taiwan, 2 National Health Research Institute, Division of Biostatistics and Bioinformatics,Institute of Population Health Sciences, Miaoli County, Taiwan, 3 National Taiwan University, Division of Biometry, Department of Agronomy, Taipei, Taiwan, 4 National Health Research Institute, Division of Clinical Trials Statistics, Miaoli County, Taiwan Raw materials for traditional Chinese medicine (TCM) are often from different resources and the final product may also be made by different sites. Consequently, variabilities from different resources such as site-to-site, or within site component-to-component, are expected. Therefore, test for consistency in raw materials, in-process materials, and/or final product has become an important issue in the quality control process in TCM research and development. In this paper, a statistical quality control (QC) process for raw materials and/or the final product of TCM is proposed based on a two sided b-content, g-confidence tolerance interval. 
More specifically, we construct the tolerance interval for a random effects model to assess the quality control of TCM products from different regions and different product batches. The products can be claimed to pass the QC process when the constructed tolerance interval is within the permitted range. Given the region and batch effects, sample sizes can also be calculated to ensure the specified measure of goodness. 6

119 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed An example is presented to illustrate the proposed approach. Keywords: Consistency, Tolerance interval, Random effects model P.08 Comparison of optimal design for constant-stress testing and step-stress testing with competing risks model Fan T.-H., Wang Y.-F. 2 National Central University, Graduate Institute of Statistics, Jhongli, Taiwan, 2 National Chung Cheng University, Department of Mathematics, Chiayi, Taiwan Accelerated life testing (ALT) is the process of testing products by subjecting them to strict conditions in order to observe more failure data in a short time period. In this study, we compare the methods of two-level constant-stress ALT (CSALT) and simple step-stress ALT (SSALT) based on competing risks of two or more failure modes with independent exponential lifetime distributions. Optimal sample size allocation during CSALT and the optimal stress change-time in SSALT are considered based on V-optimality and D- optimality, respectively. Under Type-I censoring, numerical results show that the optimal SSALT outperforms the optimal CSALT in a wide variety of settings. We theoretically also show that the optimal SSALT is better than the optimal CSALT under a set of conditions. A real example is analyzed to demonstrate the performance of the optimal plans for both ALTs. P.09 Adaptive randomization for balancing over continuous covariates in multi-arm trials with equal and unequal allocation Ono K., Iijima H., Sato N. Hokkaido University Hospital, Clinical Research and Medical Innovation Center, Sapporo, Japan In controlled clinical trials, ensuring balance in important covariates across treatment arms is considered essential for valid treatment comparisons and interpretation of the study. Pocock and Simon's minimization and its variants are commonly used for that purpose. However, these methods require categorization of continuous variables, therefore it may result in the loss of information. Although several adaptive randomization methods for balancing over continuous covariates without categorization have been proposed, none of these do not address the extension to multi-arm trials and issues arising from unequal allocation such as the loss of power due to the shift in re-randomization distribution. For the situation above, we extended the randomization method based on kernel densities proposed by Ma and Hu (203) using the fake-arm method proposed by Kuznetsova and Tymofyeyev (202). For kernel density estimation, normal kernel is considered and the bandwidth is calculated by Silverman's rule instead of Scott's rule, because highly skewed covariates may appear in clinical trials. We used an imbalance measure based on range. For the assessment of our proposed approach, we evaluated balancing properties using Kolmogorov-Smirnov distance and its percentiles. We further examined alpha error and power for continuous, binary and time-to-event outcomes, using permutation test, general linear model, logistic regression model and Cox regression model, respectively. The comparison with complete randomization, permuted block randomization and Pocock and Simon's minimization are made by simulation study. The Simulation results will be shown at the presentation. P.0 Multiresponse models for radiactivity retention in the human body Rodriguez-Diaz J.M., Sanchez-Leon G. 2 University of Salamanca, Statistics, Salamanca, Spain, 2 ENUSA Industrias Avanzadas S.A. 
- USAL, Salamanca, Spain Workers exposed to the incorporation of radioactive substances are routinely checked by bioassays (isotopic activity excreted via urine, measurements of isotopic activity retained in the whole body or in lungs, etc.). From these results, the quantity incorporated by the worker is inferred, using the 'retention function' or 'impulse function'. The form of this function depend on the several factors (namely the type of bioassay, the way of incorporation, the chemical form of the isotope, which defines the type of metabolism, and, for the case of inhalation, the aero dynamical size of the particles), and it is usually expressed as the solution of a system of differential equations, coming from the compartmental model which describes the human body (of a part of it). The possibility of using different types of bioassays allows for a better estimation of the parameters that characterize the solution of the system of equations. However, in some cases one of the types of bioassays turns out to be not very much informative, and in fact just provides an upper bound of the retention. Some examples will be solved, showing different behaviours. P. Bayesian adaptive randomization design by jointly modeling efficacy and toxicity as time-toevent outcomes Chang Y.-M. Tunghai Univerity, Department of Statistics, Taichung, Taiwan In cancer study, the evidence of drug efficacy, such as tumor shrinkage, is typically observable after a relatively long period of time. In contrast, toxicity is often modelled as binary outcome. Such method ignores information on when the toxicity event occurs. In a phase II clinical trial design, we proposed a Bayesian adaptive randomization procedure that accounts for both efficacy and toxicity as time-toevent outcomes. The dependence between the bivariate outcomes is induced by sharing common random effects between two models. Moreover, we allow the randomization probability to depend on patients's specific covariates. Early stopping boundaries are constructed for toxicity and futility, and a superior treatment arm is recommended at the end of trial. The simulation study is conducted to investigate the performance of the proposed method. P.2 Online surveys in clinical studies: going beyond medical borders Borzikowsky C. Institute of Medical Informatics and Statistics, Kiel, Germany Biometrical methods mainly focus on experiments and innovative test designs in order to find the best and effective therapy for patients. However, variables like acceptability of new treatment methods, practicability of home treatments, compliance to ethical guidelines, and patients' commitment regarding their psychological well-being are also important factors that should be taken into account when introducing a new therapy (i.e., especially in early phase clinical trials). In this presentation, we highlight the advantages of online surveys for clinical studies. In contrast to classical paper-and-pencil questionnaires at hospitals, the Internet and mobile devices (i.e., smartphones and tablets) enable patients to rate their feelings and 7

120 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed emotions towards a new therapy in real-life settings and without delay. Collected data are saved electronically and stored into a central database with respect to data privacy. After database lock, collected data could be easily exported in different file formats (e.g., SPSS files). In the second part of our presentation, we introduce the online survey software Evasys Survey Automation Suite to the audience and demonstrate the typical workflow of online survey studies. Particularly, we discuss the creation of questionnaires, online and offline recruitment, survey administration, data storage, and data export. In summary, online surveys could be a fruitful extension in early phase clinical trials and should be implemented in future research. P.3 One-stage individual participant data meta-analysis of continuous outcomes: comparison of approaches to account for clustering of participants within studies Legha A., Burke D., Ensor J., Riley R., Jackson D. 2 Keele University, Research Institute for Primary Care & Health Sciences, Stoke-on-Trent, United Kingdom, 2 University of Cambridge, MRC Biostatistics Unit, Cambridge, United Kingdom Individual participant data (IPD) meta-analysis involves obtaining and then synthesizing raw data from individual studies to produce summary results that inform clinical decision making. The IPD approach is increasingly popular, and has many advantages over a traditional meta-analysis of published aggregate data, such as increased power to detect treatment-covariate interactions and avoiding reliance on published results. IPD meta-analysis is often conducted using a one-stage model, which simultaneously synthesises all the IPD in a single step whilst accounting for the clustering of participants within studies. However, there are two competing approaches to account for clustering: use a separate intercept term per study (stratification by study), or place a random study effect on the intercept term. Both have potential disadvantages. When stratifying by trial, the number of parameters to estimate increases, as there is one intercept per study. The random intercept approach addresses this, but only by making a potentially strong assumption that intercepts are drawn from a normal distribution with a common mean and variance. In this presentation, we will compare these two approaches through an extensive simulation study, with regard to an IPD meta-analysis of randomised trials with a continuous outcome to summarise a treatment effect. The mean bias, mean standard error, mean square error, and mean coverage of the treatment effect estimate will be compared for each of the two clustering approaches, across a range of scenarios and estimation methods. Extension to binary outcomes will be discussed, and recommendations made for those conducting IPD meta-analyses. P.4 Remote monitoring of implantable cardiac devices: limited validity due to selective reporting Eyding D., Janatzek S., Gronemeyer S., Eikermann M. MDS e.v., Essen, Germany OBJECTIVE: To determine the patient-relevant benefit of remote monitoring (RM) of implantable cardiac devices versus usual care. DESIGN: Systematic review and meta-analysis. Data sources: Medline, Embase, and Central without time restrictions, the WHO ICTRP, and Clinicaltrials.gov. 
Eligibility criteria: Randomised controlled trials (RCTs) of RM versus usual care in patients with an implanted cardiac device reporting at least one patient-relevant outcome. RESULTS: In our search (Feb. 207) we identified 8 published trials, thereof 4 on pacemakers (PMs) and 4 on implantable cardioverter defibrillators with or without resynchronisation function (ICD/CRT-Ds). Additionally, we identified 3 (PM) and 0 (ICD/CRT- D) completed but unpublished trials (trials that were unpublished >2 months after primary completion). Hence, 43% of the PM and 42% of the ICD/CRT-D trials are unpublished. All completed but unpublished trials were from EU countries. No benefit was found for RM in the PM trials. Regarding ICD/CRT-Ds the meta-analysis did not show a mortality benefit of RM versus usual care (RR=0.96, 95%-CI [0.82;.4]). The barely significant (p=0.03) meta-analysis for major adverse events with reported data (from /4 published trials) favoured RM. This result was not robust and turned insignificant if missing data from published or unpublished trials were imputed. Depending on the method of imputation of unreported/unpublished data non-inferiority may be questionable. CONCLUSION: Due to a substantial amount of unpublished trials from the EU valid conclusions on the value of RM in these high-risk cardiac devices for severely ill patients are impossible. Mandatory publication of medical device trial results should be introduced also in the EU. P.5 On the importance on proper model selection when conducting a meta-analysis on rare clinical events Andersson M. AstraZeneca R&D, Bioinformatics & Information Sciences, Mölndal, Sweden Meta-analysis is an established and important statistical tool since more than 50 years that following the development of Internet data and trial libraries like Trial Throve, Embase and PubMed has become extremely popular for easily providing evidence for clinical efficacy or safety of new interventions versus standard of care. A PubMed search on March 23, 207 on "meta-analysis" and "systematic review" produced over hits with over 7000 of them from 206, as compared to 65 indexed meta-analyses in the year Numerous statistical methods and programs have been developed for conducting meta-analysis in an automated fashion. Compared to the efforts of identifying representative studies to include is often very limited effort spent on prospectively selecting a proper metaanalysis model. A simulation study based on a recent Cochrane review() investigated statistical properties of 0 commonly used meta-analysis models, comparing power, risk of type I error, precision and unbiasedness in presence and absence of heterogeneity, as well as robustness in results to new data made available after the review was published. Scenarios with a homogenous effect, no true effect, a more obvious and a less obvious source of heterogeneity were investigated. This simulation study show that there is variability between methods, both in power, precision in estimate as well as risk of type I error and robustness to new data available. Hence, a recommendation on which models to use and which to potentially avoid when evaluating rare clinical events with sampling zeroes and potential heterogeneity will be given. () Salazar CA, Malaga G, Malasquez G. Direct thrombin inhibitors versus vitamin K antagonists or low molecular weight heparins for prevention of venous thromboembolism following total hip or knee replacement. Cochrane Database of Systematic Reviews 200, Issue 4. Art. 
No.: CD DOI: 0.002/ CD00598.pub2. 8

121 CEN ISBS 207 Conference Courses Keynotes Invited Invited / Topic-Contributed Topic-Contributed P.6 The design of non-inferiority trial using Network Meta-Analysis to assess the assay sensitivity Hida E., Tango T. 2 Hiroshima University Hospital, Hiroshima, Japan, 2 Center for Medical Statistics, Tokyo, Japan There are two well-known two problems in non-inferiority trial to show that a test treatment is not inferior to an active reference treatment; one is a setting of the non-inferiority margin, another one is an assessment of the assay sensitivity. Therefor, three-arm noninferiority trial, including a placebo is strongly recommended by some guidelines for assessing assay sensitivity (ICH-E0(2000), EMA guidline(2005), FDA guidance(206)). It is possible to evaluate the internal validation by including a placebo treatment in the three-arm trials. On the other hand, from the background of conducting the non-inferiority trial, there are the ethical problems of daring to add a placebo treatment. And since the test treatment can be directly compared with placebo, the need to prove the non-inferiority of the test treatment to the active reference as the main objective becomes poor. For these reasons, sufficient discussions are still needed for application of three-arm non-inferiority trial to the real problems. To resolve these problems, we develop a method to assess assay sensitivity by using the Network-Meta-Analysis approach in two-arm non-inferiority trials. That is, to evaluate the substantial superiority of the active reference to placebo as the historical evidence of sensitivity to drug effects, we propose a procedure for simultaneous comparison of evidence from both current non-inferiority trial and multiple comparative trials in the same therapeutic area. P.7 Estimation of health related quality of life in presence of missing values in EQ-5D Shafiq M., Atif M. 2, Zaman Q. 3 Kohat University of Sciecne and Technology, Institute of Numerical Sciences, Kohat, Pakistan, 2 University of Peshawar, Department of Statistics, Pehawar, Pakistan, 3 University of Peshawar, Department of Statistics, Peshawar, Pakistan One of the significant problems in the analysis of clinical and observational studies is missing data and nonresponse from patients. Turning a blind eye to the missing behavior may provide biased results with overestimated standard errors. The potential impact of the problem may even have more sever impression in estimating Health Related Quality of Life index. This index is an important indicator, widely used in clinical trials for assessing effectiveness of available medical interventions. Amongst many measures available for estimation of the index, most rising approach is EQ-5D preference based health classifier. This paper suggest a probability based heuristic algorithm for imputation of missing values in EQ-5D health classifier to overcome the said problem. The proposed algorithm not only provide unbiased coefficients but are more efficient as well. P.8 The impact of missspecified random effects on logistic-mixed-models Freiwald E.,2 University Medical Center Hamburg-Eppendorf, Institute of Medical Biometry and Epidemiology, Hamburg, Germany, 2 Competence Center for Clinical Trials, University of Bremen, Bremen, Germany Mixed models are a specific class of statistical models that contain both fixed and random effects. They are commonly applied in the context of hierarchically structured data or in the situation of repeated measurements over time. 
Thereby, an additive random effect is assumed to be normally distributed with expectation 0. In this work, we focus on a binary logistic model with a fixed group effect and a random effect for the study center. Several authors have investigated the impact of this normality assumption and the consequences of deviations from it by implementing random effects following diverse other forms of distributions with expectation 0. They found that the performance of the random effects model was reasonably robust under these misspecified random effects. However, so far, no systematic shift in the expectation of the underlying random effect distribution has been investigated. In this master's thesis, we investigated normally distributed random effects with expected values different from 0 and random effects which did not follow a normal distribution. Those distributions were created by manipulating the generating process of the random effect: by introducing variance inhomogeneity, by using fixed or individual allocation probabilities, or by a mixture of both techniques within the random effects. A simulation study in the statistical software SAS 9.4 (SAS Institute 2016, Cary, NC) was performed to evaluate the impact of these model deviations. Our results confirm that the mixed model is reasonably robust even in the case of expectation shifts in the random effect. Only if very extreme shifts were simulated was the power of the test for group differences reduced. However, whenever such extreme effects between the study centers exist, this can easily be detected in advance by considering center-specific effects. In conclusion, logistic mixed models are therefore very robust against many kinds of misspecification of the random effect distribution. P.19 MRCTs and the drop-min data analysis Lan K.K.G., Chen F., Li G., Sotto C. 2 Janssen Pharmaceutica, R&D, Raritan, United States, 2 Janssen Pharmaceutica, R&D, Beerse, Belgium In recent years, developing pharmaceutical products via a multiregional clinical trial (MRCT) has become more popular. Many MRCT studies in the literature assume a common treatment effect across multiple regions. However, heterogeneity among regions is often the more appropriate assumption, and as such the fixed effects model for combining regional information may not be appropriate for MRCTs. In this presentation, we will discuss: 1. The use of the fixed effects model, the continuous random effects model, and the discrete random effects model for the design and data analysis of MRCTs; numerical examples will be provided to illustrate the fundamental differences among these three models. 2. Consistency and inconsistency: we will provide examples of inconsistency, and discuss the use of drop-the-min data analysis, in which the region with the minimum treatment effect is excluded from the MRCT data analysis. We first provide a solution formulated within the fixed effects framework, and then extend it to the discrete random effects model. P.20 Modelling anti-RhD titers in plasma donors de Vos A., van der Schoot E., Rizopoulos D. 2, Janssen M.,3 Sanquin Blood Supply Foundation, Amsterdam, Netherlands, 2 Erasmus University Medical Center, Rotterdam, Netherlands, 3 University Medical Center Utrecht, Utrecht, Netherlands Anti-RhD immunised donors provide anti-RhD immunoglobulins that are used for the prevention of rhesus disease during pregnancy. These donors are periodically hyper-immunised (boostered) to retain a high titer level of anti-RhD.
We analysed anti-RhD donor records from 1998 to 2016, consisting of 30,6 anti-RhD titers from 755 donors, encompassing 3,372 booster events. Various models were fit
to these data to describe the anti-RhD titers over time. A random effects model with a log-linear anti-RhD titer decline over time and a saturating titer response to boostering fits the data well. This model contains two parameters characterizing the individual donor and two general model parameters. The average individual log2 decline is 0.55 per year, i.e. a 32% decline in absolute titer; for half the donors the decline in titer is between 3% and 4% per year. The anti-RhD titer peaks around 26 days following a booster event. The boostering response reduces with higher titers at boostering; at the median titer (log2 11) the mean increase per booster is log2 0.38, i.e. from an absolute titer of 2048 to 2665 (+30%), with half of all donors increasing between 6% and 65%. The model describes the anti-RhD titer change per individual with only four parameters, two of which are donor-specific. This information can be used to enhance the donor immunisation programme by deriving individualized immunization policies in which boostering is adjusted to the anticipated anti-RhD decline, the effectiveness of boostering and the titer levels required. P.21 Modeling of multilevel dental data - comparison of theory and practice Adolf D., Jünemann R., Keller T., Lorenz K. 2, Noack B. 2 StatConsult GmbH, Magdeburg, Germany, 2 Universitätsklinikum Carl Gustav Carus der TU Dresden, Poliklinik für Parodontologie, Dresden, Germany In dental research, there is a multilevel data structure based on investigations not only on patients but also on single teeth, even multiple sites per tooth, and successive time points. Ordinarily, such data should be addressed with appropriate statistical methods covering the structure in terms of spatial and temporal dependencies. But usually, dental analyses are performed by using patient-wise mean values in simple statistical methods. We compare two modeling approaches for evaluating the effect of an experimental toothpaste on the reduction of dental plaque in a single-center, clinically controlled, randomized, double-blind study with a two-arm parallel group design: The first one is the simple t-test between the experimental toothpaste and the negative control group. Here, a patient-wise mean change to baseline is analyzed, which is what is usually done in practice. The second approach is a three-level random intercept model using all sites per tooth at all teeth available. We present and discuss the differences in effect estimates and also refer to further hierarchical linear modeling accounting for baseline values and all time points. P.22 Using phylogenetic information for the analysis of microbial abundance data Antweiler K., Anusic T., Kropf S. Otto-von-Guericke-Universität Magdeburg, Institut für Biometrie und Medizinische Informatik, Magdeburg, Germany With modern sequencing techniques for microbial materials, the sequences are usually used to cluster sufficiently similar sequences into operational taxonomic units (OTUs) and then to determine the relative frequencies (abundances) of these OTUs. But these sequence data also deliver information about the phylogenetic distances of the OTUs that should not be wasted. It was the aim of our research project to study how the phylogenetic relationships can be used to improve the analyses of abundance data.
If there were direct relations between the phylogenetic distance of two OTUs and the correlation in their abundances, then these could be utilized to improve the power of multiple test procedures or multivariate tests. It seems, however, that such relations are not simply structured, so that the possibilities to improve the power are restricted. A second, more realistic objective is to improve the relevance and interpretability of the results from such high-dimensional data. As an example, we applied a cluster-based procedure similar to a proposal by Meinshausen. One can either use one's own hierarchical clustering of the phylogenetic distance data or use the already established taxonomic classification system. The hierarchical procedure tests all clusters from the root of a (sub)tree stepwise down to its leaves while keeping the type I error probability bounded by α. Significances are thus reliable and also biologically meaningful. To facilitate the communication of the test results further, we transferred the idea of statistical parametric mapping (commonly used in functional imaging) to the field of microbiome analysis. P.23 AccuFusion: a novel statistical algorithm for detecting gene fusions in RNA-Seq data Su X., Yao H., Yuan Y. 2 MD Anderson Cancer Center, Bioinformatics and Computational Biology, Houston, United States, 2 MD Anderson Cancer Center, Biostatistics, Houston, United States The advent of next-generation sequencing technologies allows for the detection of gene fusions at unprecedented efficiency. However, naive approaches often produce an unmanageable number of false predictions due to several confounding factors such as nonspecific alignments. Here, we describe a multistep algorithm, AccuFusion, that deals with these confounding factors in gene fusion discovery in RNA-seq data. A given paired-end read alignment is first quantified in terms of the genomic location (L) of the aligned read pair, the distance (D) between the aligned read pair of the fragment (insert) and the orientation (O) of the read pair. The specific pattern in (L, D, O) space is used as a constraint to define discordant read pairs. The algorithm begins by detecting and clustering discordant read pairs that support the same fusion event (e.g. BCR-ABL) and selects the discordant read clusters as fusion candidates. Next, a greedy strategy is applied to define the boundaries of the discordant read clusters to address the existence of multiple fusion products with different fusion junctions in the same fusion partners. Specifically, the boundary for each discordant read cluster is estimated on the basis of discordant read mapping locations and orientations, with fragment length as a constraint on cluster size. Meanwhile, an in-silico fragment length for each read pair within the discordant read clusters is calculated against the read cluster boundary, and any discordant read pair whose in-silico fragment length falls outside three standard deviations of the fragment size distribution is discarded. An in-silico sequence generated by using the consensus of reads within the discordant read clusters for each fusion candidate is used to detect breakpoint-spanning reads. These steps and other filtering metrics are used to reduce false positive fusion candidates. In summary, AccuFusion has high sensitivity and specificity for detecting real fusion events. P.24 Adjusting for covariates in random forest analyses of methylation data sets Vens M., Szymczak S.
2 Universitätsklinikum Hamburg-Eppendorf, Institut für Medizinische Biometrie und Epidemiologie, Hamburg, Germany, 2 Christian-Albrechts-Universität zu Kiel, Institut für Medizinische Informatik und Statistik, Kiel, Germany Machine learning methods, and particularly random forests (RFs), are popular approaches for prediction based on omics data sets such as methylation studies. One challenge is that cell type heterogeneity can confound relationships between methylation levels and outcomes. However, it is unclear how to adjust for those confounders in RF analyses in order to find a good prediction model and to identify variables which are relevant for the outcome. In simulation studies, the following approaches are evaluated:
1. Include confounders as predictor variables without any weighting. 2. Include confounders as predictor variables with different weightings (e.g. always included for splitting selection). 3. A three-stage approach based on residuals. Prediction performance as well as the sensitivity and false discovery rate of selected variables are used as evaluation criteria. We illustrate the application of all approaches using an experimental methylation data set where cell type proportions are available. P.25 Applications of a multidimensional time model for the probability cumulative function to biopharmaceutical statistics Fundator M. National Academy of Sciences, Division of Behavioral and Social Sciences and Education, Brooklyn, United States The current focus on precision medicine and a new stage in drug development provide a strong motivation for applying a new method based on changes of the cumulative distribution function in relation to time changes in sampling patterns. In this method, the multidimensional time model for the probability cumulative function can be reduced to a finite-dimensional time model, characterized by a Boolean algebra for operations over events and their probabilities and by an index set for the reduction of the infinite-dimensional time model to a finite number of time dimensions, considering the fractal-dimensional time arising from supersymmetry-like properties of probability, in the directions of multidimensional data analysis, modeling, and simulation. The new method is based on properties of Brownian motion, is philosophically grounded in the Erdős–Rényi law for prediction, and is intended to reach a high level of precision. Such a method is important for precision medicine in view of the challenges posed by the availability of big data and complex data structures, including missing and sparse data and complex dependence structures, such as those arising in various DNA analyses; considering that more than 80 million genetic variations are currently counted in the human genome, this is changing the scope of analysis for statistical and medical experts from academia, industry and government. These applications could be further extended in view of a decision-theoretical approach to risk assessment in the evaluation of large amounts of clinical data and in analyses supporting the approval of drugs and devices. P.26 Chances and restrictions of unplanned interim analyses Eveslage M., Ligges S. 2, Ehrhardt S. 3, Faldum A. University of Münster, Institute of Biostatistics and Clinical Research, Münster, Germany, 2 Bayer AG, Wuppertal, Germany, 3 Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Baltimore, United States Adaptive designs enable researchers to perform confirmatory interim analyses in clinical trials with the possibility of design adaptations that may increase the chance of efficient study progress. The method of Müller and Schäfer [Statist Med 23, 2004] additionally allows for unplanned interim analyses. It is therefore an important tool for reacting to unforeseen events provoking unplanned interim analyses without diminishing the validity and integrity of the trial. Several limitations can occur, and it has to be assured that the execution of the interim analysis is appropriate. Chances and restrictions occurring during an unplanned interim analysis are discussed using the example of the SacBo study.
The SacBo study [Ehrhardt et al., OFID 3(1), 2016] is a randomized, double-blind, placebo-controlled phase III trial that assesses the preventive potential of Saccharomyces boulardii given additionally at the start of antibiotic treatment in order to prevent the frequent side effect of antibiotic-associated diarrhea. The study was planned as a single-stage trial using the Cochran-Mantel-Haenszel χ² test for the primary analysis. An unplanned interim analysis was performed due to delayed recruitment. Therefore, the conditional rejection error probability (crep) according to the method of Müller and Schäfer was evaluated. Challenges in calculating the crep, which depends on an unknown nuisance parameter, will be presented. Results of a simulation study assessing the conditional power are shown. Based on this example, the use of unplanned interim analyses and its limitations will be reviewed. P.27 An F approximation to the distribution of the T² type test statistic with two-step monotone missing data Kawasaki T., Seo T. Tokyo University of Science, Tokyo, Japan In almost all statistical analyses, missing data is a constantly occurring problem. Many statistical methods have been developed to analyze data with missing values. Additionally, monotone missing data have been widely studied in the past. For two-step monotone missing data, Kawasaki and Seo (2016, Statistics) derived an asymptotic expansion of the test statistic for the case where the sample size is large. The asymptotic first two moments were obtained using stochastic expansion, and Bartlett and modified Bartlett corrected statistics were proposed. In this study, we consider the problem of testing for the mean vector when the data have a two-step monotone pattern of missing observations. The aim of this study is to propose a simple and convenient approximation for two-step monotone missing data. In particular, we approximate the distribution of Hotelling's T² type test statistic by a constant times an F distribution by adjusting the degrees of freedom. The degrees of freedom are adjusted by estimating the unknown first and second degrees of freedom of the F distribution using the asymptotic distribution of Hotelling's T² type test statistic. The accuracy of the approximation to the upper percentiles of Hotelling's T² type test statistic is investigated by Monte Carlo simulations for some selected parameters. A numerical example is also given. P.28 Modeling and application of non-destructive and destructive degradation tests Tsai C.-C. Tamkang University, Department of Mathematics, New Taipei, Taiwan Nowadays, for highly reliable products, it is difficult or impossible to obtain failure data for the products within a reasonable period of time prior to product release. Degradation tests are one way to overcome this obstacle by collecting degradation data on the products. Degradation tests involve the measurement of the degradation of the products. Depending on the measurement process, degradation tests can be divided into non-destructive and destructive degradation tests. This work addresses a number of degradation models of these two types that have been developed to describe the degradation paths of products, and some applications of degradation models in the two classes will also be discussed.
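As a minimal, hypothetical illustration of the non-destructive case discussed above (not the author's models), a linear degradation path with unit-specific random intercepts and slopes can be fitted with the lme4 package mentioned elsewhere in this volume; all data and parameter values below are simulated for illustration only.

library(lme4)

set.seed(1)
n_units   <- 30
times     <- seq(0, 10, by = 2)
dat       <- expand.grid(unit = factor(1:n_units), time = times)
slope_dev <- rnorm(n_units, mean = 0, sd = 0.3)          # unit-specific slope deviations
dat$y <- 10 - (0.8 + slope_dev[dat$unit]) * dat$time +   # linear degradation path
         rnorm(nrow(dat), sd = 0.5)                      # measurement error

fit <- lmer(y ~ time + (time | unit), data = dat)        # random intercept and slope per unit
summary(fit)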

P.29 Comparison of different methods for fitting linear mixed-effects models to right-skewed data Hayoz S., Bigler M., Klingbiel D., Swiss Group for Clinical Cancer Research (SAKK) Swiss Group for Clinical Cancer Research (SAKK), Statistics, Bern, Switzerland In a surgical trial in which the surgeon was a stratification factor and was expected to have an important influence on the outcome, it was planned to analyze the data using a linear mixed-effects model. However, preliminary data analysis revealed that the data were not normally distributed but right-skewed with outliers. Several robust methods were applied to the data and yielded differing results. To assess which method should be applied to the final data, simulations were performed to compare the bias, alpha level and power of the different methods. The results of the simulations, which were performed using R, will be presented at the conference. Unbalanced and balanced designs with an underlying normal distribution as well as several right-skewed distributions were simulated and analyzed using linear mixed-effects models (R package lme4), robust linear mixed-effects models (R package robustlmm), linear quantile mixed-effects models (R packages lqmm and rqpd) and linear mixed-effects models with heavy-tailed errors (R package heavy). The robust linear mixed-effects model held the alpha level and power well and had a low bias in all the considered scenarios, whereas the other methods were biased or did not hold the alpha level well in some scenarios. In conclusion, we recommend using the robust linear mixed-effects model implemented in the R package robustlmm when dealing with right-skewed data. P.30 Providing an R package for sampling data of cluster randomized trials to show the impact of implementation errors on effect estimation within stepped wedge design studies Trutschel D.,2, Reuther S., Verde P.E. 3 Deutsches Zentrum fuer Neurodegenerative Erkrankungen, Witten, Germany, 2 Martin-Luther-Universität Halle/Wittenberg, Halle/Saale, Germany, 3 Heinrich Heine University, Coordination Centre for Clinical Trials, Duesseldorf, Germany A simulation can be a tool to explore methodological challenges of common study designs or data analysis methods. Here, we provide an R package for sampling multidimensionally distributed data within cluster randomized trials, for stepped wedge design (SWD) trials as well as cross-over and parallel designs. The SWD is an alternative study design when a simple parallel design is not useful or not feasible. This design is relatively new, and for health care researchers in practice several methodological pitfalls are possible. The aim is to give an orientation before beginning a study, to determine how sensitive a study is to common scenarios in research practice. A simulation experiment is performed investigating three factors: the intervention not reaching the assumed 100% implementation, the number of missing clusters, and the time point at which clusters get lost. The data within a (cross-sectional as well as longitudinal) SWD trial, including the deviations from the assumed perfect situation, were sampled using the R package. The subsequent effect estimation was then realized using a linear mixed-effects model. The results of the simulation study show that the SWD was not robust against a lack of implementation, identifying that a delay in implementation had the greatest influence on the estimates.
The variance of the effect estimates increased with the number of lost clusters, whereas the time point of cluster loss had only a marginal influence. The provided R package is useful for sampling data within such studies, and the simulation can be adapted to other settings. P.31 Factors affecting the diagnostic performance of longitudinal biomarkers: a simulation study Konar N.M., Karaismailoglu E. 2, Karaagaoglu E. Hacettepe University, Biostatistics Department, Ankara, Turkey, 2 Kastamonu University, Biostatistics Department, Kastamonu, Turkey In health studies, the relationship between repeated biomarker measurements and an event (e.g. death, remission, rehospitalization) is often analyzed. In recent years, the joint modeling approach has been utilized more frequently in different fields, including cancer research and cardiovascular studies, when it is of interest to analyze longitudinal data (repeated biomarker measurements) and survival data (the event) simultaneously. This approach can also be used to evaluate the diagnostic performance of serial biomarker measurements via time-dependent AUC values. The aim of this study is to analyze the factors affecting the performance of time-dependent ROC curves for longitudinal biomarkers. Therefore, a simulation study was conducted to analyze the effect of the number of repeated biomarker measurements, the censoring rate and the sample size on the diagnostic accuracy of a longitudinal biomarker in terms of time-dependent AUC (td-AUC) values. Three different sample sizes (n = 200, 500, 1000), three different numbers of longitudinal measurements (k = 3, 5, 7) and nine different censoring rates (c = 0.1 to 0.9) were included in this simulation study. 1000 bootstrap replicates were used, and bootstrapped td-AUC values along with 95% bootstrapped confidence intervals are given as results. For simplicity, only the effect of quadratic random effects and the standard joint model on diagnostic performance is examined. Evaluating the effects of random-slope-only and random-slope-and-intercept models, along with the effects of different parameterization types (slope, current value + slope, cumulative effects), is considered as possible future work. P.32 Testing the robustness of the negative binomial distribution on real datasets of specific therapeutic areas and on simulated data Gombos T., Singer J. Accelsiors CRO, Budapest, Hungary In controlled clinical trials, outcome variables are often count data, like the number of symptoms or special events. These datasets often have a skewed distribution with overdispersion and a high percentage of zeros. The last decades have brought a growing availability of parametric models for non-normally distributed data in standard statistical packages, including Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial models. The aim of this study was to assess the utility of negative binomial models through real examples and simulated results when the assumptions are not fully met (i.e. an excess of zeros is added), and to identify therapeutic areas and study endpoints in medical research for which adding a zero-inflation parameter to the probability distribution has a true benefit. The fitting of the above-mentioned distributions was tested on two datasets with real data (740 patients with a malignant solitary tumor treated with chemotherapy and granulocyte colony-stimulating factor, and 7 patients with hereditary angioedema).
Model fit was compared through the Bayesian information criterion and using the likelihood-ratio test generalized by Vuong for non-nested models. Simulated datasets were created by sampling from a mixture of a point mass at zero and a negative binomial distribution, and the performance of the three models was studied by estimating the type I and type II error rates for the comparison of two treatment arms. The robustness of the negative binomial model was concluded.
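A minimal sketch of the kind of comparison described above (illustrative only, not the authors' analysis): counts are sampled from a mixture of a point mass at zero and a negative binomial distribution, and a plain negative binomial fit is compared with a zero-inflated negative binomial fit via a BIC-type criterion and Vuong's test. All settings and variable names are hypothetical.

library(MASS)   # glm.nb
library(pscl)   # zeroinfl, vuong

set.seed(1)
n   <- 400
arm <- rep(0:1, each = n / 2)                    # two treatment arms
mu  <- exp(0.5 + 0.4 * arm)                      # arm effect on the negative binomial mean
y   <- rnbinom(n, size = 1.2, mu = mu)           # negative binomial counts
y[rbinom(n, 1, 0.15) == 1] <- 0                  # add a point mass at zero (excess zeros)

nb   <- glm.nb(y ~ arm)                          # negative binomial model
zinb <- zeroinfl(y ~ arm | 1, dist = "negbin")   # zero-inflated NB, constant inflation part

AIC(nb, zinb, k = log(n))                        # BIC-type comparison (penalty k = log(n))
vuong(nb, zinb)                                  # Vuong test for non-nested models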

P.33 Estimation and comparison of adjusted intra-class correlations Sehner S., Rauch G., Wegscheider K. Universitätsklinikum Hamburg-Eppendorf, Institut für Medizinische Biometrie und Epidemiologie, Hamburg, Germany To evaluate the agreement between two interval-scaled measurements, the intra-class correlation (ICC) is commonly applied. The ICC can be calculated using linear mixed models. Generally, the magnitude of measurement agreement often depends not only on the measuring instrument but also on other aspects such as the study setting or the study population under consideration. In clinical applications, the ICC may depend on covariates like patient characteristics, rater characteristics or setting. The usual approach is either to ignore this information and to calculate one common ICC or, if the covariate is discrete, to perform a separate analysis for each stratum. Whereas the first approach may result in a biased ICC estimation, the second approach does not allow a direct comparison of the ICCs and goes along with a loss of information. We propose a new approach for the unbiased estimation of the ICCs for different strata simultaneously in one linear mixed model. Thereby, all given information is used, and a comparison of the stratum-wise ICCs by maximum likelihood tests is possible. The approach is fairly general, as it allows estimating ICCs for, e.g., the retest reliability and the intra-rater reliability jointly within one single model. P.34 Assessing quality of life using structural equation modeling Dardenne N., Pétré B., Husson E., Guillaume M., Donneau A.-F. University of Liege, Department of Public Health, Liege, Belgium In many studies, the quality of life (QOL) of a patient is calculated as the (weighted) sum of items without assessment of the relationships between the items and the derived latent QOL. Then, multiple regression analyses are usually applied to evaluate the effects of various factors on the latent QOL. The aim of this study was to describe how structural equation modeling should be applied to analyse such a common QOL issue appropriately. As an illustration, these methods were applied to data from 455 subjects who participated in 2012 in a community-based sample study in the French-speaking part of Belgium. Volunteer participants were invited to complete a web-based questionnaire on their weight-related experience. The latent QOL was derived, and direct and indirect effects of body mass index (BMI), body image discrepancy (BID), latent socio-economic (SOCIO) and latent subjective-norm (SN) variables were tested. Modeling was performed using the weighted least squares mean and variance adjusted (WLSMV) estimator due to the presence of ordinal endogenous variables. The fit of the models was analysed by the χ² test, root mean square error of approximation (RMSEA), comparative fit index (CFI), standardized root mean square residual (SRMR) and Tucker-Lewis index (TLI). Results showed that the physical dimension of QOL could be measured by 7 ordinal items and the psychosocial dimension by 6 ordinal items (CFI = 0.98; TLI = 0.97; RMSEA = 0.05; SRMR = 0.050). Significant direct and indirect effects on each dimension of QOL were found for BMI and SOCIO, and significant direct effects for BID and SN (p < 0.0001). P.35 Analysis of differential item functioning in an international survey of medical students regarding their motivation to become a general practitioner Avian A., Poggenburg S. 2, Berghold A.
Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria, 2 Medical University of Graz, Institute of General Practice and Health Services Research, Graz, Austria General practitioners (GPs) have an important gatekeeping role in authorizing access to specialty care, hospital care and diagnostic tests, and therefore ensure efficient health care. Despite this important role, other medical careers seem to be more attractive to medical students. The reasons why medical students choose the career of a GP are rarely investigated. Using an online questionnaire, German students (9 universities, n=2299) and Austrian students (4 universities, n=685) were surveyed regarding their perception of a GP's work and their reasons for or against becoming a GP (response levels: 5). Due to differences in the structure of the medical curriculum between these two countries and organizational differences in the daily work of a GP, some items use different wording. Therefore, we investigated whether differential item functioning (DIF) (i.e. different item difficulties and/or different discrimination parameters) can be observed. German students and Austrian students were matched using propensity score matching, resulting in 1035 students analyzed in each group. Using the graded response model, a four-factor solution resulted in the best model fit. Five uniform DIFs could be detected (likelihood ratio tests: p < .01). No non-uniform DIF could be found. The maximum observed individual difference between initial and purified scores was 0.1. The median (1st quartile - 3rd quartile) absolute differences between initial and purified scores were 0.01 ( ), 0.02 ( ), 0.03 ( ) for the first, third and fourth factor, respectively. DIFs were found in up to five items. Although these DIFs resulted in only minor changes in students' scores, they should be considered when interpreting students' perception of a GP's work and their reasons for or against becoming a GP. P.36 Balance behavior of Wei's urn model for more than two treatment arms and unequal allocation rates - a simulation study Ofner-Kopeinig P., Errath M., Berghold A. Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria Unequal allocation rates are receiving more attention in clinical trials. They are motivated by ethical, efficiency or feasibility reasons. In clinical trials with small sample sizes, Wei's urn model (UD(w,a,b)) leads to rather balanced groups with low predictability of the next treatment assignment. The generalized urn design is a generalization of Wei's urn design (Wei, Lachin 1988) by Kundt (Kundt 2002). This urn model consists of w (the initial urn), a (balls to add of the drawn color / therapy) and b (balls to add of the opposite color(s) / therapies). In the two-group case the urn starts with w balls of each color as an initial urn. One color is drawn, meaning that one therapy is allocated; the drawn ball is replaced, and a balls of the same color are added, as well as b balls of the opposite color(s). a>0 and b=0 corresponds to complete randomization. We expanded Wei's urn model according to Kundt (Kundt 2002) to allow for more than two treatment groups. Zhao (Zhao, Ramakrishnan 2016) shows how to define different allocation rates via equal allocation. E.g., in the case of an allocation ratio of a:b the model is expanded to a+b treatment arms with equal allocation rates.
In a simulation study we show the balance behavior of the generalized UD(w,a,b) compared to other randomization methods, also for small sample sizes.
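A minimal sketch of the urn mechanism described above, generalized to k treatment arms (an illustration under the stated rules, not the authors' implementation): the urn starts with w balls per arm; after each draw the ball is returned, a balls of the drawn colour are added and b balls of each other colour.

ud_allocate <- function(n, k = 3, w = 1, a = 0, b = 1) {
  urn   <- rep(w, k)                     # current ball counts per arm
  alloc <- integer(n)
  for (i in seq_len(n)) {
    arm <- sample.int(k, 1, prob = urn)  # draw a colour with probability proportional to counts
    alloc[i] <- arm
    urn[arm]  <- urn[arm] + a            # add a balls of the drawn colour
    urn[-arm] <- urn[-arm] + b           # add b balls of every other colour
  }
  alloc
}

set.seed(1)
table(ud_allocate(n = 30))               # group sizes in a small three-arm trial

For an unequal allocation ratio such as a:b, the same function could be called with k = a + b equally weighted arms and the arms then pooled, in line with the expansion via equal allocation described above.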

P.37 When will my trial end? Vonthein R. Universität zu Lübeck, IMBS, ZKS, Lübeck, Germany Background: Multicentre trials, especially the event-driven kind, seem to recruit slowly initially, raising concern among the DMC, sponsor or funding agency. Purpose: To extrapolate the end of a trial when trial sites are not all initiated at once, and to communicate projections. Methods: Assumptions about linearly growing case and event numbers are augmented by a linearly growing number of recruiting trial sites. The resulting linear-quadratic curve's equation is solved for the time at which the required case number is reached. The integral over the case-number curve is solved for the time point at which the required number of events can be expected. A shiny app illustrates the projections interactively. Data from a current trial are presented as an example. Results: Recruitment time will be longer by half the initiation time. The app can be interactively adjusted for the trial at hand and the recruitment progress made. Calculations of trial duration become more precise initially and at interim monitoring. Conclusion: A well-projected recruitment time becomes more precise by adding half the initiation time. Such a projection and its updates can be visualized by an interactive app. P.38 Weighted estimation in AMMI and GGE models Hadasch S., Piepho H.-P., Forkman J. 2 University of Hohenheim, Biostatistics, Stuttgart, Germany, 2 Swedish University of Agricultural Sciences, Crop Production Ecology, Uppsala, Sweden In multi-environment trials (MET), a set of genotypes is tested in a set of environments. Genotype-by-environment interaction is commonly assessed using the "additive main effects and multiplicative interaction" (AMMI) model or the "genotype main effects and genotype-by-environment interaction" (GGE) model. In these models, interaction effects are described by a sum of multiplicative terms, commonly estimated using singular value decomposition (SVD). When the genotype-environment means are independent and homoscedastic, the SVD provides ordinary least squares (OLS) estimates of the parameters. In a MET, the experimental design within environments must be taken into account when estimating the genotype-by-environment means. These adjustments for experimental design imply that the estimated genotype-by-environment means are correlated and have heterogeneous variances. In such cases, OLS as provided by the SVD does not yield an optimal solution in terms of the mean squared error of the genotype-by-environment means. Thus, generalized least squares estimation, which uses the covariance matrix of the estimated genotype-by-environment means, may be more efficient for estimating the parameters of AMMI/GGE models. We propose a weighted AMMI/GGE algorithm that can take any covariance matrix of the estimated genotype-by-environment means into account. The algorithm is an extension of the criss-cross multiple regression algorithm proposed by Gabriel and Zamir (1979). The performance of this algorithm was investigated with regard to the mean squared error of the true genotype-by-environment means. Alternative weighting schemes were compared. The data were simulated using variance components of a real MET laid out as resolvable incomplete block designs. P.39 A modified Chao estimator and bias correction for zero-truncated one-inflated count data Kaskasamkul P., Böhning D.
University of Southampton, Mathematical Sciences, Southampton, United Kingdom The crucial assumption in capture-recapture is homogeneity. Heterogeneity in the capture probabilities often occurs, and ignoring heterogeneity can lead to biased estimation. Chao's estimator is popular under heterogeneity, as its formula is simple. Moreover, it is asymptotically unbiased for a count distribution that is a member of the power series family, and it provides a lower bound if the count distribution is a mixture of the power series family. However, Chao's estimator has a serious problem of overestimation when the count data experience one-inflation. The modified Chao estimator is developed to avoid overestimation stemming from one-inflation by using the frequencies of counts of two and three instead of the frequencies of counts of one and two. The modified Chao estimator retains the good properties of the classical Chao estimator: asymptotically, it is an unbiased estimator for a power series distribution with and without one-inflation, and it provides a lower bound estimator under a mixture of power series distributions with and without one-inflation. However, both the classical and the modified Chao estimator are biased for small sample sizes. Three versions of a bias correction for the modified Chao estimator have been developed. The frequencies of counts are assumed to follow a Poisson distribution, and the properties of the Poisson distribution are used to reduce the bias. To investigate the performance of the modified Chao estimator and demonstrate how well all bias reduction versions work, the geometric and the mixture of geometric distributions with and without one-inflation are considered in a simulation study. The results show that the larger the one-inflation, the higher the overestimation bias of the classical Chao estimator. Conversely, the modified Chao estimator can avoid the effect of one-inflation. All bias reduction versions of the modified Chao estimator show good performance in all cases studied; the last version performs especially well. P.40 An adaptive robust test for detecting genetic association using a random-effects model Lee J.-Y. Feng-Chia University, Department of Statistics, Taichung, Taiwan In detecting association between disease and genetics, the efficiency of a test is always affected by the assumption about the genetic model. The Cochran-Armitage trend test (CATT) is an optimal and well-known choice when the underlying genetic model is known. However, the true genetic model underlying the disease is not easy to identify clearly, and hence robust tests are preferable. In this paper, we propose an adaptive robust test without model restriction conditions. The test is derived from a random-effects model and is calculated from selected 2×2 tables which are derived from the usual 2×3 table. Simulation results show that the proposed test has good performance when the underlying genetic model and effect directions are known. To overcome the power loss under various genetic models and different effect directions, a data-driven procedure is employed to construct a robust association test. Finally, two cancer studies are used to demonstrate applications of the proposed test. P.41 Sample size estimation in receiver operating characteristic (ROC) analysis for clustered data Iijima H. Hokkaido University Hospital, Sapporo, Japan ROC analysis plays a central role in estimating the discriminative power of a biomarker. Sample size estimation in ROC analysis was first proposed by Obuchowski et al. [1].
The method proposed by Obuchowski is applicable only to non-clustered data, i.e. a single biomarker measurement
collected from an individual subject. However, as is often the case in histological studies in cancer, multiple tumor tissues are collected from a single subject. In this situation, a new sample size estimation method is desired to make full use of all the information from the tumor tissue samples; however, no method has been proposed for appropriate sample size estimation in ROC analysis for clustered data. ROC analysis for clustered data has been proposed in the framework of generalized estimating equations. In this study, we propose a sample size estimation method for clustered data by extending Obuchowski's method to clustered data. The proposed method is assessed by a Monte Carlo simulation, and the simulation results will be shown at the presentation. 1. Obuchowski NA, McClish DK. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat Med 1997; 16: P.42 Real-time implementation of therapeutic drug monitoring to individualize treatment of infection in patients with acute kidney injury Weeks H., Jarrett R., Fissell W. 2, Shotwell M. Vanderbilt University Medical Center, Biostatistics, Nashville, United States, 2 Vanderbilt University Medical Center, Medicine, Nashville, United States Infection is the leading cause of death among patients with acute kidney injury. Due to dramatic pharmacokinetic heterogeneity, continuous assessment of pharmacodynamic target attainment (PDTA) may be critical for effective antibiotic therapy and mitigation of toxicity risks. However, contemporary therapeutic drug monitoring methodologies are difficult to implement in real time. Furthermore, they poorly account for individual pharmacokinetic heterogeneity and often fail to provide uncertainty estimates for summaries of PDTA, which are important for clinical decision making. Using a Bayesian compartmental model and prior pharmacokinetic data, we developed statistical methodology and an associated web application that facilitate the assessment of individual pharmacokinetics in real time. Users of the application enter dosing characteristics (frequency, duration, and rate of infusion) for a given patient and may update the model with drug concentration measurements, which indicate how patient-specific pharmacokinetics are affecting the response to treatment. The application provides an estimate of PDTA with a measure of statistical uncertainty using Laplace and delta method approximations. Additionally, users can enter and manipulate characteristics of future doses to examine the projected impact on PDTA. A tool of this nature allows physicians to tailor dosing to an individual in order to improve the probability of effective and safe treatment. In evaluating the performance of our methodology, the approximations are slightly anti-conservative. As a result, interval estimates of PDTA may be optimistic, i.e. too narrow. Alternative statistical methods are under study that may offer a more suitable tradeoff between computational efficiency and accuracy when estimating statistical uncertainty. P.43 A predictive model of one-year mortality in patients admitted for acute heart failure Quintana J.M.,2, Antón-Ladislao A.,2, García-Gutierrez S.,2, Lafuente I.,2, Morillas M.J. 3, Hernández E. 4, Rilo I. 5, Murga N. 6, Quirós R. 2,7, Lara A.
8 Hospital Galdakao-Usansolo, Unidad de Investigación, Galdakao, Spain, 2 Red de Investigacion en Servicios Sanitarios y Enfermedades Cronicas (REDISSEC), Galdakao, Spain, 3 Hospital Galdakao-Usansolo, Sº Cardiología, Galdakao, Spain, 4 Hospital Santa Marina, Sº Cardiología, Bilbao, Spain, 5 Hospital Universitario Donostia, Sº Cardiología, Donostia, Spain, 6 Hospital Universitario Basurto, Sº Cardiología, Bilbao, Spain, 7 Hospital Costa del Sol, Sº Medicina Interna, Marbella, Spain, 8 Hospital Universitario de Canarias, Sº de Cardiología, San Cristóbal de la Laguna, Spain Objectives: Patients with a hospital admission for acute heart failure (AHF) have a high mortality rate. The goal of this study was to develop clinical prediction rules for one-year mortality. Methods: Prospective cohort study of 323 patients admitted in 7 hospitals with a diagnosis of AHF. Variables were collected at the emergency department (ED), at admission, at discharge, and until one year afterwards. Patient-reported outcome measures (PROMs) were also collected at baseline using the Minnesota questionnaire. Statistical analysis: development of the predictive models using multivariate Cox and logistic regression in a derivation and a validation sample. The main outcome was death at one year after the ED index visit. Results: The Cox regression model included as predictors age, previous admissions and/or ED visits, sodium, urea, an etiological diagnosis of coronary disease, haemoglobin level, systolic blood pressure and the Minnesota general score at baseline. A risk score from 0-27 points and 3 severity categories were created. Mortality rates at 1 year were 5.7%, 7.89% and 47.09% (p < 0.0001), respectively, in the derivation sample. The AUCs of the model in the derivation and validation samples were 0.76 ( ) and 0.76 ( ), respectively; Hosmer-Lemeshow test p values were 0.23 and 0.3. The same risk score was replicated by logistic regression (AUC: 0.73 ( ) and 0.7 ( ) in the derivation and validation samples, respectively). The inclusion of the hospital did not affect the AUC results. Conclusions: This clinical prediction rule classifies AHF patients into severity categories for medium-term mortality to better support clinical decisions. P.44 Challenges regarding the use and interpretation of the population attributable risk (PAR) in epidemiological studies Fritz J., Ulmer H. Medical University of Innsbruck, Department of Medical Statistics, Informatics and Health Economics, Innsbruck, Austria The population attributable risk (PAR), that is, the reduction in incidence that would be observed if a population were entirely unexposed compared with its current exposure pattern, is a commonly used measure of the impact of risk factors on a specific disease with immediate clinical relevance. However, readers of articles in clinical journals seem to struggle with the correct interpretation of the PARs presented, and apparently authors too sometimes seem to be unsure about the concept of PARs. Frequently occurring issues are (i) unclear dissociation from classical diagnostic and predictive measures (sensitivity, specificity, and predictive values); (ii) PARs for single risk factors vs. multivariable PARs; (iii) estimation of PARs in case-control studies where odds ratios do not match relative risks; and (iv) arbitrary cut-off selection when risk factors are continuous.
We discuss these issues based on recent examples from the cardiovascular research literature (the INTERHEART and INTERSTROKE studies), illustrate them with examples from our own data, and demonstrate that a high PAR does not necessarily imply a high predictive ability of a risk factor.
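A minimal numerical illustration (not taken from the abstract's data) of Levin's formula for the PAR of a single dichotomous risk factor, PAR = p_e(RR - 1) / (1 + p_e(RR - 1)) with exposure prevalence p_e and relative risk RR: it shows that a common exposure with a modest relative risk can yield a larger PAR than a rare exposure with a strong relative risk, which is one reason a high PAR does not imply high predictive ability.

par_levin <- function(p_exposed, rr) {
  # Levin's formula for a single dichotomous exposure
  p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))
}

par_levin(p_exposed = 0.60, rr = 1.8)  # common exposure, modest RR: PAR about 0.32
par_levin(p_exposed = 0.05, rr = 4.0)  # rare exposure, strong RR:  PAR about 0.13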

P.45 A simple way to unify multicriteria decision analysis (MCDA) and stochastic multicriteria acceptability analysis (SMAA) using a Dirichlet distribution in benefit-risk assessment Saint-Hilary G., Cadour S. 2, Robert V. 2, Gasparini M. Politecnico di Torino, Dipartimento di Scienze Matematiche (DISMA) Giuseppe Luigi Lagrange, Torino, Italy, 2 Institut de Recherches Internationales Servier (IRIS), Department of Biostatistics, Suresnes, France Quantitative methodologies have been proposed to support decision making in drug development and monitoring. In particular, multicriteria decision analysis (MCDA) and stochastic multicriteria acceptability analysis (SMAA) are useful tools to assess the benefit-risk ratio of medicines according to the performances of the treatments on several criteria, accounting for the preferences of the decision makers regarding the relative importance of these criteria. However, even in its probabilistic form, MCDA requires exact elicitation of the criteria weights by the decision makers, which may be difficult to achieve in practice. SMAA allows for more flexibility and can be used with unknown or partially known preferences, but it is less popular due to its increased complexity and the high degree of uncertainty in its results. We propose a simple model as a generalization of MCDA and SMAA, by applying a Dirichlet distribution to the weights of the criteria and by letting its parameters vary. This single model encompasses both MCDA and SMAA and allows for a more extended exploration of the benefit-risk assessment of treatments. The precision of its results depends on the precision parameter of the Dirichlet distribution, which can be naturally interpreted as the strength of confidence of the decision makers in their elicitation of preferences. P.46 Uncertainties and coping strategies in the regulatory review of orphan medicinal products Zafiropoulos N., Koenig F., Pignatti F. 2, Guizzaro L. 2, Posch M. Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Vienna, Austria, 2 European Medicines Agency, London, United Kingdom There is relevant literature, in various disciplines, regarding the classification of the uncertainty that decision makers face, as well as the coping strategies employed to address it. Although uncertainty is a crucial element in the scientific assessment of new drug applications, there has been little work to shed light on the way regulatory decision makers cope with it in evaluating the benefit-risk of medicines [1]. Since 2011, the EMA has used a new template for documenting the benefit-risk assessment of new drug applications, distinguishing between key benefits and risks, and their associated uncertainties, which are systematically published as part of the European Public Assessment Reports (EPARs). The purpose of this work was to develop a classification of the type and source of the uncertainties identified in the benefit-risk assessment of oncology medicinal products, and of the coping strategies used to address them. The classification was applied retrospectively to assessment reports produced by the EMA for new oncology products (and new indications) approved since 2011 (n=69). The profile of uncertainties and coping strategies identified in the set of orphan products (n=52) was compared to the set of non-orphan products in order to explore any systematic differences.
We anticipate that this classification may have wider application, in a prospective manner, to aid the scientific assessment. The knowledge gained will support future initiatives in modelling benefit-risk assessment decisions and identifying thresholds of acceptability of uncertainties in different situations. References: [1] Benefit Risk Methodology. Work Package 1-5 reports. EMA webpage (last accessed May 2017). Funding: This work has been funded by the FP7-HEALTH-2013-INNOVATION project Advances in Small Trials Design for Regulatory Innovation and Excellence (ASTERIX). Grant Agreement No. Website: P.47 Transmission based association test for multivariate phenotypes using quasi-likelihood Kulkarni H., Ghosh S. Indian Statistical Institute, Human Genetics Unit, Kolkata, India The classical transmission disequilibrium test (TDT) [Spielman et al. 1993] based on the trio design is an alternative to the population-based case-control design for detecting genetic association, as it protects against population stratification. Since the manifestation of most diseases is governed by multiple precursor traits, it is important to study the multivariate phenotype comprising these precursors to improve the statistical power of the test procedure. We modify the classical transmission disequilibrium test for quantitative traits based on logistic regression [Waldman et al. 1999, Haldar and Ghosh 2015] to incorporate multivariate phenotypes. We adopt a quasi-likelihood approach [Wedderburn 1974] based on generalized linear regression [McCullagh and Nelder 1989] to develop a test of association for multivariate phenotypes. Since the generalized estimating equation (GEE) approach [Gourieroux, Monfort, and Trognon 1984; Liang and Zeger 1986] used for solving the quasi-likelihood equation is highly influenced by outliers, we use a modified resistant generalized estimating equation (RGEE) approach [Preisser and Qaqish 1999; Preisser and Qaqish 1996; Hall, Zeger, and Bandeen-Roche 1996] to down-weight the outliers. We also explore a modified model that includes information on allelic transmission from both parents. We perform extensive simulations under a wide spectrum of genetic models and different correlation structures between the phenotypes. We compare our method with the FBAT test procedure [Lake et al. 2002] as well as with the univariate approach. We find that the proposed method that incorporates information on both parents is more powerful than FBAT and the univariate approach. P.48 Projecting cancer incidence rates and case numbers: a probabilistic approach using data from German cancer registries (1999-2013) Stock C., Brenner H., Mons U. 2 German Cancer Research Center (DKFZ), Division of Clinical Epidemiology and Aging Research, Heidelberg, Germany, 2 German Cancer Research Center, Cancer Prevention Unit, Heidelberg, Germany Projections of cancer incidence rates and case numbers, either stratified or standardized according to age and sex, are of great interest for healthcare planning and research. Historically, these projections have often been based on age-period-cohort and joinpoint regression models assuming Poisson distributed outcomes. A drawback in applications of these two approaches has been that the projections were usually deterministic, i.e. they did not allow statements of uncertainty, which would be desirable for communicating results to health policy decision makers.
Although Bayesian age-period-cohort (APC) approaches have been proposed which allow for probabilistic projections, consideration of alternatives may sometimes be warranted or even necessary. This is the case especially in situations where long-term nationwide cancer incidence data for the development of projection models are not available, which
complicates the use of APC models, or where it appears unreasonable to assume age, period and cohort effects. We propose a modeling strategy for the projection of vital rates data based on Bayesian Poisson and negative binomial models with linear and (restricted) cubic spline effects of year and age. It is applied to German cancer registry data, which are available nationwide from 1999 to 2013, to project stratum-specific and standardized cancer incidence rates along with uncertainty intervals. We further combine the predictive distributions of incidence rates with probabilistic estimates of population projections in Monte Carlo analyses to obtain estimates of future incident case numbers and corresponding uncertainty intervals. The validity of this method is evaluated using data for selected cancers from the Surveillance, Epidemiology, and End Results (SEER) Program in the United States, which are available from 1973 to 2013. We discuss the benefits and problems of the approach and conclude that its application may facilitate the communication of epidemiological research to health policy decision makers. P.49 Mortality prediction in stroke patients with a flexible parametric survival model Owczarek A., Wajda J. 2, Świat M. 3,4, Brzozowska A. 5, Olszanecka-Glinianowicz M. 6, Chudek J. 5,7 Medical University of Silesia in Katowice, Department of Statistics, Faculty of Pharmacy and Laboratory Medicine in Sosnowiec, Sosnowiec, Poland, 2 Regional Specialist Hospital No. 3 in Rybnik, Dialysis Center in Rybnik, Rybnik, Poland, 3 Regional Specialist Hospital No. 3 in Rybnik, Department of Neurology with Stroke Unit, Rybnik, Poland, 4 Jan Dlugosz University in Czestochowa, Częstochowa, Poland, 5 Medical University of Silesia in Katowice, Pathophysiology Unit, Department of Pathophysiology, Medical Faculty in Katowice, Katowice, Poland, 6 Medical University of Silesia in Katowice, Health Promotion and Obesity Management Unit, Department of Pathophysiology, Medical Faculty in Katowice, Katowice, Poland, 7 Medical University of Silesia in Katowice, Department of Internal Medicine and Oncological Chemotherapy, Medical Faculty in Katowice, Katowice, Poland Background: Stroke is the third leading cause of death and the most common cause of disability in the population older than 45 years. Objectives: The aim of the study was to assess factors associated with an increased risk of mortality in patients with ischemic stroke. Methods: Serum samples from patients with acute ischemic stroke were used. The Modified Rankin Scale, comorbidities, previous ischemic cerebrovascular episodes, cardiovascular risk factors and serum levels of iPTH, CRP and 25-OH-D3, as well as follow-up data, were obtained from hospital medical records and the PESEL register. Cox proportional hazards (PH) regression, with Schoenfeld residuals used for the PH assessment, and a flexible parametric survival (FPS) model, both with stepwise backward elimination based on the AIC criterion, were used to assess risk factors for death. Results: In the Cox PH regression, Rankin Scale 4 (HR = 2.90; p < 0.01), age 72 years (HR = 2.47; p < 0.001), serum levels of 25-OH-D3 (HR = 2.07; p < 0.05) and diabetes (HR = 1.54; p < 0.1) proved to be risk factors of death in long-term observation.
P.50 Is there a relationship between ambient heat in the warm season and the sudden infant death syndrome?
Heinzl H., Waldhoer T. 2
Medical University of Vienna, Section of Clinical Biometrics, CeMSIIS, Vienna, Austria, 2 Medical University of Vienna, Department of Epidemiology, Center for Public Health, Vienna, Austria
Canadian authors recently reported (Auger et al., Environmental Health Perspectives, 2015) that high maximum ambient temperatures were strongly correlated with increased numbers of sudden infant death syndrome (SIDS) cases in Montreal, Canada, during the warm season (April through October). Vienna is an Austrian city roughly comparable to Montreal with regard to population size, number of SIDS cases, and temperatures in the warm season. In an external validation attempt, the same statistical approach as in the Canadian study was applied to data from Vienna. The Viennese results were negative, although the Viennese study was powerful enough to detect much smaller effects than those reported for Montreal. Various conceivable reasons for the difference between the Canadian and Austrian results will be discussed. Among others, they include statistical, scientific and data quality issues.

P.51 Modeling associations between sick leave and workload, life satisfaction and personality traits. Results from a large, cross-sectional Norwegian registry-based study (HUNT 3)
Cvancarova Småstuen M., Gjerstad P. 2
Oslo and Akershus University College of Applied Sciences, Public Health, Oslo, Norway, 2 NTNU, Trondheim, Norway
Objectives: Little is known about how workload, life satisfaction and personality traits are associated with being on sick leave, both for the general working population and for leaders. In Norway, several high-quality data registries are available that can be used to generate "real life" data. All individuals living in Norway are given a unique ID number, which makes it possible to trace individuals and link data from different sources. Methods: Data were obtained from the Health Study in Nord-Trøndelag, Norway (HUNT), which is one of the largest health studies ever performed; 51.8% of participants were female. Using information on type of industry and occupation, we identified 951 responders who were leaders; 430 were leaders in the public and 521 in the private sector. Of those, 455 (48%) reported being on sick leave during the previous 12 months. Level of neuroticism was assessed using a questionnaire. Associations between being on sick leave (yes vs no) and selected covariates (gender, age at inclusion, level of life satisfaction, lengthy working hours and neuroticism) were modelled using multivariate logistic regression. Results: Our data revealed that leaders of both genders who reported low levels of life satisfaction and high levels of neuroticism were more likely to be on sick leave, especially those working in the private sector.
Conclusion: Large registries are a useful source of information; however, such data can be interpreted meaningfully only with close cooperation between statisticians and professionals. Moreover, there is a need for standardised, validated and culturally adapted questionnaires.
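As an illustration of the analysis described in P.51, here is a minimal sketch of a multivariable logistic regression for a binary sick-leave outcome using statsmodels; the data and variable names are simulated, hypothetical stand-ins for the HUNT variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the HUNT leader subset (hypothetical variable names).
rng = np.random.default_rng(1)
n = 951
df = pd.DataFrame({
    "sick_leave": rng.integers(0, 2, n),          # 1 = on sick leave in the last 12 months
    "female": rng.integers(0, 2, n),
    "age": rng.normal(45, 8, n),
    "life_satisfaction": rng.normal(0, 1, n),     # standardized score
    "long_hours": rng.integers(0, 2, n),
    "neuroticism": rng.normal(0, 1, n),
})

# Multivariable logistic regression of sick leave on the selected covariates.
model = smf.logit(
    "sick_leave ~ female + age + life_satisfaction + long_hours + neuroticism",
    data=df,
).fit()
print(model.summary())
print(np.exp(model.params))  # odds ratios
```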

P.52 Longitudinal analysis of medication use in patients with severe obesity using Prescription Registry data in Norway. Results from a large study with 10 years of follow-up
Cvancarova Småstuen M.,2, Jakobsen G.S. 2, Hertel J.K. 2, Sandbu R. 2, Hjelmesæth J. 2
Oslo and Akershus University College of Applied Sciences, Public Health, Oslo, Norway, 2 Morbid Obesity Center, Vestfold Hospital Trust, Medicine, Tønsberg, Norway
Background: Bariatric surgery is associated with remission and prevention of obesity-related comorbidities; however, the comparative long-term effect of surgical and medical tertiary care obesity treatment on drug-treated comorbidities remains unknown. Objective: To compare remission and new onset of drug-treated hypertension, diabetes and dyslipidemia in a cohort of treatment-seeking patients undergoing surgical or medical weight loss treatment. Materials and methods: Patients treated for severe obesity (N=995) in a tertiary care centre were included from 2005 to 2010. Follow-up data from the Norwegian Prescription Registry (NorPD) were retrieved. Unique ID numbers in Norway make it possible to link data from registries. In NorPD, each collected prescription is registered, and information extraction requires a high level of data management. Patients were categorized as users or non-users based on retrieved prescriptions in the year before inclusion, and remission and new-onset disease were defined in each group per year of follow-up. Differences between treatment groups and possible differences between time trajectories (interaction between group and time) were analysed using mixed models for repeated measurements with binary outcomes (registered use or not). All analyses were stratified by previous drug use and some also by gender. Mortality risk for both groups was modelled using Cox regression. Results: Our data revealed large differences between treatment groups. Conclusion: NorPD is a useful tool that enables researchers to monitor use, changes and trends in the use of drugs, and can be used to analyse effects of treatments on comorbidities. However, close collaboration between clinicians and statisticians is warranted.
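The abstract above (P.52) analyses repeated binary prescription outcomes with a group-by-time interaction. As a simplified illustration, the sketch below fits a marginal GEE model with an exchangeable working correlation rather than the mixed model named in the abstract; the data and variable names are simulated and hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated long-format data: one row per patient-year (hypothetical names).
rng = np.random.default_rng(7)
n_patients, n_years = 200, 10
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n_patients), n_years),
    "year": np.tile(np.arange(n_years), n_patients),
    "surgery": np.repeat(rng.integers(0, 2, n_patients), n_years),
})
df["drug_use"] = rng.binomial(1, 0.3, len(df))  # registered prescription use (yes/no)

# Marginal model for repeated binary outcomes with a group-by-time interaction.
gee = sm.GEE.from_formula(
    "drug_use ~ surgery * year",
    groups="patient_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
).fit()
print(gee.summary())
```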
P.53 Association between mortality from the top 5 cardiovascular diseases in Poland, sex, age and marital status
Cicha-Mikołajczyk A.
Institute of Cardiology, Department of Epidemiology, Cardiovascular Disease Prevention and Health Promotion, Warsaw, Poland
Background: Cardiovascular diseases (CVDs) are the most common cause of death in Poland; they caused 46% of all deaths. The top 5 leading causes of death were heart failure (I50), atherosclerosis (I70), chronic ischemic heart disease (I25), acute myocardial infarction (I21) and cerebral infarction (I63). Aim: The aim of the study was to investigate the association between mortality from the top 5 CVDs, sex, age and marital status. Methods: Data were obtained from the Central Statistical Office (GUS). The dataset was restricted to deaths from the top 5 CVD causes within a defined period and age range (n=379,328). Single correspondence analysis was performed to investigate the association between CVD deaths and age or marital status for both sexes. Multiple correspondence analysis (MCA) was conducted by sex, age and marital status in two subgroups, which included infarctions (I21, I63) and other CVDs (I25, I50, I70). Results: Single correspondence analysis revealed that death due to acute myocardial infarction (AMI) was strongly associated with men and women in specific age groups. It also uncovered that AMI caused the greatest mortality in married women. MCA confirmed the association between death due to atherosclerosis and older age, and the association between death due to chronic ischemic heart disease (IHD) and being male, married or of older age. Conclusions: The risk of CVD death due to chronic IHD and atherosclerosis increases with age, while premature mortality due to AMI is observed in men.

P.54 Robust likelihood analysis of current count and current status data
Wen C.-C.
Tamkang University, Department of Mathematics, New Taipei, Taiwan
We develop a joint analysis approach for recurrent and nonrecurrent event processes subject to case-1 interval censorship, which are also known in the literature as current count and current status data, respectively. We use a shared frailty to link the recurrent and nonrecurrent event processes, which are assumed to follow a nonhomogeneous Poisson process and a proportional odds model, respectively, conditional on the frailty. These models are semiparametric in that the effects of the covariates on the incidence of events are modeled parametrically, while the baseline functions in the event processes are modeled nonparametrically. We propose pseudo- and sufficient likelihoods for estimation of the joint regression models for the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty unspecified. The method is robust in the sense that it makes no distributional assumption on the frailty. The resulting estimators for the regression parameters and the baseline functions are shown to be consistent, with square-root and cube-root rates in the sample size, respectively. Asymptotic normality with a closed-form asymptotic variance is derived for the estimator of the regression parameters. Simulation results reveal satisfactory finite-sample performance of the estimators. We apply the proposed method to the fracture-osteoporosis survey data to identify risk factors for fracture and osteoporosis in elders while accounting for the association between the two events within a subject.

P.55 Distance correlation methods for model-free feature screening in high-dimensional survival data
Edelmann D., Benner A.
German Cancer Research Center (DKFZ), Biostatistics, Heidelberg, Germany
Székely, Rizzo and Bakirov (2007) and Székely and Rizzo (2009) introduced the powerful concept of distance correlation as a measure of dependence between random variables. Unlike Pearson correlation, which only measures linear dependence, distance correlation can detect any kind of dependence, including nonlinear or even nonmonotone associations. However, this concept cannot be readily applied to measuring dependence between patient characteristics and survival, since it is not clear how to deal with censored observations. In this talk, we present two different versions of distance correlation for censored survival data. The first version uses only the covariate data of the fully observed subjects, accounting for censored observations via inverse probability of censoring weights (IPCW). In particular, we show that the true distance correlation can be approximated by an IPCW-weighted U-statistic. The second version uses all observations, imputing the difference in survival time between two observations when at least one of the observations is censored.
The imputation is based on an ad-hoc estimate of the expectation of the survival time difference.

Both concepts directly induce variable screening methods for high-dimensional survival models. We compare the adapted methods with existing screening methods in a simulation study including both classical and non-standard models. Finally, we apply the proposed techniques to detect associations between survival and DNA methylation data.

P.56 Subsampling strategies for clinical cohort studies focusing on expensive covariates
Becker N., Ohneberg K. 2, Edelmann D., Benner A., Schumacher M. 2
German Cancer Research Center (DKFZ), Division of Biostatistics, Heidelberg, Germany, 2 University of Freiburg, Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center, Freiburg, Germany
In clinical cohort studies, additional expensive or resource-intensive markers can often be collected retrospectively only for a subset of patients. Two major questions then arise: 1) how to perform subset selection and 2) how to derive consistent and stable estimators of the effect of expensive markers on the clinical endpoint. A simple and frequently used approach is outcome-dependent sampling. Especially for survival endpoints, one often applies an intuitive ad hoc strategy called 'extreme group' sampling. The subjects are then selected from "poor prognosis" (early deaths) and "good prognosis" (long-term survivors) subsets. Note that this approach generates a biased sample, since patients with mid-term survival times (survivors and censored individuals) are omitted. Another intuitive but unbiased approach is simple random sampling. Reasonable alternatives from the family of outcome-dependent sampling are nested case-control and case-cohort sampling designs. However, these cohort sampling designs are intended for situations with rare events, which is seldom the case in clinical cohort studies. Here we have to deal with non-rare event situations, where the number of events might even exceed the number of patients in the selected subset. In this talk we propose modifications of nested case-control and case-cohort sampling designs for non-rare events. We compare the modified cohort sampling designs with random and extreme group sampling in a simulation study and a real data application with respect to estimation bias and variance. We demonstrate the uncontrolled estimation bias of extreme group sampling. In contrast, we encourage the use of modified nested case-control and case-cohort sampling in clinical cohort studies.
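The screening statistic underlying P.55 is the sample distance correlation. Below is a minimal NumPy sketch of the unweighted version applied to feature screening; the IPCW weighting for censored survival times described in the abstract is not implemented here, and all data are simulated.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-d arrays (Szekely et al., 2007)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)                                  # pairwise distance matrices
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)                     # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

# Screening: rank features by distance correlation with (uncensored) survival time.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                           # e.g. methylation features
time = np.exp(0.5 * X[:, 0] ** 2 + rng.normal(size=100)) # nonlinear signal in feature 0
scores = np.array([distance_correlation(X[:, j], time) for j in range(X.shape[1])])
print(scores.argsort()[::-1][:5])                        # indices of top-ranked features
```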
P.57 Application of finite mixture models for survival analysis with competing risks
Shimokawa A., Miyaoka E.
Tokyo University of Science, Mathematics, Tokyo, Japan
In this study, we focus on the estimation of the cause-specific survival function for one of several competing risks under the assumption of dependent censoring. It is well known that dependent censoring is not identifiable without additional information. Therefore, a sensitivity analysis assessing how estimates change with the dependency between failure and censoring is important. In our setting, the survival function of the failure time is modeled by a finite mixture of conditional survival functions given the failure causes. In addition, we assume that the joint distribution of the failure and censoring times, conditional on a given failure cause, is given by a known copula. Under this assumption, we propose an iterative algorithm based on the EM algorithm for estimating the parameters. The performance of these methods is examined in simulation studies. Additionally, we show the results of applying the methods to real data.

P.58 Quantile differences between two survival curves
Mittlböck M., Heinzl H.
Medical University of Vienna, Section of Clinical Biometrics, CeMSIIS, Vienna, Austria
The Kaplan-Meier curve is a consistent estimator of the survival function. It is frequently used for the visual comparison of censored survival times between different groups of patients. The diagonal upper-left-to-lower-right nature of the plot, however, makes it difficult to visually assess the differences between quantiles of two Kaplan-Meier curves. In order to overcome this problem, a simple quantile-differences-versus-probability plot is suggested. Two options to visualize the variability of the estimator are considered: (a) adding a bundle of resampled bootstrap step functions and (b) adding approximate bootstrap confidence bands. Key features of the method will be illustrated and discussed.

P.59 Multi-state modeling and simulation of patient trajectories after allogeneic hematopoietic stem cell transplantation (allo-HSCT) to inform drug development
James D., Ng J., Wei J. 2, Vandemeulebroecke M. 3
Novartis, East Hanover, United States, 2 Novartis, Shanghai, China, 3 Novartis, Basel, Switzerland
Objectives: To characterize patient trajectories through states of disease after allo-HSCT, quantifying the transition rates into various event states and identifying patient characteristics associated with differential transition rates. This activity was conducted to investigate drug development scenarios for the prevention of Graft-versus-Host Disease (GvHD) after allo-HSCT. Methods: Multi-state models were built on data from the Center for International Blood and Marrow Transplant Research (CIBMTR). Events of interest included acute GvHD (aGvHD), chronic GvHD (cGvHD), relapse of the underlying disease, and death. Six time-continuous, finite-state Markovian models of increasing complexity were built on a subset of patients matching the specific target indication. Ten candidate baseline covariates were considered. Selection of a final model was based on stepwise covariate selection, goodness-of-fit diagnostics, and clinical relevance. In a second step, trial scenarios were simulated based on the final model, assuming various putative drug effects on top of the background transition hazards, to quantify 4 composite endpoints of interest. Results: A final 5-state, 10-transition model was selected. It included 5 baseline covariates affecting 5 transition rates. State probabilities were estimated for target patients; e.g., at 2 months, acute myeloid leukemia recipients of matched related donor allo-HSCT are estimated to have transition probabilities 0.024, 0.039, 0.00 and 0.227 from the initial state to aGvHD, cGvHD, relapse and death, respectively. Simulations from this model allowed us to compare the operating characteristics of a future clinical trial, assuming that the investigational drug reduces selected transition rates to a specified extent, and to compare the trial's power among 4 composite endpoints with various sample sizes. Conclusion: Multi-state models provide a rich framework for exploring complex progressive conditions such as the patient journey after allo-HSCT.
They can help characterize a background disease pattern, and drug development strategies can then be informed by simulations in which this background pattern is varied.
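As a toy illustration of the simulation idea in P.59, the sketch below simulates a continuous-time Markov multi-state model with competing exponential transition hazards and estimates state occupation probabilities at a landmark time; the states, hazards and landmark are illustrative assumptions, not the CIBMTR-based model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative transition hazards (per month) out of each transient state.
# States: 0 = post-transplant, 1 = aGvHD, 2 = cGvHD, 3 = relapse, 4 = death (absorbing).
hazards = {
    0: {1: 0.05, 2: 0.02, 3: 0.02, 4: 0.01},
    1: {2: 0.03, 3: 0.02, 4: 0.03},
    2: {3: 0.01, 4: 0.02},
    3: {4: 0.08},
}

def simulate_path(t_max=12.0):
    """Simulate one patient trajectory up to t_max months; return the state at t_max."""
    state, t = 0, 0.0
    while state in hazards:
        rates = hazards[state]
        total = sum(rates.values())
        t += rng.exponential(1.0 / total)          # time to the next transition
        if t > t_max:
            break                                  # still in the current state at t_max
        # choose the destination state proportionally to its hazard
        dests, probs = zip(*[(s, r / total) for s, r in rates.items()])
        state = rng.choice(dests, p=probs)
    return state

n = 10_000
states_at_landmark = np.array([simulate_path() for _ in range(n)])
for s in range(5):
    print(f"P(state {s} at 12 months) = {np.mean(states_at_landmark == s):.3f}")
```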

P.60 The impact of chronological bias for the log rank test
Rückbeil M., Hilgers R.-D.
RWTH Aachen University, Department of Medical Statistics, Aachen, Germany
Background: The enrolment and treatment phase of a clinical trial usually stretches over a certain period of time. This creates the risk of chronological bias in terms of time-dependent covariates. For the consideration of chronological bias in trials with a time-to-event outcome, there exists only a parametric model for exponentially distributed outcomes. In practice, however, the assumption of an exponentially distributed outcome is usually too restrictive, especially since the most commonly used test methods are either nonparametric or semiparametric. Methods: We introduce a semiparametric model to account for a patient-related time trend in the case of a time-to-event outcome variable. Based on this model, we give a theoretical approximation for the shifted test statistic in the presence of chronological bias. We conduct a case study where two treatments are compared using a log rank test and no treatment effect is present, to illustrate this shift and to investigate the impact of chronological bias on the type I error rate. Special attention is paid to the influence of censoring. Results and Conclusions: Our simulation results show that the type I error rate may be affected by the presence of chronological bias and that our approximation obtains reasonable estimates of the shifted test statistic. Censoring reduces the increase in the type I error rate. Since the presented model depends on the random allocation of patients, our results can further be used to compare different randomization procedures with respect to their susceptibility to chronological bias.
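In the spirit of P.60, the following sketch estimates the empirical type I error of the log rank test under the null when a calendar-time trend in the hazard is present; the trend form, allocation rule and all parameters are illustrative assumptions, not the model studied in the abstract.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(3)

def one_trial(n=100, trend=1.0):
    """One null trial: no treatment effect, but the hazard drifts with entry order."""
    order = np.arange(n)
    group = rng.integers(0, 2, n)                 # simple randomization (illustrative)
    # Exponential survival with a log-linear calendar-time trend shared by both arms.
    rate = 0.1 * np.exp(trend * order / n)
    time = rng.exponential(1.0 / rate)
    censor = rng.exponential(10.0, n)             # independent censoring
    obs_time = np.minimum(time, censor)
    event = (time <= censor).astype(int)
    res = logrank_test(obs_time[group == 0], obs_time[group == 1],
                       event_observed_A=event[group == 0],
                       event_observed_B=event[group == 1])
    return res.p_value

pvals = np.array([one_trial() for _ in range(2000)])
print("Empirical type I error at alpha = 0.05:", np.mean(pvals < 0.05))
```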
P.61 Identifying early indicators for future citation counts of scientific manuscripts: A case study in transplantology
Kossmeier M.,2, Heinze G. 2
School of Psychology, University of Vienna, Department of Basic Psychological Research and Research Methods, Vienna, Austria, 2 Section for Clinical Biometrics, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
Citations are widely used as a measure of the scientific impact of publications, authors, journals and institutions. The goal of the present study was to identify early indicators for future citation counts of manuscripts submitted for publication to the scientific journal Transplant International. We considered a comprehensive set of manuscript-, author- and peer-review-related variables available early in the peer review process to explain and predict future citation counts in a fixed timeframe of two years after the year of publication. A multiple negative binomial regression showed that manuscripts submitted to Transplant International with more pages (count multiplier per additional page exp(β) = 1.073, 95% CI [1.015, 1.134]), an earlier calendar month of publication (count multiplier per additional month exp(β) = 0.954, 95% CI [0.925, 0.984]), and reviews compared to original articles (count multiplier for reviews exp(β) = 1.956, 95% CI [1.246, 3.072]) were predicted to attract significantly more citations. For a reduced predictive model, nine variables were selected as prognostic factors. Although this predictive model attained acceptable performance, hypothetical acceptance decisions based on model predictions using information available in the early peer review process performed considerably worse than the actual editorial decisions. Incorporating predictions of such models as an additional decision criterion might still have the potential to increase the prognostic validity of the editorial decision, especially in cases where the ultimate decision to accept or reject a submitted manuscript is made under high uncertainty.
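Finally, a minimal sketch of a negative binomial regression for citation counts, of the kind used in P.61, fitted with statsmodels on simulated data; the variable names and the fixed dispersion parameter are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated manuscript-level data (hypothetical variables).
rng = np.random.default_rng(11)
n = 300
df = pd.DataFrame({
    "pages": rng.integers(4, 30, n),
    "pub_month": rng.integers(1, 13, n),
    "is_review": rng.integers(0, 2, n),
})
mu = np.exp(0.5 + 0.07 * df["pages"] - 0.05 * df["pub_month"] + 0.7 * df["is_review"])
df["citations"] = rng.negative_binomial(2, 2 / (2 + mu))   # overdispersed counts with mean mu

# Negative binomial GLM (dispersion parameter alpha fixed here for simplicity).
model = smf.glm(
    "citations ~ pages + pub_month + is_review",
    data=df,
    family=sm.families.NegativeBinomial(alpha=0.5),
).fit()
print(np.exp(model.params))        # count multipliers exp(beta), as reported in the abstract
print(np.exp(model.conf_int()))    # 95% CIs on the multiplier scale
```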



REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology) 452 REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology) (See also General Regulations) M.57 Admission requirements To be eligible for admission to the courses

More information

DEVELOPING AN ANALYTICAL PLAN

DEVELOPING AN ANALYTICAL PLAN The Fundamentals of International Clinical Research Workshop February 2004 DEVELOPING AN ANALYTICAL PLAN Mario Chen, PhD. Family Health International 1 The Analysis Plan for a Study Summary Analysis Plan:

More information

200627 - AC - Clinical Trials

200627 - AC - Clinical Trials Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2014 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research MASTER'S DEGREE

More information

GRADUATE RESEARCH PROGRAMMES (MSc and PhD) AY2015/2016 MODULE DESCRIPTION

GRADUATE RESEARCH PROGRAMMES (MSc and PhD) AY2015/2016 MODULE DESCRIPTION GRADUATE RESEARCH PROGRAMMES (MSc and PhD) AY2015/2016 MODULE DESCRIPTION CORE MODULES CO5102 Principles of Epidemiology (for both MSc and PhD) This module introduces students to the tools for describing

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Cancer Biostatistics Workshop Science of Doing Science - Biostatistics

Cancer Biostatistics Workshop Science of Doing Science - Biostatistics Cancer Biostatistics Workshop Science of Doing Science - Biostatistics Yu Shyr, PhD Jan. 18, 2008 Cancer Biostatistics Center Vanderbilt-Ingram Cancer Center Yu.Shyr@vanderbilt.edu Aims Cancer Biostatistics

More information

PharmaSUG 2013 - Paper IB05

PharmaSUG 2013 - Paper IB05 PharmaSUG 2013 - Paper IB05 The Value of an Advanced Degree in Statistics as a Clinical Statistical SAS Programmer Mark Matthews, inventiv Health Clinical, Indianapolis, IN Ying (Evelyn) Guo, PAREXEL International,

More information

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc. An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

More information

TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

More information

Master of Mathematical Finance: Course Descriptions

Master of Mathematical Finance: Course Descriptions Master of Mathematical Finance: Course Descriptions CS 522 Data Mining Computer Science This course provides continued exploration of data mining algorithms. More sophisticated algorithms such as support

More information

Problem of Missing Data

Problem of Missing Data VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology)

REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology) 463 REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology) (See also General Regulations) M.57 Admission requirements To be eligible for admission to the courses

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

2019 Healthcare That Works for All

2019 Healthcare That Works for All 2019 Healthcare That Works for All This paper is one of a series describing what a decade of successful change in healthcare could look like in 2019. Each paper focuses on one aspect of healthcare. To

More information

Bayesian Adaptive Methods for Clinical Trials

Bayesian Adaptive Methods for Clinical Trials CENTER FOR QUANTITATIVE METHODS Department of Biostatistics COURSE: Bayesian Adaptive Methods for Clinical Trials Bradley P. Carlin (Division of Biostatistics, University of Minnesota) and Laura A. Hatfield

More information

How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

More information

Using Criteria to Appraise a Meta-analyses

Using Criteria to Appraise a Meta-analyses Using Criteria to Appraise a Meta-analyses Paul Cronin B.A., M.B. B.Ch. B.A.O., M.S., M.R.C.P.I.,.F.R.C.R. Department of Radiology, Division of Cardiothoracic Radiology, University of Michigan, Ann Arbor,

More information

Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith

Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith Policy Discussion Briefing January 27 Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith Introduction It is rare to open a newspaper or read a government

More information

Data Analysis, Research Study Design and the IRB

Data Analysis, Research Study Design and the IRB Minding the p-values p and Quartiles: Data Analysis, Research Study Design and the IRB Don Allensworth-Davies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB

More information

Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children s Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

More information

Wiener Biometrische Sektion (WBS) der Internationalen Biometrischen Gesellschaft Region Österreich Schweiz (ROeS) http://www.meduniwien.ac.

Wiener Biometrische Sektion (WBS) der Internationalen Biometrischen Gesellschaft Region Österreich Schweiz (ROeS) http://www.meduniwien.ac. Wiener Biometrische Sektion (WBS) der Internationalen Biometrischen Gesellschaft Region Österreich Schweiz (ROeS) http://www.meduniwien.ac.at/wbs/ WBS Sommer Seminar 2014 jointly organized with CeMSIIS-ReUSe

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course Prerequisite: Stat 3201 (Introduction to Probability for Data Analytics) Exclusions: Class distribution:

More information

Adaptive designs for time-to-event trials

Adaptive designs for time-to-event trials Adaptive designs for time-to-event trials Dominic Magirr 1, Thomas Jaki 1, Franz König 2 and Martin Posch 2 1 Department of Mathematics and Statistics, Lancaster University 2 Institut für Medizinische

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

If several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form.

If several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form. General Remarks This template of a data extraction form is intended to help you to start developing your own data extraction form, it certainly has to be adapted to your specific question. Delete unnecessary

More information

THE LINCOLN INSTITUTE OF HEALTH

THE LINCOLN INSTITUTE OF HEALTH THE LINCOLN INSTITUTE OF HEALTH Background The Chair in the Care of the Older Person will be part of the new Lincoln Institute of Health, a cross disciplinary research collaboration linking schools, colleges

More information

University of Hawai i Human Studies Program. Guidelines for Developing a Clinical Research Protocol

University of Hawai i Human Studies Program. Guidelines for Developing a Clinical Research Protocol University of Hawai i Human Studies Program Guidelines for Developing a Clinical Research Protocol Following are guidelines for writing a clinical research protocol for submission to the University of

More information

Personalized Predictive Medicine and Genomic Clinical Trials

Personalized Predictive Medicine and Genomic Clinical Trials Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations

More information

Abstract Title Page. Authors and Affiliations: Avi Feller Harvard University Department of Statistics

Abstract Title Page. Authors and Affiliations: Avi Feller Harvard University Department of Statistics Abstract Title Page Title: Compared to what? Estimating Causal Effects for Latent Subgroups to Understand Variation in the Impacts of Head Start by Alternate Child Care Setting Authors and Affiliations:

More information

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE STATISTICAL PRINCIPLES FOR CLINICAL TRIALS E9 Current

More information

University of Maryland School of Medicine Master of Public Health Program. Evaluation of Public Health Competencies

University of Maryland School of Medicine Master of Public Health Program. Evaluation of Public Health Competencies Semester/Year of Graduation University of Maryland School of Medicine Master of Public Health Program Evaluation of Public Health Competencies Students graduating with an MPH degree, and planning to work

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Programme du parcours Clinical Epidemiology 2014-2015. UMR 1. Methods in therapeutic evaluation A Dechartres/A Flahault

Programme du parcours Clinical Epidemiology 2014-2015. UMR 1. Methods in therapeutic evaluation A Dechartres/A Flahault Programme du parcours Clinical Epidemiology 2014-2015 UR 1. ethods in therapeutic evaluation A /A Date cours Horaires 15/10/2014 14-17h General principal of therapeutic evaluation (1) 22/10/2014 14-17h

More information

Clinical Study Design and Methods Terminology

Clinical Study Design and Methods Terminology Home College of Veterinary Medicine Washington State University WSU Faculty &Staff Page Page 1 of 5 John Gay, DVM PhD DACVPM AAHP FDIU VCS Clinical Epidemiology & Evidence-Based Medicine Glossary: Clinical

More information

Statistics in Applications III. Distribution Theory and Inference

Statistics in Applications III. Distribution Theory and Inference 2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied

More information

The Role and Importance of Clinical Trial Registries & Results Databases. Deborah A. Zarin, M.D. Director, ClinicalTrials.gov.

The Role and Importance of Clinical Trial Registries & Results Databases. Deborah A. Zarin, M.D. Director, ClinicalTrials.gov. The Role and Importance of Clinical Trial Registries & Results Databases Deborah A. Zarin, M.D. Director, ClinicalTrials.gov February 2015 Outline Background Current Policies About ClinicalTrials.gov Registering

More information

University of Michigan Dearborn Graduate Psychology Assessment Program

University of Michigan Dearborn Graduate Psychology Assessment Program University of Michigan Dearborn Graduate Psychology Assessment Program Graduate Clinical Health Psychology Program Goals 1 Psychotherapy Skills Acquisition: To train students in the skills and knowledge

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

Guidance for Industry

Guidance for Industry Guidance for Industry E9 Statistical Principles for Clinical Trials U.S. Department of Health and Human Services Food and Drug Administration Center for Drug Evaluation and Research (CDER) Center for Biologics

More information

Introduction to method validation

Introduction to method validation Introduction to method validation Introduction to method validation What is method validation? Method validation provides documented objective evidence that a method measures what it is intended to measure,

More information