Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation Short Title: Monte Carlo Likelihood Approximation


christina/googleproposal.pdf

Christina Knudson

Bio

I'm a doctoral candidate at the University of Minnesota's School of Statistics. I am ABD and about a year away from graduating. I graduated cum laude from Carleton College with a BA in Mathematics. I was born and raised in Decorah, Iowa, which is one of the top 20 small towns in America (according to Smithsonian Magazine).

I started coding in the spring of 2007, first in Python and then Java. I started using R during the summer of 2007 at the Summer Institute for Training in Biostatistics, then I continued programming with R during my summer internship at the National Institutes of Health. Most of my graduate coursework has been in R, and I have taught undergraduate classes in R at the University of Minnesota as well.

Part of my work for my doctoral thesis is an R package that fits Generalized Linear Mixed Models (GLMMs) using Monte Carlo Likelihood Approximation (MCLA). I have written part of my package already and I plan to expand and generalize it this summer.

Contact Information

Student name: Christina Knudson
Link id: cknudson05
Student postal address: 1901 Minnehaha Ave, Apt 317, Minneapolis MN
Telephone:
Emails: [email protected], [email protected], [email protected]
Website: christina/

Student Affiliation

Institution: University of Minnesota
Program: Statistics

Stage of completion: Early 2015
Contact to verify: or
Advisors: Charles Geyer and Galin Jones

Schedule Conflicts

From August 3 through 7, I will attend the Joint Statistical Meetings to present my R package.

Mentors

Mentor names: Charles Geyer and Galin Jones
Mentor emails: [email protected] and [email protected]
Mentor link ids: cjgeyer

I have been in touch with my mentors about my package. I meet with each of them at least weekly, and sometimes I talk to Charlie several times per week.

Background

GLMMs are popular in many fields from ecology to economics. The popularity of GLMMs is apparent through a Google search, which yields 242,000 results. The challenge for researchers is finding an easy-to-implement and reliable method for fitting and testing GLMMs. For very simple problems with just a few random effects, the likelihood can be approximated by numerical integration. Most models have crossed random effects, which numerical integration cannot handle. Thus, a commonly used method is penalized quasi-likelihood (PQL), which is implemented in packages such as lme4, nlme, and MASS. However, PQL approximates the likelihood with unknown accuracy and suffers from problematic inferential properties, such as parameter estimates that tend to be too low (McCulloch and Searle, 2001). Since the quasi-likelihood approximates the likelihood to an unknown accuracy, any inference based on it also has an unknown level of accuracy. Without bootstrapping, PQL users cannot know how valid their confidence intervals or likelihood ratio test results are. The popularity of PQL despite its inadequacies shows that there is a high demand for tools to fit GLMMs.

Monte Carlo Likelihood Approximation (MCLA) is another tool for fitting GLMMs. This method approximates the likelihood either through Markov chain Monte Carlo (MCMC) or Ordinary Monte Carlo (OMC), and the resulting likelihood approximation is used to fit and test GLMMs (Geyer and Thompson, 1992). Because MCLA approximates the entire likelihood, any type of likelihood-based inference can be performed. Inference such as maximum likelihood or likelihood-ratio testing is standard for many simpler models, but MCLA is the only method that can perform these techniques for GLMMs. Moreover, MCLA is supported by a rigorous theoretical foundation supplied by Geyer (1994) and Sung and Geyer (2007).

Despite MCLA's solid theoretical underpinnings, it is not yet a widely used technique. MCLA via MCMC is too difficult for most users because they do not know when the Markov chain has run long enough to produce reliable answers. Sung and Geyer's (2007) version of MCLA via OMC is more user-friendly but is limited to smaller problems.

My current work performs MCLA via OMC with an improved importance sampling distribution. Rather than selecting an importance sampling distribution independently of the data, my package specifies it based on the data so that it is similar to the true distribution of the random effects. With this importance sampling distribution, my package performs MCLA for GLMMs with a Poisson or Bernoulli response using the canonical link. The package assumes the random effects are independently drawn from normal distributions with mean 0 and unknown variances. There can be any number of fixed or random effects. The package is in the testing stage and is nearing completion for the setting described earlier in this paragraph. This package is part of my doctoral thesis in statistics, which I am completing at the University of Minnesota with Professors Charles Geyer and Galin Jones as my co-advisors.
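As a rough illustration of the idea above, and not the package's actual code, the following Python sketch approximates the log-likelihood of a Bernoulli GLMM with one random intercept per cluster by ordinary Monte Carlo with importance sampling. Everything named here is an assumption made for the example: the functions `mcla_loglik` and `quadrature_loglik`, the single fixed intercept `beta`, and the fixed normal importance distribution with parameters `imp_mean` and `imp_sd` (the package described above chooses its importance distribution from the data, which this sketch does not attempt). A one-dimensional trapezoid rule is included only as a check, since numerical integration is feasible in this very simple setting.

```python
import math
import random


def log_bernoulli_lik(ys, beta, u):
    """Log of f(y | u) for one cluster under the canonical (logit) link."""
    eta = beta + u
    # log p(y) = y * eta - log(1 + exp(eta)), summed over the cluster
    return sum(y * eta - math.log1p(math.exp(eta)) for y in ys)


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def mcla_loglik(clusters, beta, sigma, draws, imp_mean, imp_sd):
    """Importance-sampling approximation of the log-likelihood.

    `draws` are samples from the (hypothetical) importance distribution
    N(imp_mean, imp_sd^2). Each cluster's likelihood is the average of
    f(y | u) * phi(u; 0, sigma) / phi(u; imp_mean, imp_sd) over the draws.
    """
    total = 0.0
    for ys in clusters:
        weights = [
            math.exp(
                log_bernoulli_lik(ys, beta, u)
                + normal_logpdf(u, 0.0, sigma)
                - normal_logpdf(u, imp_mean, imp_sd)
            )
            for u in draws
        ]
        total += math.log(sum(weights) / len(weights))
    return total


def quadrature_loglik(clusters, beta, sigma, lo=-10.0, hi=10.0, n=4000):
    """Trapezoid-rule reference value, usable only because u is 1-D here."""
    h = (hi - lo) / n
    total = 0.0
    for ys in clusters:
        vals = [
            math.exp(
                log_bernoulli_lik(ys, beta, lo + k * h)
                + normal_logpdf(lo + k * h, 0.0, sigma)
            )
            for k in range(n + 1)
        ]
        total += math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
    return total


if __name__ == "__main__":
    # Made-up toy data: four clusters of four binary responses each.
    toy = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 1], [0, 0, 0, 0]]
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.5) for _ in range(20000)]
    print("Monte Carlo:", mcla_loglik(toy, 0.2, 1.0, sample, 0.0, 1.5))
    print("Quadrature: ", quadrature_loglik(toy, 0.2, 1.0))
```

Because the draws do not depend on `beta` or `sigma`, the same sample can be reused to evaluate the approximation at any parameter value, which is what makes the whole likelihood surface available for maximization and testing. The sketch works cluster by cluster only because its random intercepts are independent across clusters; crossed random effects would require sampling the random-effect vector jointly.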
Goals and objectives for Google Summer of Code

My goals are (1) to rewrite sections of my package in C to improve its speed, (2) to write functions to perform likelihood ratio tests for comparing nested models, and (3) to write additional functions to fit models with correlated random effects.

Details

I consider my goals separately, since completion of one goal does not rely on completion of the others.

(1) Two steps stand out in my package as time-consuming: the step that decides the parameters of the importance sampling distribution and the step that maximizes the likelihood approximation. Thus, I will rewrite these two functions in C. The main obstacle here will be coding in C, with which I do not have extensive experience. Because I have written functioning R code that performs these steps, I will be able to compare the R results to the C results to verify that my functions are correct. I have been working with a couple of data sets, including the benchmark Booth and Hobert (1999) data set with known maximum likelihood estimates, so I will be able to test my code on these data sets. Because I can rewrite one function in C without affecting the other functions, I can write these functions in either order. I think it would be better to rewrite the step that maximizes the likelihood approximation first, since that function is more computationally intensive and more important. The function that chooses the parameter values of the importance sampling distribution can be rewritten in C second because it is currently the less time-consuming of the two. The equations for these functions are detailed in my design document, which is on my website at knud0158/designdoc.pdf.

(2) Hypothesis testing for nested models can be split into three cases: the nested models differ in their fixed effects but have the same variance components; the nested models differ by one variance component and possibly some fixed effects; or the nested models differ by two or more variance components and possibly some fixed effects. I have worked out the details for calculating the test statistics and p-values for the first two cases in my design document at knud0158/designdoc.pdf.
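To make the first two cases concrete, here is a hedged Python sketch of how likelihood ratio p-values could be computed from two approximated log-likelihoods. It is not taken from the design document. It assumes the textbook reference distributions: a chi-square with as many degrees of freedom as dropped fixed effects in the first case, and an equal mixture of two chi-squares in the second case, where the tested variance component lies on the boundary of the parameter space. The function names `chi2_sf` and `lrt_pvalue` and their arguments are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df >= x), using closed-form finite series."""
    if df < 0:
        raise ValueError("df must be a non-negative integer")
    if x <= 0.0:
        return 1.0
    if df == 0:
        return 0.0  # chi-square with 0 df is a point mass at zero
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail term plus a finite correction series
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def lrt_pvalue(loglik_full, loglik_reduced, n_fixed_dropped, n_variance_dropped):
    """Return (test statistic, p-value) for two nested models.

    Assumed reference distributions (textbook results, not the proposal's):
      no variance component dropped  -> chi-square(n_fixed_dropped)
      one variance component dropped -> 50:50 mixture of
          chi-square(n_fixed_dropped) and chi-square(n_fixed_dropped + 1)
    """
    # Monte Carlo error can make the difference slightly negative; clamp it.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if n_variance_dropped == 0:
        return stat, chi2_sf(stat, n_fixed_dropped)
    if n_variance_dropped == 1:
        p = 0.5 * chi2_sf(stat, n_fixed_dropped)
        p += 0.5 * chi2_sf(stat, n_fixed_dropped + 1)
        return stat, p
    raise NotImplementedError("two or more variance components: not covered")


if __name__ == "__main__":
    # Made-up log-likelihood values, for illustration only.
    print(lrt_pvalue(-100.0, -102.5, n_fixed_dropped=1, n_variance_dropped=0))
    print(lrt_pvalue(-100.0, -102.5, n_fixed_dropped=0, n_variance_dropped=1))
```

Dispatching on the number of dropped variance components mirrors the three-way split described above, and the third case deliberately raises an error because its sampling distribution is not specified here.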
Coding the last case will take longer because I will need to determine the test statistic and its sampling distribution. Part of the challenge will be writing a function so that, given two models, the code will know which test statistics and p-values to calculate and report. My advisor Charlie has written a function in his aster package that also does model comparison, so I will look to that for guidance. I will be able to test my code on the Coull and Agresti (2000) flu data set by modeling the log odds of catching the flu over four years. The model will have a few variance components that I will be able to test: a variance component for a subject-specific random effect, another for a year-to-year random effect, and another for the decreased chance of getting flu when a strain of flu virus reappears in a later year.

(3) The covariance matrix for the random effects in my currently working code is diagonal, meaning the random effects are independently drawn based on one of possibly many variance components. I would like to generalize the covariance matrix in order to fit models with location dependence. This generality would make my R package more usable and practical. To fit these types of models, I would like to code an additional variance structure with exponential decay based on the distance between observations. I have not yet written the details of how I will execute these changes into my design document, but I have written my current package with these future changes in mind. I will test my code on the Caffo et al. (2005) automobile theft data set by modeling the number of cars stolen in a Baltimore neighborhood based on the distance to sites of other car thefts. I will compare my results to the results achieved through Monte Carlo EM.

Proposed Timeline

May 19 to May 26: look at Charlie's aster package code for model comparison. Design and write code for my own package to determine how the two models differ.
May 26 to June 2: write the hypothesis testing code for the first two cases detailed earlier.
June 2 to June 15: test and correct the hypothesis testing code.
June 16: complete documentation for hypothesis testing function and submit updated R package to CRAN.
June 16 to June 30: write C function that maximizes the likelihood approximation.
June 30 to July 7: test the newly written C function and compare it to my R results.

July 7: submit updated R package to CRAN.
July 7 to July 14: write C function that selects the importance sampling distribution.
July 14 to July 21: test the newly written C function and compare it to my R results.
July 21: submit updated R package to CRAN.
July 21 to July 28: write function for new variance structure and incorporate into package.
July 28 to August 2: test the new variance structure and compare results to those reported by Caffo et al. (2005).
August 8 to August 11: document the new variance structure.
August 11: submit final version of fully updated R package to CRAN.

I expect to complete all work by August 11. If something starts to take longer than predicted, then I may need to postpone the new variance structure to the fall, since the first two goals are more important.

References

Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 61.

Caffo, B., Jank, W., and Jones, G. (2005). Ascent-based Monte Carlo EM. Journal of the Royal Statistical Society, Series B, 67.

Coull, B. and Agresti, A. (2000). Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. Biometrics, 56.

Geyer, C. J. (1994). On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56.

Geyer, C. J. and Thompson, E. (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society, Series B, 54.

McCulloch, C. and Searle, S. (2001). Generalized, Linear, and Mixed Models. John Wiley and Sons, New York.

Sung, Y. J. and Geyer, C. J. (2007). Monte Carlo likelihood inference for missing data models. Annals of Statistics, 35.


More information

171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion 171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

Model Selection and Claim Frequency for Workers Compensation Insurance

Model Selection and Claim Frequency for Workers Compensation Insurance Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

HLM software has been one of the leading statistical packages for hierarchical

HLM software has been one of the leading statistical packages for hierarchical Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

More information

School of Public Health and Health Services Department of Epidemiology and Biostatistics

School of Public Health and Health Services Department of Epidemiology and Biostatistics School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated

More information

SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION

SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION STATISTICS IN MEDICINE, VOL. 8, 795-802 (1989) SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION F. Y. HSIEH* Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, Bronx, N Y 10461,

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

M.Ed. in Educational Psychology: Research, Statistics, and Evaluation

M.Ed. in Educational Psychology: Research, Statistics, and Evaluation M.Ed. in Educational Psychology: Research, Statistics, and Evaluation The M.Ed. program in Research, Statistics, and Evaluation prepares students for advanced graduate study in educational research or

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

QUALITY ENGINEERING PROGRAM

QUALITY ENGINEERING PROGRAM QUALITY ENGINEERING PROGRAM Production engineering deals with the practical engineering problems that occur in manufacturing planning, manufacturing processes and in the integration of the facilities and

More information

Imputing Values to Missing Data

Imputing Values to Missing Data Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data

More information

Analysis of Financial Time Series

Analysis of Financial Time Series Analysis of Financial Time Series Analysis of Financial Time Series Financial Econometrics RUEY S. TSAY University of Chicago A Wiley-Interscience Publication JOHN WILEY & SONS, INC. This book is printed

More information

Executive Program in Managing Business Decisions: A Quantitative Approach ( EPMBD) Batch 03

Executive Program in Managing Business Decisions: A Quantitative Approach ( EPMBD) Batch 03 Executive Program in Managing Business Decisions: A Quantitative Approach ( EPMBD) Batch 03 Calcutta Ver 1.0 Contents Broad Contours Who Should Attend Unique Features of Program Program Modules Detailed

More information

Masters in Financial Economics (MFE)

Masters in Financial Economics (MFE) Masters in Financial Economics (MFE) Admission Requirements Candidates must submit the following to the Office of Admissions and Registration: 1. Official Transcripts of previous academic record 2. Two

More information

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

More information

Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour. Patrick Lam Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

More information

ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS

ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 49th ANNUAL MEETING 200 2 ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS Jeff

More information

Master of Arts in Mathematics

Master of Arts in Mathematics Master of Arts in Mathematics Administrative Unit The program is administered by the Office of Graduate Studies and Research through the Faculty of Mathematics and Mathematics Education, Department of

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

R2MLwiN Using the multilevel modelling software package MLwiN from R

R2MLwiN Using the multilevel modelling software package MLwiN from R Using the multilevel modelling software package MLwiN from R Richard Parker Zhengzheng Zhang Chris Charlton George Leckie Bill Browne Centre for Multilevel Modelling (CMM) University of Bristol Using the

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Likelihood: Frequentist vs Bayesian Reasoning

Likelihood: Frequentist vs Bayesian Reasoning "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Sensitivity Analysis in Multiple Imputation for Missing Data

Sensitivity Analysis in Multiple Imputation for Missing Data Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

More information

Proposal for Undergraduate Certificate in Large Data Analysis

Proposal for Undergraduate Certificate in Large Data Analysis Proposal for Undergraduate Certificate in Large Data Analysis To: Helena Dettmer, Associate Dean for Undergraduate Programs and Curriculum From: Suely Oliveira (Computer Science), Kate Cowles (Statistics),

More information