Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation Short Title: Monte Carlo Likelihood Approximation
christina/googleproposal.pdf

Christina Knudson

Bio

I'm a doctoral candidate at the University of Minnesota's School of Statistics. I am ABD and about a year away from graduating. I graduated cum laude from Carleton College with a BA in Mathematics. I was born and raised in Decorah, Iowa, which is one of the top 20 small towns in America (according to Smithsonian Magazine).

I started coding in the spring of 2007, first in Python and then Java. I started using R during the summer of 2007 at the Summer Institute for Training in Biostatistics, then I continued programming with R during my summer internship at the National Institutes of Health in [...]. Most of my graduate coursework has been in R, and I have taught undergraduate classes in R at the University of Minnesota as well.

Part of my work for my doctoral thesis is an R package that fits Generalized Linear Mixed Models (GLMMs) using Monte Carlo Likelihood Approximation (MCLA). I have written part of my package already and I plan to expand and generalize it this summer.

Contact Information

Student name: Christina Knudson
Link id: cknudson05
Student postal address: 1901 Minnehaha Ave, Apt 317, Minneapolis MN,
Telephone:
Emails: [email protected], [email protected], [email protected]
Website: christina/

Student Affiliation

Institution: University of Minnesota
Program: Statistics
Stage of completion: Early 2015
Contact to verify: or
Advisors: Charles Geyer and Galin Jones

Schedule Conflicts

From August 3 through 7, I will attend the Joint Statistical Meetings to present my R package.

Mentors

Mentor names: Charles Geyer and Galin Jones
Mentor emails: [email protected] and [email protected]
Mentor link ids: cjgeyer

I have been in touch with my mentors about my package. I meet with each of them at least weekly, and sometimes I talk to Charlie several times per week.

Background

GLMMs are popular in many fields, from ecology to economics. Their popularity is apparent through a Google search, which yields 242,000 results. The challenge for researchers is finding an easy-to-implement and reliable method for fitting and testing GLMMs. For very simple problems with just a few random effects, the likelihood can be approximated by numerical integration. Most models, however, have crossed random effects, which numerical integration cannot handle. Thus, a commonly used method is penalized quasi-likelihood (PQL), which is implemented in packages such as lme4, nlme, and MASS.

However, PQL approximates the likelihood with unknown accuracy and suffers from problematic inferential properties, such as parameter estimates that tend to be too low (McCulloch and Searle, 2001). Because the quasi-likelihood approximates the likelihood to an unknown accuracy, any inference based on it also has an unknown level of accuracy. Without bootstrapping, a PQL user cannot know how valid their confidence intervals or likelihood ratio test results are. The popularity of PQL despite its inadequacies shows that there is a high demand for tools to fit GLMMs.
Monte Carlo Likelihood Approximation (MCLA) is another tool for fitting GLMMs. This method approximates the likelihood through either Markov Chain Monte Carlo (MCMC) or Ordinary Monte Carlo (OMC), and the resulting likelihood approximation is used to fit and test GLMMs (Geyer and Thompson, 1992). Because MCLA approximates the entire likelihood, any type of likelihood-based inference can be performed. Inference such as maximum likelihood or likelihood ratio testing is standard for many simpler models, but MCLA is the only method that can perform these techniques for GLMMs. Moreover, MCLA is supported by a rigorous theoretical foundation supplied by Geyer (1994) and Sung and Geyer (2007).

Despite MCLA's solid theoretical underpinnings, it is not yet a widely used technique. MCLA via MCMC is too difficult for most users because they do not know when the Markov chain has run long enough to produce reliable answers. Sung and Geyer's (2007) version of MCLA via OMC is more user-friendly but is limited to smaller problems.

My current work performs MCLA via OMC with an improved importance sampling distribution. Rather than selecting an importance sampling distribution independently of the data, my package specifies it based on the data, so that it is similar to the true distribution of the random effects. With this importance sampling distribution, my package performs MCLA for GLMMs with a Poisson or Bernoulli response using the canonical link. The package assumes the random effects are independently drawn from normal distributions with mean 0 and unknown variances. There can be any number of fixed or random effects. The package is in the testing stage and is nearing completion for the setting described in this paragraph. This package is part of my doctoral thesis in statistics, which I am completing at the University of Minnesota with Professors Charles Geyer and Galin Jones as my co-advisors.
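The idea of approximating a GLMM likelihood by ordinary Monte Carlo with importance sampling can be illustrated with a minimal Python sketch. It is not the package's code: it assumes a single cluster of Poisson counts sharing one normal random intercept, and the function names, the data `ys`, the parameter values, and the normal importance density with mean `imp_mean` and standard deviation `imp_sd` are all hypothetical choices made for illustration. The likelihood integral over the random effect is replaced by an average of the integrand divided by the importance density.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def poisson_logpmf(y, eta):
    """Log Poisson probability with canonical (log) link; eta is the linear predictor."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def mc_loglik(ys, beta, sigma, draws, imp_mean, imp_sd):
    """Monte Carlo approximation of the log-likelihood for one cluster.

    draws must be simulated from the (hypothetical) normal importance
    density with mean imp_mean and standard deviation imp_sd.
    """
    log_terms = []
    for u in draws:
        # log f(y | u): counts are conditionally independent given u
        log_f = sum(poisson_logpmf(y, beta + u) for y in ys)
        # log importance weight: random-effect density over importance density
        log_w = normal_logpdf(u, 0.0, sigma) - normal_logpdf(u, imp_mean, imp_sd)
        log_terms.append(log_f + log_w)
    # average on the log scale (log-sum-exp) for numerical stability
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms) / len(log_terms))


rng = random.Random(1)
imp_mean, imp_sd = 0.3, 1.2          # illustrative importance parameters
draws = [rng.gauss(imp_mean, imp_sd) for _ in range(20000)]
ys = [2, 0, 3, 1]                    # made-up counts for one cluster
approx = mc_loglik(ys, beta=0.2, sigma=0.8, draws=draws,
                   imp_mean=imp_mean, imp_sd=imp_sd)
print(approx)
```

Because the same draws can be reused for every value of `beta` and `sigma`, the approximation is a smooth function of the parameters that can then be maximized. An importance density that sits close to where the integrand has most of its mass keeps the weights stable, which is the motivation for choosing it from the data.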
Goals and objectives for Google Summer of Code

My goals are (1) to rewrite sections of my package in C to improve its speed, (2) to write functions that perform likelihood ratio tests for comparing nested models, and (3) to write additional functions that fit models with correlated random effects.
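Goal (2) can be illustrated for the simplest situation, in which two nested models differ only in their fixed effects. The sketch below is a hedged Python illustration, not the package's method: it assumes the usual chi-square reference distribution with degrees of freedom equal to the number of extra fixed-effect parameters, and the function names and log-likelihood values are made up. Tests that involve variance components generally need a different reference distribution and are not covered here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) plus a finite correction series
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(loglik_small, loglik_big, df):
    """Return the test statistic and p-value for nested models.

    loglik_small and loglik_big are maximized (approximate) log-likelihoods
    of the smaller and larger model; df is the number of extra parameters.
    """
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, chi2_sf(stat, df)


# Hypothetical maximized log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(loglik_small=-104.2, loglik_big=-101.1, df=2)
print(stat, p_value)
```

A function that compares two fitted models would first have to work out how the models differ and then choose the matching statistic and reference distribution; the sketch only covers the computation once that choice has been made.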
Details

I consider my goals separately, since completion of one goal does not rely on completion of the others.

(1) Two steps stand out in my package as time-consuming: the step that chooses the parameters of the importance sampling distribution and the step that maximizes the likelihood approximation. I will therefore rewrite these two functions in C. The main obstacle will be coding in C, with which I do not have extensive experience. Because I have already written functioning R code that performs these steps, I will be able to compare the R results to the C results to verify that my functions are correct. I have been working with a couple of data sets, including the benchmark Booth and Hobert (1999) data set with known maximum likelihood estimates, so I will be able to test my code on these data sets. Because I can rewrite one function in C without affecting the other, I can write them in either order. I plan to rewrite the step that maximizes the likelihood approximation first, since that function is more computationally intensive and more important. The function that chooses the parameter values of the importance sampling distribution can be rewritten second because it is currently the less time-consuming of the two. The equations for these functions are detailed in my design document, which is on my website at knud0158/designdoc.pdf.

(2) Hypothesis testing for nested models can be split into three cases: the nested models differ in their fixed effects but have the same variance components; the nested models differ by one variance component and possibly some fixed effects; or the nested models differ by two or more variance components and possibly some fixed effects. I have worked out the details for calculating the test statistics and p-values for the first two cases in my design document at knud0158/designdoc.pdf.
Coding the last case will take longer because I will need to determine the test statistic and its sampling distribution. Part of the challenge will be writing a function that, given two models, knows which test statistics and p-values to calculate and report. My advisor Charlie has written a function in his aster package that also does model comparison, so I will look to that for guidance. I will be able to test my code on the Coull and Agresti (2000) flu data set by modeling the log odds of catching the flu over four years. The model will have a few variance components that I will be able to test: one for a subject-specific random effect, another for a year-to-year random effect, and another for the decreased chance of getting the flu when a strain of flu virus reappears in a later year.

(3) The covariance matrix for the random effects in my currently working code is diagonal, meaning the random effects are independently drawn based on one of possibly many variance components. I would like to generalize the covariance matrix in order to fit models with location dependence. This generality would make my R package more usable and practical. To fit these types of models, I would like to code an additional variance structure with exponential decay based on the distance between observations. I have not yet written the details of how I will execute these changes into my design document, but I have written my current package with these future changes in mind. I will test my code on the Caffo et al. (2005) automobile theft data set by modeling the number of cars stolen in a Baltimore neighborhood based on the distance to sites of other car thefts. I will compare my results to the results achieved through Monte Carlo EM.

Proposed Timeline

May 19 to May 26: look at Charlie's aster package code for model comparison. Design and write code for my own package to determine how the two models differ.
May 26 to June 2: write the hypothesis testing code for the first two cases detailed earlier.
June 2 to June 15: test and correct the hypothesis testing code.
June 16: complete documentation for the hypothesis testing function and submit the updated R package to CRAN.
June 16 to June 30: write the C function that maximizes the likelihood approximation.
June 30 to July 7: test the newly written C function and compare it to my R results.
July 7: submit the updated R package to CRAN.
July 7 to July 14: write the C function that selects the importance sampling distribution.
July 14 to July 21: test the newly written C function and compare it to my R results.
July 21: submit the updated R package to CRAN.
July 21 to July 28: write the function for the new variance structure and incorporate it into the package.
July 28 to August 2: test the new variance structure and compare results to those reported by Caffo et al. (2005).
August 8 to August 11: document the new variance structure.
August 11: submit the final version of the fully updated R package to CRAN.

I expect to complete all work by August 11. If something starts to take longer than predicted, I may need to postpone the new variance structure to the fall, since the first two goals are more important.

References

Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 61:
Caffo, B., Jank, W., and Jones, G. (2005). Ascent-based Monte Carlo EM. Journal of the Royal Statistical Society, Series B, 67:
Coull, B. and Agresti, A. (2000). Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. Biometrics, 56:
Geyer, C. J. (1994). On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 61:
Geyer, C. J. and Thompson, E. (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society, Series B, 54:
McCulloch, C. and Searle, S. (2001). Generalized, Linear, and Mixed Models. John Wiley and Sons, New York.
Sung, Y. J. and Geyer, C. J. (2007). Monte Carlo likelihood inference for missing data models. Annals of Statistics, 35:
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint
Bayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS
PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 49th ANNUAL MEETING 200 2 ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS Jeff
Master of Arts in Mathematics
Master of Arts in Mathematics Administrative Unit The program is administered by the Office of Graduate Studies and Research through the Faculty of Mathematics and Mathematics Education, Department of
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT
Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software
R2MLwiN Using the multilevel modelling software package MLwiN from R
Using the multilevel modelling software package MLwiN from R Richard Parker Zhengzheng Zhang Chris Charlton George Leckie Bill Browne Centre for Multilevel Modelling (CMM) University of Bristol Using the
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Penalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
Likelihood: Frequentist vs Bayesian Reasoning
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Sensitivity Analysis in Multiple Imputation for Missing Data
Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes
Proposal for Undergraduate Certificate in Large Data Analysis
Proposal for Undergraduate Certificate in Large Data Analysis To: Helena Dettmer, Associate Dean for Undergraduate Programs and Curriculum From: Suely Oliveira (Computer Science), Kate Cowles (Statistics),
