BayesX - Software for Bayesian Inference in Structured Additive Regression
|
|
|
- Clinton Mason
- 10 years ago
- Views:
Transcription
1 BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich joint work with Stefan Lang (University of Innsbruck) & Andreas Brezger (University of Munich)
2 Structured Additive Regression Structured Additive Regression Regression in a general sense: Generalised linear models, Multivariate (categorical) generalised linear models, Regression models for duration times (Cox-type models, multi-state models). Common structure: Model a quantity of interest in terms of categorical and continuous covariates, e.g. E(y u) = h(u γ) (GLM) or λ(t u) = λ 0 (t) exp(u γ) (Cox model) BayesX - Software for Bayesian Inference in Structured Additive Regression 1
3 Structured Additive Regression General idea of structured additive regression: Replace usual parametric predictor with a flexible semiparametric predictor containing Nonparametric effects of time scales and continuous covariates, Spatial effects, Interaction surfaces, Varying coefficient terms (continuous and spatial effect modifiers), Random intercepts and random slopes. BayesX - Software for Bayesian Inference in Structured Additive Regression 2
4 Example: Car insurance data from two insurance companies in Belgium. Structured Additive Regression Sample of approximately policyholders. Aims: Separate risk analyses for claim size and claim frequency to predict risk premium from covariates. Variables of primary interest: Claim size y i or claim frequency h i of policyholders. Covariates: vage vehicles age page policyholders age hp vehicles horsepower bm bonus-malus score s district in Belgium v Vector of categorical covariates BayesX - Software for Bayesian Inference in Structured Additive Regression 3
5 Geoadditive models: Structured Additive Regression Gaussian model for log-costs log(y): log(y) N(η, σ 2 ) with η = f 1 (vage) + f 2 (page) + f 3 (bm) + f 4 (hp) + f spat (s) + v ζ. Poisson model for frequencies h i : h P o(exp(η)) with η = f 1 (vage) + f 2 (page) + f 3 (page)sex + f 3 (bm) + f 4 (hp) + f spat (s) + v ζ. BayesX - Software for Bayesian Inference in Structured Additive Regression 4
6 Results for claim frequency: Structured Additive Regression age of the policyholder age of the policyholder (deviation for males) bonus malus score horse power of the car BayesX - Software for Bayesian Inference in Structured Additive Regression 5
7 Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression 6
8 Spatial effect for claim size: Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression 7
9 Model Components and Priors Model Components and Priors Penalised splines. Approximate f(x) = ξ j B j (x) by a weighted sum of B-spline basis functions. Employ a large number of basis functions to enable flexibility. Penalise differences between parameters of adjacent basis functions to ensure smoothness BayesX - Software for Bayesian Inference in Structured Additive Regression 8
10 Bivariate penalised splines. Model Components and Priors Varying coefficient models. Effect of covariate x varies smoothly over the domain of a second covariate z: f(x, z) = x g(z) Spatial effect modifier Geographically weighted regression. BayesX - Software for Bayesian Inference in Structured Additive Regression 9
11 Spatial effect for regional data: Markov random fields. Model Components and Priors Bivariate extension of a first order random walk on the real line. Define appropriate neighbourhoods for the regions. Assume that the expected value of f spat (s) is the average of the function evaluations of adjacent sites. f(t+1) E[f(t) f(t 1),f(t+1)] f(t 1) τ 2 2 t 1 t t+1 BayesX - Software for Bayesian Inference in Structured Additive Regression 10
12 Model Components and Priors Spatial effect for point-referenced data: Stationary Gaussian random fields. Well-known as Kriging in the geostatistics literature. Spatial effect follows a zero mean stationary Gaussian stochastic process. Correlation of two arbitrary sites is defined by an intrinsic correlation function. Can be interpreted as a basis function approach with radial basis functions. BayesX - Software for Bayesian Inference in Structured Additive Regression 11
13 All effects can be cast into one general framework. Model Components and Priors All vectors of function evaluations f j can be expressed as f j = Z j ξ j with design matrix Z j and regression coefficients ξ j. Generic form of the prior for ξ j : p(ξ j τ 2 j ) (τ 2 j ) k j 2 exp ( 1 2τ 2 j ξ jk j ξ j ). K j 0 acts as a penalty matrix, rank(k j ) = k j d j = dim(ξ j ). τ 2 j 0 can be interpreted as a variance or (inverse) smoothness parameter. BayesX - Software for Bayesian Inference in Structured Additive Regression 12
14 Bayesian Inference Bayesian Inference Fully Bayesian inference: All parameters (including the variance parameters τ 2 ) are assigned suitable prior distributions. Typically, estimation is based on MCMC simulation techniques. Usual estimates: Posterior expectation, posterior median (easily obtained from the samples). Empirical Bayes inference: Differentiate between parameters of primary interest (regression coefficients) and hyperparameters (variances). Assign priors only to the former. Estimate the hyperparameters by maximising their marginal posterior. Plugging these estimates into the joint posterior and maximising with respect to the parameters of primary interest yields posterior mode estimates. BayesX - Software for Bayesian Inference in Structured Additive Regression 13
15 Bayesian Inference MCMC-based inference: Assign inverse gamma prior to τ 2 j : p(τ 2 j ) ( ) 1 (τj 2)a j+1 exp b j τj 2. Proper for a j > 0, b j > 0 Common choice: a j = b j = ε small. Improper for b j = 0, a j = 1 Flat prior for variance τj 2, b j = 0, a j = 1 2 Flat prior for standard deviation τ j. Conditions for proper posteriors in structured additive regression are available. Gibbs sampler for τj 2 : Sample from an inverse Gamma distribution with parameters a j = a j rank(k j) and b j = b j ξ jk j ξ j. BayesX - Software for Bayesian Inference in Structured Additive Regression 14
16 Bayesian Inference Metropolis-Hastings update for ξ j : Propose new state from a multivariate Gaussian distribution with precision matrix and mean P j = Z jw Z j + 1 τ 2 j K j and m j = P 1 j Z jw (ỹ η j ). IWLS-Proposal with appropriately defined working weights W observations ỹ. and working Efficient algorithms make use of the sparse matrix structure of P j and K j. For binary or categorical regression models: Efficient implementation based on latent variable representation. BayesX - Software for Bayesian Inference in Structured Additive Regression 15
17 Bayesian Inference Empirical Bayes inference. Consider the variances τ 2 j as unknown constants to be estimated from their marginal posterior. Consider the regression coefficients ξ j as correlated random effects with multivariate Gaussian distribution Use mixed model methodology for estimation. Problem: In most cases partially improper random effects distribution. Mixed model representation: Decompose where ξ j = X j β j + V j b j, p(β j ) const and b j N(0, τ 2 j I kj ). β j is a fixed effect and b j is an i.i.d. random effect. BayesX - Software for Bayesian Inference in Structured Additive Regression 16
18 This yields a variance components model with pedictor Bayesian Inference η = Xβ + V b where in turn p(β) const and b N(0, Q). Obtain empirical Bayes estimates / penalized likelihood estimates via iterating Penalized maximum likelihood for the regression coefficients β and b. Restricted Maximum / Marginal likelihood for the variance parameters in Q: L(Q) = L(β, b, Q)p(b)dβdb max Q. Involves a Laplace approximation to the marginal likelihood (corresponding to REML estimation of variances in Gaussian mixed models). BayesX - Software for Bayesian Inference in Structured Additive Regression 17
19 BayesX BayesX BayesX is a software tool for estimating structured additive regression models. BayesX - Software for Bayesian Inference in Structured Additive Regression 18
20 Stand-alone software with Stata-like syntax. BayesX Developed by Andreas Brezger, Thomas Kneib and Stefan Lang with contributions of seven colleagues. Computationally demanding parts are implemented in C++. Graphical user interface and visualisation tools are implemented in Java. Currently, BayesX only runs under Windows, a Linux version as well as a connection to R are work in progress. BayesX - Software for Bayesian Inference in Structured Additive Regression 19
21 Resources: BayesX Reference Manual, Methodology Manual, Tutorial Manual. Some publications: Brezger, Kneib & Lang (2005). BayesX: Analyzing Bayesian structured additive regression models. Journal of Statistical Software, 14 (11). Brezger, A. & Lang, S. (2006). Generalized additive regression based on Bayesian P-splines. Computational Statistics and Data Analysis 50, Fahrmeir, L., Kneib, T. & Lang, S. (2004). Penalized structured additive regression for space-time data: a Bayesian perspective. Statistica Sinica 14, BayesX - Software for Bayesian Inference in Structured Additive Regression 20
22 R provides functionality for generalised additive models and a number of extensions such as varying coefficient models, interaction surfaces, spatio-temporal effects, etc. What is the gain in using BayesX? BayesX BayesX provides functionality for more general response types: Ordered and unordered categorical responses, Continuous survival time models extending the Cox model, Multi-state models. Inference can be based on mixed model methodology or MCMC. Not all possibilities and combinations are supported by both inferential concepts. BayesX - Software for Bayesian Inference in Structured Additive Regression 21
23 Categorical responses with unordered categories: BayesX Multinomial logit and multinomial probit models, Category-specific and globally-defined covariates, Non-availability indicators can be defined to account for varying choice sets. Response families in BayesX: multinomial, multinomialprobit. Categorical responses with ordered categories: Ordinal as well as sequential models, Logit and probit models, Effects can be category-specific or constant over the categories. Response families in BayesX: cumlogit, cumprobit, seqlogit, seqprobit. BayesX - Software for Bayesian Inference in Structured Additive Regression 22
24 Continuous survival times: BayesX Cox-type hazard regression models, Joint estimation of the baseline hazard rate and the covariate effects, Time-varying effects and time-varying covariates, Arbitrary combinations of right, left and interval censoring as well as left truncation. Response family: cox. Multi-state models: Describe the evolution of discrete phenomena in continuous time, Model in terms of transition intensities, similar as in the Cox model. Response family: multistate. BayesX - Software for Bayesian Inference in Structured Additive Regression 23
25 Summary & Future Plans Summary & Future Plans Take home message: BayesX is a user-friendly software that allows for the routine estimation of a broad class of semiparametric regression models even in non-standard situations. Future plans: Linux version and connection to R. Interval censoring for multi-state models. Bayesian regularisation-priors and model choice (with LASSO-type priors). Bayesian treatment of measurement error in structured additive regressions models. BayesX - Software for Bayesian Inference in Structured Additive Regression 24
Semiparametric Multinomial Logit Models for the Analysis of Brand Choice Behaviour
Semiparametric Multinomial Logit Models for the Analysis of Brand Choice Behaviour Thomas Kneib Department of Statistics Ludwig-Maximilians-University Munich joint work with Bernhard Baumgartner & Winfried
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Flexible geoadditive survival analysis of non-hodgkin lymphoma in Peru
Statistics & Operations Research Transactions SORT 36 (2) July-December 2012, 221-230 ISSN: 1696-2281 eissn: 2013-8830 www.idescat.cat/sort/ Statistics & Operations Research c Institut d Estadística de
CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach
Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science
PS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng [email protected] The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
Predictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
How To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University [email protected]
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University [email protected] 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
Multilevel Modelling of medical data
Statistics in Medicine(00). To appear. Multilevel Modelling of medical data By Harvey Goldstein William Browne And Jon Rasbash Institute of Education, University of London 1 Summary This tutorial presents
Statistics & Probability PhD Research. 15th November 2014
Statistics & Probability PhD Research 15th November 2014 1 Statistics Statistical research is the development and application of methods to infer underlying structure from data. Broad areas of statistics
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Master programme in Statistics
Master programme in Statistics Björn Holmquist 1 1 Department of Statistics Lund University Cramérsällskapets årskonferens, 2010-03-25 Master programme Vad är ett Master programme? Breddmaster vs Djupmaster
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected]
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected] IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
Parallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Lecture 8: Gamma regression
Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
11. Time series and dynamic linear models
11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd
CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore
Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection
Christfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
LÉVY-DRIVEN PROCESSES IN BAYESIAN NONPARAMETRIC INFERENCE
Bol. Soc. Mat. Mexicana (3) Vol. 19, 2013 LÉVY-DRIVEN PROCESSES IN BAYESIAN NONPARAMETRIC INFERENCE LUIS E. NIETO-BARAJAS ABSTRACT. In this article we highlight the important role that Lévy processes have
Dirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
Gaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany [email protected] WWW home page: http://www.tuebingen.mpg.de/ carl
Missing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models
Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear
**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.
**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Probabilistic Methods for Time-Series Analysis
Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Smoothing and Non-Parametric Regression
Smoothing and Non-Parametric Regression Germán Rodríguez [email protected] Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest
Bayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
The zero-adjusted Inverse Gaussian distribution as a model for insurance claims
The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:
Bayesian Estimation of Joint Survival Functions in Life Insurance
ISBA 2, Proceedings, pp. ISBA and Eurostat, 21 Bayesian Estimation of Joint Survival Functions in Life Insurance ARKADY SHEMYAKIN and HEEKYUNG YOUN University of St. Thomas, Saint Paul, MN, USA Abstract:
Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA
EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA Andreas Christmann Department of Mathematics homepages.vub.ac.be/ achristm Talk: ULB, Sciences Actuarielles, 17/NOV/2006 Contents 1. Project: Motor vehicle
Doctor of Philosophy Program in Statistics International Program Faculty of Science and Technology Thammasat University New Curriculum Year 2004
Doctor of Philosophy Program in Statistics International Program Faculty of Science and Technology Thammasat University New Curriculum Year 2004 1. Curriculum Title Doctor of Philosophy Program in Statistics
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
GLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
APPLIED MISSING DATA ANALYSIS
APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview
Imputing Values to Missing Data
Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data
State Space Time Series Analysis
State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State
DURATION ANALYSIS OF FLEET DYNAMICS
DURATION ANALYSIS OF FLEET DYNAMICS Garth Holloway, University of Reading, [email protected] David Tomberlin, NOAA Fisheries, [email protected] ABSTRACT Though long a standard technique
Applying MCMC Methods to Multi-level Models submitted by William J Browne for the degree of PhD of the University of Bath 1998 COPYRIGHT Attention is drawn tothefactthatcopyright of this thesis rests with
Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds
Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
A revisit of the hierarchical insurance claims modeling
A revisit of the hierarchical insurance claims modeling Emiliano A. Valdez Michigan State University joint work with E.W. Frees* * University of Wisconsin Madison Statistical Society of Canada (SSC) 2014
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Exam C, Fall 2006 PRELIMINARY ANSWER KEY
Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14
Multivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
Bayesian Hidden Markov Models for Alcoholism Treatment Tria
Bayesian Hidden Markov Models for Alcoholism Treatment Trial Data May 12, 2008 Co-Authors Dylan Small, Statistics Department, UPenn Kevin Lynch, Treatment Research Center, Upenn Steve Maisto, Psychology
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Dynamic Linear Models with R
Giovanni Petris, Sonia Petrone, Patrizia Campagnoli Dynamic Linear Models with R SPIN Springer s internal project number, if known Monograph August 10, 2007 Springer Berlin Heidelberg NewYork Hong Kong
Regression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
Nonnested model comparison of GLM and GAM count regression models for life insurance data
Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development
Chapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
Penalized Splines - A statistical Idea with numerous Applications...
Penalized Splines - A statistical Idea with numerous Applications... Göran Kauermann Ludwig-Maximilians-University Munich Graz 7. September 2011 1 Penalized Splines - A statistical Idea with numerous Applications...
Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models
Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income
An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach
Intro B, W, M, & R SPDE/GMRF Example End An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach Finn Lindgren 1 Håvard Rue 1 Johan
Statistics Graduate Programs
Statistics Graduate Programs Kathleen Maurer, Coordinator of Graduate Studies 146 Middlebush Columbia, MO 65211 573-882-6376 http://www.stat.missouri.edu/ About Statistics The statistics department faculty
BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Model-based Synthesis. Tony O Hagan
Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
Gaussian Processes to Speed up Hamiltonian Monte Carlo
Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo
Linear regression methods for large n and streaming data
Linear regression methods for large n and streaming data Large n and small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge) is: The concept of sufficiency is
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Probabilistic concepts of risk classification in insurance
Probabilistic concepts of risk classification in insurance Emiliano A. Valdez Michigan State University East Lansing, Michigan, USA joint work with Katrien Antonio* * K.U. Leuven 7th International Workshop
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
