# Variable Selection for Health Care Demand in Germany

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Variable Selection for Health Care Demand in Germany Zhu Wang Connecticut Children s Medical Center University of Connecticut School of Medicine July 23, 2015 This document reproduces the data analysis presented in Wang et al. (2015). For a description of the theory behind application illustrated here we refer to the original manuscript. Riphahn et al. (2003) utilized a part of the German Socioeconomic Panel (GSOEP) data set to analyze the number of doctor visits. The original data have twelve annual waves from 1984 to 1995 for a representative sample of German households, which provide broad information on the health care utilization, current employment status, and the insurance arrangements under which subjects are protected. The data set contains number of doctor office visits for 1,812 West German men aged 25 to 65 years in the last three months of As shown in the figure, many doctor office visits are zeros, which can be difficult to fit with a Poisson or negative binomial model. Therefore, zero-inflated negative binomial (ZINB) model is considered. R> library("mpath") R> library("zic") R> library("pscl") R> data(docvisits) R> barplot(with(docvisits, table(docvisits)), ylab = "Frequency", xlab = "Doctor office visits") 1

2 Frequency Doctor office visits We include the linear spline variables age30 to age60 and their interaction terms with the health satisfaction health. R> dt <- docvisits[, -(2:3)] R> tmp <- model.matrix(~age30 * health + age35 * health + age40 * health + age45 * health + age50 * health + age55 * health + age60 * health, data = dt)[, -(1:9)] R> dat <- cbind(dt, tmp) Full ZINB model with all predictor variables. R> m1 <- zeroinfl(docvisits ~.., data = dat, dist = "negbin") R> summary(m1) R> cat("loglik of zero-inflated model", loglik(m1)) R> cat("bic of zero-inflated model", AIC(m1, k = log(dim(dat)[1]))) R> cat("aic of zero-inflated model", AIC(m1)) Backward stepwise variable selection with significance level alpha=0.01. R> fitbe <- be.zeroinfl(m1, data = dat, dist = "negbin", alpha = 0.01, trace = FALSE) R> summary(fitbe) R> cat("loglik of zero-inflated model with backward selection", loglik(fitbe)) R> cat("bic of zero-inflated model with backward selection", AIC(fitbe, k = log(dim(dat)[1]))) Compute LASSO estimates. 2

3 R> fit.lasso <- zipath(docvisits ~.., data = dat, family = "negbin", nlambda = 100, lambda.zero.min.ratio = 0.001, maxit.em = 300, maxit.theta = 25, theta.fixed = FALSE, trace = FALSE, penalty = "enet", rescale = FALSE) R> minbic <- which.min(bic(fit.lasso)) R> coef(fit.lasso, minbic) R> cat("theta estimate", fit.lasso\$theta[minbic]) Compute standard errors of coefficients and theta (the last one for theta). R> se(fit.lasso, minbic, log = FALSE) R> AIC(fit.lasso)[minBic] R> BIC(fit.lasso)[minBic] R> loglik(fit.lasso)[minbic] R> n <- dim(dat)[1] R> K <- 10 R> set.seed(197) R> foldid <- split(sample(1:n), rep(1:k, length = n)) nlambda = 100, lambda.count = fit.lasso\$lambda.count[1:30], lambda.zero = fit.lasso\$lambda.zero[1:30], maxit.em = 300, penalty = "enet", rescale = FALSE, foldid = foldid) Compute MCP estimates. We compute solution paths for the first 30 pairs of shrinkage parameters (the EM algorithm can be slow), and then evaluate results as for the LASSO estimates. For cross-validation, set maximum number of iterations in estimating scaling parameter 1 (maxit.theta=1) to reduce computation costs. R> tmp <- zipath(docvisits ~.., data = dat, family = "negbin", gamma.count = 2.7, gamma.zero = 2.7, lambda.zero.min.ratio = 0.1, maxit = 1, maxit.em = 1, maxit.theta = 2, theta.fixed = FALSE, penalty = "mnet") R> fit.mcp <- zipath(docvisits ~.., data = dat, family = "negbin", gamma.count = 2.7, gamma.zero = 2.7, lambda.count = tmp\$lambda.count[1:30], maxit.theta = 25, theta.fixed = FALSE, penalty = "mnet") R> minbic <- which.min(bic(fit.mcp)) R> coef(fit.mcp, minbic) R> cat("theta estimate", fit.mcp\$theta[minbic]) 3

4 Compute standard errors of coefficients and theta (the last one for theta). R> se(fit.mcp, minbic, log = FALSE) R> AIC(fit.mcp)[minBic] R> BIC(fit.mcp)[minBic] R> loglik(fit.mcp)[minbic] gamma.count = 2.7, gamma.zero = 2.7, lambda.count = tmp\$lambda.count[1:30], penalty = "mnet", rescale = FALSE, foldid = foldid) Compute SCAD estimates. R> tmp <- zipath(docvisits ~.., data = dat, family = "negbin", gamma.count = 2.5, gamma.zero = 2.5, lambda.zero.min.ratio = 0.01, maxit = 1, maxit.em = 1, maxit.theta = 2, theta.fixed = FALSE, penalty = "snet") R> fit.scad <- zipath(docvisits ~.., data = dat, family = "negbin", gamma.count = 2.5, gamma.zero = 2.5, lambda.count = tmp\$lambda.count[1:30], maxit.theta = 25, theta.fixed = FALSE, penalty = "snet") R> minbic <- which.min(bic(fit.scad)) R> coef(fit.scad, minbic) R> cat("theta estimate", fit.scad\$theta[minbic]) Compute standard errors of coefficients and theta (the last one for theta). R> se(fit.scad, minbic, log = FALSE) R> AIC(fit.scad)[minBic] R> BIC(fit.scad)[minBic] R> loglik(fit.scad)[minbic] gamma.count = 2.5, gamma.zero = 2.5, lambda.count = tmp\$lambda.count[1:30], penalty = "snet", rescale = FALSE, foldid = foldid) 4

5 References Regina T Riphahn, Achim Wambach, and Andreas Million. Incentive effects in the demand for health care: a bivariate panel count data estimation. Journal of Applied Econometrics, 18(4): , Zhu Wang, Shuangge Ma, and Ching-Yun Wang. Variable selection for zeroinflated and overdispersed data with application to health care demand in germany. Biometrical Journal, Article first published online: 8 JUN 2015 DOI: /bimj

### Package zic. September 4, 2015

Package zic September 4, 2015 Version 0.9 Date 2015-09-03 Title Bayesian Inference for Zero-Inflated Count Models Author Markus Jochmann Maintainer Markus Jochmann

### Aileen Murphy, Department of Economics, UCC, Ireland. WORKING PAPER SERIES 07-10

AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR IN IRELAND Aileen Murphy, Department of Economics, UCC, Ireland. DEPARTMENT OF ECONOMICS WORKING PAPER SERIES 07-10 1 AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR

### Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

### Copayments for Ambulatory Care in Germany: A Natural Experiment Using a Difference-in-Difference Approach

Copayments for Ambulatory Care in Germany: A Natural Experiment Using a Difference-in-Difference Approach SOEP User Conference 2008, Berlin Jonas Schreyögg 1 and Markus Grabka 2 1) Dept. Health Care Management,

### Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

### The Chi-Square Diagnostic Test for Count Data Models

The Chi-Square Diagnostic Test for Count Data Models M. Manjón-Antoĺın and O. Martínez-Ibañez QURE-CREIP Department of Economics, Rovira i Virgili University. 2012 Spanish Stata Users Group Meeting (Universitat

### Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

### Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

### The frequency of visiting a doctor: is the decision to go independent of the frequency?

Discussion Paper: 2009/04 The frequency of visiting a doctor: is the decision to go independent of the frequency? Hans van Ophem www.feb.uva.nl/ke/uva-econometrics Amsterdam School of Economics Department

### Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

### A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

### Package dsmodellingclient

Package dsmodellingclient Maintainer Author Version 4.1.0 License GPL-3 August 20, 2015 Title DataSHIELD client site functions for statistical modelling DataSHIELD

### How Many People Visit YouTube? Imputing Missing Events in Panels With Excess Zeros

How Many People Visit YouTube? Imputing Missing Events in Panels With Excess Zeros Georg M. Goerg, Yuxue Jin, Nicolas Remy, Jim Koehler 1 1 Google, Inc.; United States E-mail for correspondence: gmg@google.com

### Factoring Trinomials: The ac Method

6.7 Factoring Trinomials: The ac Method 6.7 OBJECTIVES 1. Use the ac test to determine whether a trinomial is factorable over the integers 2. Use the results of the ac test to factor a trinomial 3. For

### Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013

Model selection in R featuring the lasso Chris Franck LISA Short Course March 26, 2013 Goals Overview of LISA Classic data example: prostate data (Stamey et. al) Brief review of regression and model selection.

### Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

### Discussion Papers. Copayments for Ambulatory Care in Germany: A Natural Experiment Using a Difference-in- Difference Approach

Deutsches Institut für Wirtschaftsforschung www.diw.de Discussion Papers 777 Jonas Schreyögg Markus M. Grabka Copayments for Ambulatory Care in Germany: A Natural Experiment Using a Difference-in- Difference

### EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA E. J. Dadey SSNIT, Research Department, Accra, Ghana S. Ankrah, PhD Student PGIA, University of Peradeniya, Sri Lanka Abstract This study investigates

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Package bigdata. R topics documented: February 19, 2015

Type Package Title Big Data Analytics Version 0.1 Date 2011-02-12 Author Han Liu, Tuo Zhao Maintainer Han Liu Depends glmnet, Matrix, lattice, Package bigdata February 19, 2015 The

### Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

### Model-based boosting in R

Model-based boosting in R Introduction to Gradient Boosting Matthias Schmid Institut für Medizininformatik, Biometrie und Epidemiologie (IMBE) Friedrich-Alexander-Universität Erlangen-Nürnberg Statistical

### ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous

### Automated Biosurveillance Data from England and Wales, 1991 2011

Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical

### Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

### Point Biserial Correlation Tests

Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

### Introduction to Generalized Linear Models

to Generalized Linear Models Heather Turner ESRC National Centre for Research Methods, UK and Department of Statistics University of Warwick, UK WU, 2008 04 22-24 Copyright c Heather Turner, 2008 to Generalized

### Modern regression 2: The lasso

Modern regression 2: The lasso Ryan Tibshirani Data Mining: 36-462/36-662 March 21 2013 Optional reading: ISL 6.2.2, ESL 3.4.2, 3.4.3 1 Reminder: ridge regression and variable selection Recall our setup:

Incentive Eects in the Demand for Health Care: A Bivariate Panel Count Data Estimation Regina T. Riphahn 1 2 3 Achim Wambach 1 2 Andreas Million 1 3 1 University of Munich, 2 CEPR, London, 3 IZA, Bonn

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection

### Lab 13: Logistic Regression

Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account

### Package EstCRM. July 13, 2015

Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

### Logistic Regression for Spam Filtering

Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used

### TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Time Series Analysis of Network Traffic

Time Series Analysis of Network Traffic Cyriac James IIT MADRAS February 9, 211 Cyriac James (IIT MADRAS) February 9, 211 1 / 31 Outline of the presentation Background Motivation for the Work Network Trace

### Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association

Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects

### More Flexible GLMs Zero-Inflated Models and Hybrid Models

More Flexible GLMs Zero-Inflated Models and Hybrid Models Mathew Flynn, Ph.D. Louise A. Francis FCAS, MAAA Motivation: GLMs are widely used in insurance modeling applications. Claim or frequency models

### DO SPA VISITS IMPROVE HEALTH: EVIDENCE FROM GERMAN MICRO DATA. Jonathan Klick Florida State University College of Law

DO SPA VISITS IMPROVE HEALTH: EVIDENCE FROM GERMAN MICRO DATA Jonathan Klick Florida State University College of Law Thomas Stratmann George Mason University Abstract The health benefits of spas have been

### Lecture 5 : The Poisson Distribution

Lecture 5 : The Poisson Distribution Jonathan Marchini November 10, 2008 1 Introduction Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume,

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### Wednesday PM. Multiple regression. Multiple regression in SPSS. Presentation of AM results Multiple linear regression. Logistic regression

Wednesday PM Presentation of AM results Multiple linear regression Simultaneous Stepwise Hierarchical Logistic regression Multiple regression Multiple regression extends simple linear regression to consider

### University of Zurich. Happiness and unemployment: a panel data analysis for Germany. Zurich Open Repository and Archive. Winkelmann, L; Winkelmann, R

University of Zurich Zurich Open Repository and Archive Winterthurerstr. 190 CH-8057 Zurich http://www.zora.unizh.ch Year: 1995 Happiness and unemployment: a panel data analysis for Germany Winkelmann,

### Practical Differential Gene Expression. Introduction

Practical Differential Gene Expression Introduction In this tutorial you will learn how to use R packages for analysis of differential expression. The dataset we use are the gene-summarized count data

### Data analysis in supersaturated designs

Statistics & Probability Letters 59 (2002) 35 44 Data analysis in supersaturated designs Runze Li a;b;, Dennis K.J. Lin a;b a Department of Statistics, The Pennsylvania State University, University Park,

### Local classification and local likelihoods

Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

### Gender Differences in Smoking Behavior

DISCUSSION PAPER SERIES IZA DP No. 2259 Gender Differences in Smoking Behavior Thomas Bauer Silja Göhlmann Mathias Sinning August 2006 Forschungsinstitut zur Zukunft der Arbeit Institute for the Study

### Does Black-Scholes framework for Option Pricing use Constant Volatilities and Interest Rates? New Solution for a New Problem

Does Black-Scholes framework for Option Pricing use Constant Volatilities and Interest Rates? New Solution for a New Problem Gagan Deep Singh Assistant Vice President Genpact Smart Decision Services Financial

### 16 : Demand Forecasting

16 : Demand Forecasting 1 Session Outline Demand Forecasting Subjective methods can be used only when past data is not available. When past data is available, it is advisable that firms should use statistical

### Elisa Iezzi* Matteo Lippi Bruni** Cristina Ugolini**

24-25 June 2010 Elisa Iezzi* Matteo Lippi Bruni** Cristina Ugolini** elisa.iezzi@unibo.it matteo.lippibruni2@unibo.it cristina.ugolini@unibo.it * Department of Statistics, University of Bologna **Department

### [3] Big Data: Model Selection

[3] Big Data: Model Selection Matt Taddy, University of Chicago Booth School of Business faculty.chicagobooth.edu/matt.taddy/teaching [3] Making Model Decisions Out-of-Sample vs In-Sample performance Regularization

### Package metafuse. November 7, 2015

Type Package Package metafuse November 7, 2015 Title Fused Lasso Approach in Regression Coefficient Clustering Version 1.0-1 Date 2015-11-06 Author Lu Tang, Peter X.K. Song Maintainer Lu Tang

### GLM III: Advanced Modeling Strategy 2005 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide

GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M Agenda Introduction Testing the link function

### Household s Life Insurance Demand - a Multivariate Two Part Model

Household s Life Insurance Demand - a Multivariate Two Part Model Edward (Jed) W. Frees School of Business, University of Wisconsin-Madison July 30, 1 / 19 Outline 1 2 3 4 2 / 19 Objective To understand

### Bayesian Multiple Imputation of Zero Inflated Count Data

Bayesian Multiple Imputation of Zero Inflated Count Data Chin-Fang Weng chin.fang.weng@census.gov U.S. Census Bureau, 4600 Silver Hill Road, Washington, D.C. 20233-1912 Abstract In government survey applications,

### Web-based Supplementary Materials for. Modeling of Hormone Secretion-Generating. Mechanisms With Splines: A Pseudo-Likelihood.

Web-based Supplementary Materials for Modeling of Hormone Secretion-Generating Mechanisms With Splines: A Pseudo-Likelihood Approach by Anna Liu and Yuedong Wang Web Appendix A This appendix computes mean

### Big Data Techniques Applied to Very Short-term Wind Power Forecasting

Big Data Techniques Applied to Very Short-term Wind Power Forecasting Ricardo Bessa Senior Researcher (ricardo.j.bessa@inesctec.pt) Center for Power and Energy Systems, INESC TEC, Portugal Joint work with

### Package SurvCorr. February 26, 2015

Type Package Title Correlation of Bivariate Survival Times Version 1.0 Date 2015-02-25 Package SurvCorr February 26, 2015 Author Meinhard Ploner, Alexandra Kaider and Georg Heinze Maintainer Georg Heinze

### Individual Determinants of Work Attendance: Evidence on the Role of Personality

DISCUSSION PAPER SERIES IZA DP No. 4927 Individual Determinants of Work Attendance: Evidence on the Role of Personality Susi Störmer René Fahr May 2010 Forschungsinstitut zur Zukunft der Arbeit Institute

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

### Asymmetric Information in the Portuguese Health Insurance Market

Asymmetric Information in the Portuguese Health Insurance Market Teresa Bago d Uva Instituto Nacional de Estatística J.M.C. Santos Silva ISEG/Universidade Técnica de Lisboa 1 1 Introduction ² The negative

### Entry Strategies, Founder s Human Capital and Start- Up Size

Entry Strategies, Founder s Human Capital and Start- Up Size Sandra Gottschalk, Kathrin Müller, and Michaela Niefert Centre for European Economic Research (ZEW), Mannheim, Germany First version (January,

### Regression Clustering

Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X s. The algorithm

### Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

### Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

### Department of Epidemiology and Public Health Miller School of Medicine University of Miami

Department of Epidemiology and Public Health Miller School of Medicine University of Miami BST 630 (3 Credit Hours) Longitudinal and Multilevel Data Wednesday-Friday 9:00 10:15PM Course Location: CRB 995

### HEALTH CARE REFORM AND THE NUMBER OF DOCTOR VISITS AN ECONOMETRIC ANALYSIS

JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 19: 455 472 (2004) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jae.764 HEALTH CARE REFORM AND THE NUMBER OF DOCTOR VISITS

### Modelling the Scores of Premier League Football Matches

Modelling the Scores of Premier League Football Matches by: Daan van Gemert The aim of this thesis is to develop a model for estimating the probabilities of premier league football outcomes, with the potential

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Female labor supply and parental leave benefits - The causal effect of paying higher transfers for a shorter period of time

Female labor supply and parental leave benefits - The causal effect of paying higher transfers for a shorter period of time Annette Bergemann a VU University Amsterdam Regina T. Riphahn b University Erlangen-Nuremberg

### Using the ac Method to Factor

4.6 Using the ac Method to Factor 4.6 OBJECTIVES 1. Use the ac test to determine factorability 2. Use the results of the ac test 3. Completely factor a trinomial In Sections 4.2 and 4.3 we used the trial-and-error

### BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

### Chapter 7. Lyapunov Exponents. 7.1 Maps

Chapter 7 Lyapunov Exponents Lyapunov exponents tell us the rate of divergence of nearby trajectories a key component of chaotic dynamics. For one dimensional maps the exponent is simply the average

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### FACTORING TRINOMIALS IN THE FORM OF ax 2 + bx + c

Tallahassee Community College 55 FACTORING TRINOMIALS IN THE FORM OF ax 2 + bx + c This kind of trinomial differs from the previous kind we have factored because the coefficient of x is no longer "1".

### Causal Leading Indicators Detection for Demand Forecasting

Causal Leading Indicators Detection for Demand Forecasting Yves R. Sagaert, El-Houssaine Aghezzaf, Nikolaos Kourentzes, Bram Desmet Department of Industrial Management, Ghent University 13/07/2015 EURO

### Running Head: MONEY, WELL-BEING, AND LOSS AVERSION 1. Money, Well-Being, and Loss Aversion:

Running Head: MONEY, WELL-BEING, AND LOSS AVERSION 1 Money, Well-Being, and Loss Aversion: Does an Income Loss Have a Greater Effect on Well-Being than an Equivalent Income Gain? Christopher J. Boyce Behavioural

### BLOCKWISE SPARSE REGRESSION

Statistica Sinica 16(2006), 375-390 BLOCKWISE SPARSE REGRESSION Yuwon Kim, Jinseog Kim and Yongdai Kim Seoul National University Abstract: Yuan an Lin (2004) proposed the grouped LASSO, which achieves

### GLAM Array Methods in Statistics

GLAM Array Methods in Statistics Iain Currie Heriot Watt University A Generalized Linear Array Model is a low-storage, high-speed, GLAM method for multidimensional smoothing, when data forms an array,

### Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

### Human Capital and Ethnic Self-Identification of Migrants

DISCUSSION PAPER SERIES IZA DP No. 2300 Human Capital and Ethnic Self-Identification of Migrants Laura Zimmermann Liliya Gataullina Amelie Constant Klaus F. Zimmermann September 2006 Forschungsinstitut

### Evidence to Action: Use of Predictive Models for Beach Water Postings

Evidence to Action: Use of Predictive Models for Beach Water Postings Canadian Society for Epidemiology and Biostatistics Caitlyn Paget, June 4 th 2015 Goal is to improve program delivery Can we improve

### A Short Tour of the Predictive Modeling Process

Chapter 2 A Short Tour of the Predictive Modeling Process Before diving in to the formal components of model building, we present a simple example that illustrates the broad concepts of model building.

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### A Comparison of Variable Selection Techniques for Credit Scoring

1 A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia E-mail:

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Paper DV-06-2015. KEYWORDS: SAS, R, Statistics, Data visualization, Monte Carlo simulation, Pseudo- random numbers

Paper DV-06-2015 Intuitive Demonstration of Statistics through Data Visualization of Pseudo- Randomly Generated Numbers in R and SAS Jack Sawilowsky, Ph.D., Union Pacific Railroad, Omaha, NE ABSTRACT Statistics

ADVERTISING & MEDIA Is the alcoholic beverage industry targeting minors with magazine ads? Advertising, Alcohol, and Youth BY JON P. N ELSON Pennsylvania State University In 2001, the u.s. alcoholic beverage

### Sect 6.1 - Greatest Common Factor and Factoring by Grouping

Sect 6.1 - Greatest Common Factor and Factoring by Grouping Our goal in this chapter is to solve non-linear equations by breaking them down into a series of linear equations that we can solve. To do this,

### Blockwise Sparse Regression

Blockwise Sparse Regression 1 Blockwise Sparse Regression Yuwon Kim, Jinseog Kim, and Yongdai Kim Seoul National University, Korea Abstract: Yuan an Lin (2004) proposed the grouped LASSO, which achieves

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful