Studying employment pathways of graduates by a latent Markov model


Fulvia Pennoni
Department of Statistics and Quantitative Methods, Via Bicocca degli Arcimboldi 8, 20126 Milano, fulvia.pennoni@unimib.it

Abstract Motivated by an application to a longitudinal dataset derived from administrative data on labour market and academic performance in Lombardy, we propose a multivariate latent Markov model with covariates for panel data. Our aim is to investigate how covariates influence the labour market performance of graduates, which is measured through three types of response variables. The model is based on a Markov process that represents the latent characteristics of the subjects. Maximum likelihood estimation of the model parameters is based on the Expectation-Maximisation algorithm and is performed with a two-step approach: first a latent class model is estimated, then the latent Markov model.

Key words: Expectation-Maximisation algorithm, human capital, labour market, latent variable model, panel data

1 Introduction

In this paper we propose a model for evaluating the employment pathways of graduates in terms of wage, ease of switching between types of position, and employment skill. In the present job market, which is affected by the financial crisis, it is interesting to study the university-to-work transition in terms of human capital increase. As proposed in the OECD report [1], human capital is the knowledge, skills, competencies and attributes embodied in individuals that are relevant to economic activity. It is a complex, multifaceted phenomenon which cannot be measured directly in terms of wage alone [2], [3]. Therefore the interest is in the evolution of a latent characteristic of an individual which is indirectly measured by certain response variables.

A model that is suitable for this type of analysis is the latent Markov model proposed by [4]. The model represents the evolution of the latent characteristic of interest by an unobservable Markov chain with a small number of states. The response variables are assumed to be conditionally independent given this latent process. In this paper the model is considered in its multivariate version, including individual covariates in the latent process as proposed by [5]. Likelihood inference for the model is based on the EM algorithm [6].

2 The dataset

The dataset concerns 1624 individuals who graduated in 2007 from four universities of Milan. They have been followed for four quarters after the graduation date, covering one year. The choice of the 2007 cohort is motivated by the availability of data from the following integrated databases: i) the database of the observatory of the labour market in Lombardy, which has collected mandatory notices from public and private employers regarding changes in job status since 2000; ii) the database of the Revenue office, which provides information about wages from 2004 to the present for all subjects residing in Lombardy; iii) the database of graduates from four universities of Milan, which provides information on the academic careers of graduates from [...].

The response variables are: i) employment status and type of employment contract, indicating whether a subject is employed with a permanent or temporary contract; ii) job quality, measured by the skill level of the job, which is derived from a categorization of job qualifications made by the Italian National Institute of Statistics; iii) wages for each quarter. The available covariates concern socio-demographic characteristics, namely i) age, ii) family income, iii) gender, iv) student employment, and academic characteristics, namely v) type of degree and vi) final grade.
In Table 1 we report the descriptive statistics for the distribution of the available covariates, whereas in Table 2 we report the descriptive statistics for the response variables for each quarter of observation.

Table 1 Descriptive statistics for the distributions of the covariates
Columns: Covariate, Category, %, Mean, St.dev.
Rows: age (in 2007); family income in Euro; gender (male, female); employment before 2007 (no, yes); type of degree (technical, architecture 8.93, business, humanistic, science); final grade (categories up to cum laude)
[remaining table values not recoverable from the transcription]

Table 2 Frequency of every response variable for each period of observation
Columns: 1st quarter, 2nd quarter, 3rd quarter, 4th quarter
Rows: Type of contract (none, temporary, permanent); Skill (none, medium/low, high); Wage (none, less than 3750 €, higher than 3750 €)
[table values not recoverable from the transcription]

3 The proposed model

With reference to a subject in the sample of $n$ subjects observed at $T$ time occasions, we denote by $Y^{(t)}$ the vector of response variables of interest at time occasion $t$, $t = 1,\dots,T$, with elements $Y_j^{(t)}$, $j = 1,\dots,r$, each having $c_j$ levels. The symbol $X^{(t)}$ denotes the vector of all individual covariates available at the $t$-th time occasion. The proposed model assumes the existence of a latent process $U = (U^{(1)},\dots,U^{(T)})$ which affects the distribution of the response variables. The main assumptions of the model are that the response variables in $Y^{(t)}$ are conditionally independent given the latent process $U$ and that the latent process follows a first-order homogeneous Markov chain with state space $\{1,\dots,k\}$. We denote the conditional response probabilities by

$$\phi_{jy|u} = f_{Y_j^{(t)}|U^{(t)}}(y|u), \quad j = 1,\dots,r,\ t = 1,\dots,T,\ u = 1,\dots,k,\ y = 0,\dots,c_j - 1.$$

We assume that the covariates affect the distribution of the latent process, rather than that of the response variables given the latent process [5]. Therefore we have the following initial and transition probabilities of the latent process

$$\pi_{u|x} = f_{U^{(1)}|X^{(1)}}(u|x), \quad u = 1,\dots,k,$$

$$\pi_{u|\bar{u}x} = f_{U^{(t)}|U^{(t-1)},X^{(t)}}(u|\bar{u},x), \quad t = 2,\dots,T,\ \bar{u},u = 1,\dots,k,$$

where $x$ denotes a realization of $X^{(t)}$, $u$ denotes a realization of $U^{(t)}$, and $\bar{u}$ denotes a realization of $U^{(t-1)}$. A convenient way to allow the initial and transition probabilities of the latent Markov chain to depend on the individual covariates is to adopt the following parameterization:

$$\log\frac{\pi_{u|x}}{\pi_{1|x}} = \beta_{0u} + x'\beta_{1u}, \quad u = 2,\dots,k, \qquad (1)$$

and

$$\log\frac{\pi_{u|\bar{u}x}}{\pi_{\bar{u}|\bar{u}x}} = \gamma_{0\bar{u}u} + x'(\gamma_{1u} - \gamma_{1\bar{u}}), \quad u = 1,\dots,k,\ u \neq \bar{u}. \qquad (2)$$

This formulation is of interest when we want to understand how covariates affect the latent characteristic which is indirectly measured by the response variables.

To classify the sample of subjects on the basis of the categorical response variables we rely on a latent class model [7]. With this model we also select the number of latent classes by using the Bayesian Information Criterion (BIC, [8]). Then, the estimation is performed by maximizing the log-likelihood

$$l(\theta) = \sum_i \log\left[f_{\tilde{Y}|X}(\tilde{y}_i|x_i)\right],$$

where $x_i$ and $\tilde{y}_i$ are the vectors of observed data for subject $i$, $i = 1,\dots,n$, and $\theta$ is the vector of all model parameters. The function $l(\theta)$ is efficiently computed by a recursion which is well known in the hidden Markov literature. Likelihood maximization is performed by an EM algorithm ([6], [9]) based on the complete data log-likelihood, that is, the log-likelihood that we could compute if we knew the latent state of each subject at every occasion. Unlike a standard EM algorithm, this maximization does not update the conditional response probabilities, which are held fixed. In this way the algorithm converges faster, as far fewer iterations are needed.

Once parameter estimates have been computed, standard errors are associated with these estimates. They are computed by a nonparametric bootstrap [10], which consists of repeatedly drawing samples from the observed sample and computing the maximum likelihood estimates for every bootstrap sample.
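The recursion mentioned above for evaluating the likelihood can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the array layouts (`phi[j][y, u]`, `gamma0` with a zero diagonal) and the use of NumPy are assumptions made here for illustration. The sketch first inverts parameterizations (1) and (2) to obtain the initial and transition probabilities, and then applies the standard forward recursion for one subject.

```python
import numpy as np

def initial_probs(beta0, beta1, x):
    """Invert (1): latent state 1 is the reference category.
    beta0 has shape (k-1,), beta1 has shape (k-1, p), x has shape (p,)."""
    eta = np.concatenate(([0.0], beta0 + beta1 @ x))
    w = np.exp(eta - eta.max())          # subtract the max for numerical stability
    return w / w.sum()

def transition_probs(gamma0, gamma1, x):
    """Invert (2): in row u_bar the reference category is u = u_bar.
    gamma0 has shape (k, k) (diagonal ignored), gamma1 has shape (k, p)."""
    s = gamma1 @ x
    # eta[u_bar, u] = gamma0[u_bar, u] + x'(gamma1[u] - gamma1[u_bar])
    eta = gamma0 + s[None, :] - s[:, None]
    np.fill_diagonal(eta, 0.0)
    w = np.exp(eta - eta.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def subject_likelihood(y, x, phi, beta0, beta1, gamma0, gamma1):
    """Forward recursion for one subject.
    y: integer array (T, r) of observed categories,
    x: array (T, p) of covariates,
    phi: list of r arrays, phi[j][c, u] = P(Y_j = c | U = u)."""
    T, r = y.shape

    def emission(t):
        # conditional independence of the r responses given the latent state
        e = np.ones(phi[0].shape[1])
        for j in range(r):
            e = e * phi[j][y[t, j], :]
        return e

    alpha = initial_probs(beta0, beta1, x[0]) * emission(0)
    for t in range(1, T):
        alpha = (alpha @ transition_probs(gamma0, gamma1, x[t])) * emission(t)
    return alpha.sum()

def log_likelihood(Y, X, phi, beta0, beta1, gamma0, gamma1):
    """l(theta): sum over subjects of the log of the manifest probability."""
    return sum(np.log(subject_likelihood(y, x, phi, beta0, beta1, gamma0, gamma1))
               for y, x in zip(Y, X))
```

With only four time occasions, as in the dataset described here, the unscaled recursion is unlikely to underflow; for long series one would normally rescale `alpha` at each step and accumulate the logarithms of the scaling constants.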
The standard error of each parameter estimate is then obtained from the bootstrap distribution of the estimators.

4 Main results

In applying the basic latent class model to the dataset, we chose k = 3 latent states. The maximum log-likelihood of the model is $\hat{l}$ = [...] with 20 parameters. The corresponding value of BIC is [...]. The estimates of the conditional response probabilities according to this model are reported in Table 3.
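As a quick check on the parameter count, the sketch below counts the free parameters of a latent class model with three states and three responses of three categories each, which gives the 20 parameters reported above. It also writes out the usual form of the BIC. The function names are hypothetical, and the sample size entering the BIC penalty is an assumption, since the text does not state which one was used.

```python
import math

def lc_free_parameters(k, levels):
    """Free parameters of a latent class model:
    (k - 1) class weights plus, for each class, (c_j - 1)
    conditional response probabilities per response variable."""
    return (k - 1) + k * sum(c - 1 for c in levels)

def bic(loglik, n_par, n_obs):
    """Schwarz criterion: -2 * log-likelihood + log(n) * number of parameters."""
    return -2.0 * loglik + n_par * math.log(n_obs)

# Three latent states; contract, skill and wage each have three categories.
print(lc_free_parameters(3, [3, 3, 3]))  # 20
```

Candidate values of k would be compared by evaluating `bic` at the maximized log-likelihood of each fitted model and preferring the smallest value.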

Table 3 Estimated conditional probabilities $\hat{\phi}_{jy|u}$ of labour condition under the selected model
Columns: u = 1, u = 2, u = 3
Rows: Type of contract (none, temporary, permanent); Skill (none, medium/low, high); Wage (none, less than 3750 €, higher than 3750 €)
[table values not recoverable from the transcription]

We observe different types of labour conditions across the latent classes. In particular, the first class, which is the largest with about 53% of subjects, corresponds to unemployed subjects who may have income from other sources. The second class, including about 27% of subjects, contains subjects with a temporary contract and some qualified work but a low wage. The third class, including 20% of subjects, contains subjects with stable, high-quality jobs and high income.

Given the selected number of classes, the analysis focuses on the dependence between the latent classes and the observable covariates, which is studied by fitting the multivariate latent Markov model with covariates. The results of this fit, in terms of the parameters affecting the initial and the transition probabilities, are reported in Table 4.

Table 4 Estimates of the regression parameters affecting the latent process (minus the sample average; 95% bootstrap interval does not contain 0)
Columns: Effect, $\hat{\beta}_{12}$, $\hat{\beta}_{13}$, $\hat{\gamma}_{12}$, $\hat{\gamma}_{13}$
Rows: female; student employee; age; grade (six categories); family income/; architecture; business; humanistic; science
[table values and footnote markers not recoverable from the transcription]

In order to interpret the results in this table properly, we have to consider that the adopted model is based on parameterizations (1) and (2). According to the estimates of the regression coefficients for the multinomial logit of the latent classes at the first time occasion, being female has a positive effect on belonging to the second latent class relative to being male. This means that females find a low-quality job more easily than males. Having work experience during university has a strong positive effect on finding a first job and also on obtaining high-quality employment. Students from high-income families opt to continue their education, or simply put less effort into searching for a job, compared with those from low-income families. Students with a technical degree have a much higher chance of getting a job than those with other degrees. Even for the most qualified jobs, people with a degree in architecture or the humanities are disadvantaged compared with those with a technical degree.

Considering the subsequent periods of observation, females are more likely than their male counterparts to accept low-quality employment, as are student employees compared with students. We also notice that older graduates tend to have more difficulty finding a job. Moreover, a technical degree helps in reaching high-quality employment compared with the other degrees, followed by business and science degrees.

Acknowledgements We are grateful to Prof. M. Mezzanzanica and to Dr. M. Fontana, of the Interuniversity Research Centre on Public Services, University of Milano-Bicocca, for providing the dataset. We also acknowledge the project "Finite mixture and latent variable models for causal inference and analysis of socio-economic data" (FIRB - Futuro in ricerca), funded by the Italian Government (RBFR12SHVV).

References

1. OECD (1998). Human Capital Investment. An International Comparison. Paris: Centre for Educational Research and Innovation.
2. Folloni, G. and Vittadini, G. (2010). Human capital measurement: a survey. Journal of Economic Surveys, 24.
3. Wößmann, L. (2003). Specifying human capital. Journal of Economic Surveys, 17.
4. Wiggins, L.M. (1973). Panel Analysis: Latent Probability Models for Attitude and Behaviour Processes. Elsevier.
5. Bartolucci, F., Farcomeni, A. and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data. Chapman & Hall/CRC, Boca Raton.
6. Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41.
7. Lazarsfeld, P.F. and Henry, N.W. (1968). Latent Structure Analysis. Houghton Mifflin, Boston.
8. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6.
9. Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39.
10. Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC, New York.


Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

Comparison of resampling method applied to censored data

Comparison of resampling method applied to censored data International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

More information

Learning diagnostic diagrams in transport-based data-collection systems

Learning diagnostic diagrams in transport-based data-collection systems University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers Faculty of Engineering and Information Sciences 2014 Learning diagnostic diagrams in transport-based data-collection

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

The Training Needs of Older Workers

The Training Needs of Older Workers The Training Needs of Older Workers Katrina Ball, Josie Misko and Andrew Smith National Centre for Vocational Education Research ABSTRACT The nature of work has been the subject of significant change in

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Get Britain Working Measures Official Statistics

Get Britain Working Measures Official Statistics Get Britain Working Measures Official Statistics Publication date: 9:30am Wednesday 21 August 2013 Contents Summary... 3 Introduction... 3 Get Britain Working Measures Policy Description... 3 Technical

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Business Process Modeling

Business Process Modeling Business Process Concepts Process Mining Kelly Rosa Braghetto Instituto de Matemática e Estatística Universidade de São Paulo kellyrb@ime.usp.br January 30, 2009 1 / 41 Business Process Concepts Process

More information

Michigan Department of Community Health

Michigan Department of Community Health Michigan Department of Community Health January 2007 INTRODUCTION The Michigan Department of Community Health (MDCH) asked Public Sector Consultants Inc. (PSC) to conduct a survey of licensed dental hygienists

More information

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce Erik B. Reed Carnegie Mellon University Silicon Valley Campus NASA Research Park Moffett Field, CA 94035 erikreed@cmu.edu

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Note on the EM Algorithm in Linear Regression Model

Note on the EM Algorithm in Linear Regression Model International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

More information

Lasso on Categorical Data

Lasso on Categorical Data Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

More information

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random [Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator

More information

The Chinese Restaurant Process

The Chinese Restaurant Process COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

Analysis of counts with two latent classes, with application to risk assessment based on physician-visit records of cancer survivors

Analysis of counts with two latent classes, with application to risk assessment based on physician-visit records of cancer survivors Biostatistics Advance Access published December 1, 2013 Biostatistics (2013), pp. 1 14 doi:10.1093/biostatistics/kxt052 Analysis of counts with two latent classes, with application to risk assessment based

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,

More information

Sensitivity Analysis in Multiple Imputation for Missing Data

Sensitivity Analysis in Multiple Imputation for Missing Data Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Country Focus: Migration of Portuguese nationals during the crisis

Country Focus: Migration of Portuguese nationals during the crisis Country Focus: Migration of Portuguese nationals during the crisis João Peixoto, Joana Azevedo and Pedro Candeias ISEG, Universidade de Lisboa and ISCTE-Instituto Universitário de Lisboa Background The

More information

Statistics and Marketing. Peter E. Rossi and Greg M. Allenby

Statistics and Marketing. Peter E. Rossi and Greg M. Allenby Statistics and Marketing Peter E. Rossi and Greg M. Allenby Statistical research in marketing is heavily influenced by the availability of different types of data. The last ten years have seen an explosion

More information

IPDET Module 6: Descriptive, Normative, and Impact Evaluation Designs

IPDET Module 6: Descriptive, Normative, and Impact Evaluation Designs IPDET Module 6: Descriptive, Normative, and Impact Evaluation Designs Intervention or Policy Evaluation Questions Design Questions Elements Types Key Points Introduction What Is Evaluation Design? Connecting

More information

Validation of Software for Bayesian Models Using Posterior Quantiles

Validation of Software for Bayesian Models Using Posterior Quantiles Validation of Software for Bayesian Models Using Posterior Quantiles Samantha R. COOK, Andrew GELMAN, and Donald B. RUBIN This article presents a simulation-based method designed to establish the computational

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk

More information

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Philip Merrigan ESG-UQAM, CIRPÉE Using Big Data to Study Development and Social Change, Concordia University, November 2103 Intro Longitudinal

More information

Missing Data. Paul D. Allison INTRODUCTION

Missing Data. Paul D. Allison INTRODUCTION 4 Missing Data Paul D. Allison INTRODUCTION Missing data are ubiquitous in psychological research. By missing data, I mean data that are missing for some (but not all) variables and for some (but not all)

More information

A revisit of the hierarchical insurance claims modeling

A revisit of the hierarchical insurance claims modeling A revisit of the hierarchical insurance claims modeling Emiliano A. Valdez Michigan State University joint work with E.W. Frees* * University of Wisconsin Madison Statistical Society of Canada (SSC) 2014

More information

Business Statistics: Chapter 2: Data Quiz A

Business Statistics: Chapter 2: Data Quiz A CHAPTER 2 Quiz A Business Statistics, 2nd ed. 2-1 Business Statistics: Chapter 2: Data Quiz A Name 1. The mission of the Pew Internet & Life Project is to explore the impact of the Internet on families,

More information