Studying employment pathways of graduates by a latent Markov model
Fulvia Pennoni
Department of Statistics and Quantitative Methods, Via Bicocca degli Arcimboldi 8, 20126, Milano, fulvia.pennoni@unimib.it

Abstract Motivated by an application to a longitudinal dataset derived from administrative data on labour market and academic performance in Lombardy, we propose a multivariate latent Markov model with covariates for panel data. Our aim is to investigate how covariates influence the labour market performance of graduates, which is measured through three types of response variables. The model is based on a Markov process that represents the latent characteristics of the subjects. Maximum likelihood estimation of the model parameters is based on the Expectation-Maximisation algorithm and is performed with a two-step approach: first a latent class model is estimated, and then the latent Markov model.

Key words: Expectation-Maximisation algorithm, human capital, labour market, latent variable model, panel data

1 Introduction

In this paper we propose a model for evaluating the employment pathways of graduates in terms of wage, ease of switching between types of position, and skill level of the job. Under the present job market conditions, which are affected by the financial crisis, it is of interest to study the university-to-work transition in terms of the increase in human capital. As defined in the OECD report [1], human capital is the knowledge, skills, competencies and attributes embodied in individuals that are relevant to economic activity. It is a complex, multifaceted phenomenon that cannot be measured directly in terms of wage alone [2], [3]. The interest therefore lies in the evolution of a latent characteristic of an individual, which is measured indirectly by certain response variables.
A model that is suitable for this type of analysis is the latent Markov model proposed by [4]. The model represents the evolution of the latent characteristic of interest by an unobservable Markov chain with a small number of states. The response variables are assumed to be conditionally independent given this latent process. In this paper the model is considered in its multivariate version, with individual covariates included in the latent process as proposed by [5]. Likelihood inference for the model is based on the EM algorithm [6].

2 The dataset

The dataset concerns 1624 individuals who graduated in 2007 from four universities of Milan. They were followed for four quarters after the graduation date, covering one year. The choice of the 2007 cohort is motivated by the availability of data from the following integrated databases: i) the database of the observatory of the labour market in Lombardy, which has collected mandatory notices from public and private employers regarding changes in job status from 2000; ii) the database of the Revenue office, which provides information about wages from 2004 to present for all subjects residing in Lombardy; iii) the database of graduates from four universities of Milan, which provides information on the academic careers of graduates from ...

The response variables are: i) employment status and type of employment contract, indicating whether a subject is employed with a permanent or a temporary contract; ii) job quality, measured by the skill level of the job, which is derived from a categorization of job qualifications made by the Italian National Institute of Statistics; iii) wage for each quarter. The available covariates concern socio-demographic characteristics, namely i) age, ii) family income, iii) gender, iv) student employment, and academic characteristics, namely v) type of degree and vi) final grade.
In Table 1 we report the descriptive statistics for the distribution of the available covariates, whereas in Table 2 we report the descriptive statistics for the response variables for each quarter of observation.

Table 1 Descriptive statistics for the distributions of the covariates
Covariate                 Category        %       Mean    St.dev.
age (in 2007)
family income in Euro
gender                    male
                          female
employment before 2007    no
                          yes
type of degree            technical
                          architecture    8.93
                          business
                          humanistic
                          science
final grade               < ... cum laude

Table 2 Frequency of every response variable for period of observation
Response            Category             1st quarter   2nd quarter   3rd quarter   4th quarter
Type of contract    none
                    temporary
                    permanent
Skill               none
                    medium/low
                    high
Wage                none
                    less than 3750 €
                    higher than 3750 €

3 The proposed model

With reference to a subject in the sample of $n$ subjects observed at $T$ time occasions, let $Y^{(t)}$ denote the vector of response variables of interest at time occasion $t$, $t = 1,\dots,T$, with elements $Y_j^{(t)}$, $j = 1,\dots,r$, each having $c_j$ levels. The symbol $X^{(t)}$ denotes the vector of all individual covariates available at the $t$-th time occasion. The proposed model assumes the existence of a latent process $U = (U^{(1)},\dots,U^{(T)})$ which affects the distribution of the response variables. The main assumptions of the model are that the response variables in $Y^{(t)}$ are conditionally independent given the latent process $U$, and that the latent process follows a first-order homogeneous Markov chain with state space $\{1,\dots,k\}$. We denote the conditional response probabilities by

$$\phi_{jy|u} = f_{Y_j^{(t)} \mid U^{(t)}}(y \mid u), \quad j = 1,\dots,r,\ t = 1,\dots,T,\ u = 1,\dots,k,\ y = 0,\dots,c_j - 1.$$

We allow the covariates to affect the distribution of the latent process, and not the conditional distribution of the response variables given this process [5]. Therefore we have the following initial and transition probabilities of the latent process:

$$\pi_{u|x} = f_{U^{(1)} \mid X^{(1)}}(u \mid x), \quad u = 1,\dots,k,$$

$$\pi_{u|\bar{u}x} = f_{U^{(t)} \mid U^{(t-1)}, X^{(t)}}(u \mid \bar{u}, x), \quad t = 2,\dots,T,\ \bar{u}, u = 1,\dots,k,$$
where $x$ denotes a realization of $X^{(t)}$, $u$ denotes a realization of $U^{(t)}$, and $\bar{u}$ denotes a realization of $U^{(t-1)}$. A convenient way to let the initial and transition probabilities of the latent Markov chain depend on the individual covariates is to adopt the following parameterization:

$$\log\frac{\pi_{u|x}}{\pi_{1|x}} = \beta_{0u} + x^{\prime}\beta_{1u}, \quad u = 2,\dots,k, \tag{1}$$

and

$$\log\frac{\pi_{u|\bar{u}x}}{\pi_{\bar{u}|\bar{u}x}} = \gamma_{0\bar{u}u} + x^{\prime}(\gamma_{1u} - \gamma_{1\bar{u}}), \quad u = 1,\dots,k,\ u \neq \bar{u}. \tag{2}$$

This formulation is of interest when we want to understand how covariates affect the latent characteristic which is indirectly measured by the response variables.

To classify the sample of subjects on the basis of the categorical response variables we rely on a latent class model [7]. With this model we also select the number of latent classes by using the Bayesian Information Criterion (BIC, [8]). Then, the estimation is performed by maximizing the log-likelihood

$$\ell(\theta) = \sum_{i=1}^{n} \log\left[ f_{\tilde{Y} \mid X}(\tilde{y}_i \mid x_i) \right],$$

where $x_i$ and $\tilde{y}_i$ are the vectors of observed data for subject $i$, $i = 1,\dots,n$, and $\theta$ is the vector of all model parameters. The function $\ell(\theta)$ is computed efficiently by a recursion that is well known in the hidden Markov literature. Likelihood maximization is performed by an EM algorithm ([6], [9]) based on the complete data log-likelihood, that is, the log-likelihood that we could compute if we knew the latent state of each subject at every occasion. Unlike a standard EM algorithm, this maximization does not update the conditional response probabilities, which are held fixed at the values obtained in the first step. As a result, the algorithm converges in far fewer iterations.

Once the parameter estimates have been computed, standard errors are attached to these estimates. They are computed by a nonparametric bootstrap [10], which consists of repeatedly drawing samples from the observed sample and computing the maximum likelihood estimates for every bootstrap sample.
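The resampling scheme just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the estimation code used for the paper: `bootstrap_se` and its `estimator` argument are hypothetical names, the data are made up, and a simple sample proportion stands in for the EM fit of the latent Markov model, which would be rerun on each bootstrap sample in the actual application.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of `estimator` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n subjects with replacement from the observed sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Re-estimate on the bootstrap sample (in the paper: rerun the EM fit).
        estimates.append(estimator(resample))
    # The spread of the bootstrap estimates approximates the standard error.
    return statistics.stdev(estimates)

# Stand-in data: 1 = employed, 0 = not employed (made-up values).
sample = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
se = bootstrap_se(sample, estimator=statistics.mean)
```

Resampling whole subjects, rather than single observations, keeps each subject's sequence of responses intact, which matters for panel data of this kind.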
The standard error of each parameter estimate is then obtained from the bootstrap distribution of the estimators.

4 Main results

In applying the basic latent class model to the dataset, we chose $k = 3$ latent states. The maximum log-likelihood of the model is equal to $\hat{\ell} =$ ... with 20 parameters. The corresponding value of BIC is ... The estimates of the conditional response probabilities according to this model are reported in Table 3.
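As a quick check on the parameter count quoted above, the sketch below counts the free parameters of a latent class model and evaluates BIC in the usual form of [8]. It assumes that each of the three response variables has three categories, as listed in Table 2. The function names, the log-likelihood value and the sample size passed to `bic` are hypothetical placeholders, not results from the paper.

```python
import math

def n_params_latent_class(k, levels):
    """Free parameters of a latent class model with k classes.

    (k - 1) class weights plus, per class, (c_j - 1) free conditional
    response probabilities for each response variable j.
    """
    return (k - 1) + k * sum(c - 1 for c in levels)

def bic(loglik, n_par, n_obs):
    """Bayesian Information Criterion: -2 * loglik + n_par * log(n_obs)."""
    return -2.0 * loglik + n_par * math.log(n_obs)

# Three latent classes, three responses with three categories each.
n_par = n_params_latent_class(k=3, levels=[3, 3, 3])  # 2 + 3 * 6 = 20

# Placeholder log-likelihood and sample size, for illustration only.
example_bic = bic(loglik=-1000.0, n_par=n_par, n_obs=500)
```

With $k = 3$ the count is 20, which agrees with the number of parameters reported for the selected model.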
Table 3 Estimated conditional probabilities $\hat{\phi}_{jy|u}$ of labour condition under the selected model
Response            Category             u = 1   u = 2   u = 3
Type of contract    none
                    temporary
                    permanent
Skill               none
                    medium/low
                    high
Wage                none
                    less than 3750 €
                    higher than 3750 €

We observe different types of labour conditions across the latent classes. In particular, the first class, which is the largest with about 53% of subjects, corresponds to unemployed subjects who may have income from other sources. The second class, including about 27% of subjects, comprises subjects with a temporary contract and some qualified work but with a low wage. The third class, including 20% of subjects, comprises subjects with stable, high quality jobs and high income.

Given the selected number of classes, the analysis then focuses on the dependence between the latent classes and the observable covariates, which is studied by fitting the multivariate latent Markov model with covariates. The estimates of the parameters affecting the initial and the transition probabilities are reported in Table 4.

Table 4 Estimates of the regression parameters affecting the latent process (footnote markers: minus the sample average; 95% bootstrap interval does not contain 0)
Effect               $\hat{\beta}_{12}$   $\hat{\beta}_{13}$   $\hat{\gamma}_{12}$   $\hat{\gamma}_{13}$
female
student employee
age
grade (six rows)
family income/...
architecture
business
humanistic
science
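To read coefficients of this kind, it can help to map them back to probabilities. The sketch below inverts parameterization (1), with the first latent state as reference category, to obtain the initial probabilities for a given covariate profile. The function name and all coefficient values are made up for illustration; they are not the estimates in Table 4.

```python
import math

def initial_probs(beta0, beta1, x):
    """Initial probabilities pi_{u|x}, u = 1..k, under parameterization (1).

    beta0[u-2] and beta1[u-2] hold the intercept and slope vector for
    state u = 2..k; state 1 is the reference category (logit fixed at 0).
    """
    logits = [0.0]
    for b0, b1 in zip(beta0, beta1):
        logits.append(b0 + sum(bj * xj for bj, xj in zip(b1, x)))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values for k = 3 states and two covariates (female, centred age).
beta0 = [-0.5, -1.0]
beta1 = [[0.4, -0.1], [0.1, 0.2]]
probs_male = initial_probs(beta0, beta1, x=[0, 0.0])
probs_female = initial_probs(beta0, beta1, x=[1, 0.0])
```

In this made-up example the positive coefficient on `female` for state 2 raises the odds of starting in state 2 rather than state 1, which is how the sign of an entry in the $\hat{\beta}_{12}$ column would be read.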
To interpret the results in this table properly, we have to consider that the adopted model is based on parameterizations (1) and (2). According to the estimated regression coefficients of the multinomial logit for the latent classes at the first time occasion, being female has a positive effect on belonging to the second latent class. This means that females find a low quality job more easily than males. Having work experience during university has a strong positive effect on finding a first job and also on finding high quality employment. Compared with students from low income families, students from high income families tend to continue their education or simply put less effort into searching for a job. Students with a technical degree have a much higher chance of getting a job than those with other degrees. Even for the most qualified jobs, graduates in architecture and in the humanities are disadvantaged compared with those with a technical degree.

Considering the subsequent periods of observation, females are more likely than their male counterparts to accept low quality employment, as are student employees compared with students. We also notice that older graduates tend to have more difficulty finding a job. Moreover, a technical degree helps in reaching high quality employment compared with the other degrees, followed by business and science degrees.

Acknowledgements We are grateful to Prof. M. Mezzanzanica and to Dr. M. Fontana, of the Interuniversity Research Centre on Public Services, University of Milano-Bicocca, for providing the dataset. We also acknowledge the project "Finite mixture and latent variable models for causal inference and analysis of socio-economic data" (FIRB - Futuro in ricerca) funded by the Italian Government (RBFR12SHVV).

References

1. OECD (1998). Human Capital Investment: An International Comparison. Paris: Centre for Educational Research and Innovation.
2. Folloni, G. and Vittadini, G. (2010). Human capital measurement: a survey. Journal of Economic Surveys, 24.
3. Wößmann, L. (2003). Specifying human capital. Journal of Economic Surveys, 17.
4. Wiggins, L.M. (1973). Panel Analysis: Latent Probability Models for Attitude and Behaviour Processes. Elsevier.
5. Bartolucci, F., Farcomeni, A. and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data. Chapman & Hall/CRC, Boca Raton.
6. Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41.
7. Lazarsfeld, P.F. and Henry, N.W. (1968). Latent Structure Analysis. Houghton Mifflin, Boston.
8. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6.
9. Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39.
10. Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC, New York.
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationNote on the EM Algorithm in Linear Regression Model
International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University
More informationLasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
More informationMissing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random
[Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator
More informationThe Chinese Restaurant Process
COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.
More informationA HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT
New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:
More informationAnalysis of counts with two latent classes, with application to risk assessment based on physician-visit records of cancer survivors
Biostatistics Advance Access published December 1, 2013 Biostatistics (2013), pp. 1 14 doi:10.1093/biostatistics/kxt052 Analysis of counts with two latent classes, with application to risk assessment based
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationMISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)
MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,
More informationSensitivity Analysis in Multiple Imputation for Missing Data
Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationCountry Focus: Migration of Portuguese nationals during the crisis
Country Focus: Migration of Portuguese nationals during the crisis João Peixoto, Joana Azevedo and Pedro Candeias ISEG, Universidade de Lisboa and ISCTE-Instituto Universitário de Lisboa Background The
More informationStatistics and Marketing. Peter E. Rossi and Greg M. Allenby
Statistics and Marketing Peter E. Rossi and Greg M. Allenby Statistical research in marketing is heavily influenced by the availability of different types of data. The last ten years have seen an explosion
More informationIPDET Module 6: Descriptive, Normative, and Impact Evaluation Designs
IPDET Module 6: Descriptive, Normative, and Impact Evaluation Designs Intervention or Policy Evaluation Questions Design Questions Elements Types Key Points Introduction What Is Evaluation Design? Connecting
More informationValidation of Software for Bayesian Models Using Posterior Quantiles
Validation of Software for Bayesian Models Using Posterior Quantiles Samantha R. COOK, Andrew GELMAN, and Donald B. RUBIN This article presents a simulation-based method designed to establish the computational
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationStatistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation
Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk
More informationEfficient and Practical Econometric Methods for the SLID, NLSCY, NPHS
Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Philip Merrigan ESG-UQAM, CIRPÉE Using Big Data to Study Development and Social Change, Concordia University, November 2103 Intro Longitudinal
More informationMissing Data. Paul D. Allison INTRODUCTION
4 Missing Data Paul D. Allison INTRODUCTION Missing data are ubiquitous in psychological research. By missing data, I mean data that are missing for some (but not all) variables and for some (but not all)
More informationA revisit of the hierarchical insurance claims modeling
A revisit of the hierarchical insurance claims modeling Emiliano A. Valdez Michigan State University joint work with E.W. Frees* * University of Wisconsin Madison Statistical Society of Canada (SSC) 2014
More informationBusiness Statistics: Chapter 2: Data Quiz A
CHAPTER 2 Quiz A Business Statistics, 2nd ed. 2-1 Business Statistics: Chapter 2: Data Quiz A Name 1. The mission of the Pew Internet & Life Project is to explore the impact of the Internet on families,
More information