A hidden Markov model for criminal behaviour classification

1 RSS2004 p.1/19 A hidden Markov model for criminal behaviour classification. Francesco Bartolucci, Institute of Economic Sciences, Urbino University, Italy. Fulvia Pennoni, Department of Statistics, University of Florence, Italy.

2 Background. Analysis of criminal behaviour: we want to model offending patterns while taking into account the nature of offending and the sequence of offence types; criminal histories are recorded as official histories in the England and Wales Offenders Index, a court-based record of the criminal histories of all offenders in England and Wales from 1963 to the current day; a general population sample of n = 5,470 individuals was drawn from the cohort of those born in 1953 and followed through to 1993; offences are combined into J = 10 major categories described in the Offenders Index Codebook (1998); following Francis et al. (2004) we define T = 6 time windows or age strips: 10-15, 16-20, 21-25, 26-30, 31-35, 36-40.

3 Univariate latent Markov model. Used by Bijleveld and Mooijaart (2003): the offending pattern of a subject within age strip t, t = 1, ..., T, is represented by a single discrete random variable X_t; {X_t} depends only on a latent random process {C_t}; {C_t} follows a first-order homogeneous Markov chain with k states, initial probabilities \pi_c and transition probabilities \pi_{c_1 c_2}; the joint distribution of {X_t} may be expressed as

p(X_1 = x_1, \ldots, X_T = x_T) = \sum_{c_1} \sum_{c_2} \cdots \sum_{c_T} \pi_{c_1} \phi_{x_1|c_1} \, \pi_{c_1 c_2} \phi_{x_2|c_2} \cdots \pi_{c_{T-1} c_T} \phi_{x_T|c_T},

where \phi_{x|c} = p(X_t = x \mid C_t = c).

4 Multivariate extension. X_{tj} is a binary random variable equal to 1 if the subject is convicted of an offence of type j within age strip t and to 0 otherwise; we assume local independence, i.e. that for t = 1, ..., T the X_{tj} are conditionally independent given C_t:

\phi_{x|c} = p(X_t = x \mid C_t = c) = \prod_{j=1}^{J} \lambda_{j|c}^{x_j} (1 - \lambda_{j|c})^{1 - x_j},

where \lambda_{j|c} = p(X_{tj} = 1 \mid C_t = c), X_t = (X_{t1}, \ldots, X_{tJ}) and x_j denotes the j-th element of the vector x.

5 Restricted version of the model (unidimensional Rasch). We assume that for each type of offence

logit(\lambda_{j|c}) = \alpha_c + \beta_j, (1)

where \alpha_c is the tendency to commit crimes of a subject in latent class c (i.e. an individual characteristic) and \beta_j is the easiness of committing a crime of type j; this allows for an appropriate labelling of the latent classes, since it orders them so that

\lambda_{j|1} \le \cdots \le \lambda_{j|k}, \quad j = 1, \ldots, J;

this constraint is used to formulate a latent class version of the Rasch (1961) model, which is well known in the psychometric literature.
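Parameterization (1) can be checked numerically: with \alpha_1 <= ... <= \alpha_k, the implied \lambda_{j|c} are automatically ordered across classes for every offence j. A sketch with illustrative parameter values:

```python
import numpy as np

alpha = np.array([-2.0, 0.0, 1.5])   # class tendencies alpha_c, increasing in c
beta = np.array([0.5, -1.0, 2.0])    # offence "easiness" beta_j
# lam[c, j] = expit(alpha_c + beta_j)
lam = 1.0 / (1.0 + np.exp(-(alpha[:, None] + beta[None, :])))
# each column j is nondecreasing in c, so the latent classes are ordered
ordered = bool(np.all(np.diff(lam, axis=0) >= 0))
```

The ordering holds for any choice of the \beta_j's, which is what makes the labelling of the classes unambiguous.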

6 Restricted version of the model (multidimensional Rasch). The previous model assumes that each type of offence measures the same latent trait: this may be too restrictive; we assume instead that the crimes may be partitioned into s homogeneous subgroups, so that

logit(\lambda_{j|c}) = \sum_{d=1}^{s} \delta_{jd} \alpha_{cd} + \beta_j, (2)

where \alpha_{cd} is the tendency of a subject in latent class c to commit crimes in subgroup d and \delta_{jd} is equal to 1 if crime j is in subgroup d and to 0 otherwise; we can thus classify the offences into groups such that crimes belonging to the same group measure the same latent trait.
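Model (2) only adds a 0/1 membership matrix \delta to parameterization (1); a sketch with made-up values (s = 2 subgroups, J = 3 offences, k = 2 classes):

```python
import numpy as np

delta = np.array([[1, 0],              # delta[j, d] = 1 if offence j is in subgroup d
                  [1, 0],
                  [0, 1]])
alpha = np.array([[-1.0, -2.0],        # alpha[c, d]: tendency of class c on trait d
                  [0.5, 1.0]])
beta = np.array([0.2, -0.3, 0.8])      # offence easiness beta_j
# logit(lam_{j|c}) = sum_d delta_{jd} * alpha_{cd} + beta_j
lam = 1.0 / (1.0 + np.exp(-(alpha @ delta.T + beta)))   # lam[c, j]
```

With s = 1 and delta a column of ones, this reduces exactly to the unidimensional model (1).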

7 Likelihood inference. The log-likelihood of the model for an observed cohort of n subjects is

l(\theta) = \sum_{i=1}^{n} \log[L_i(\theta)],

where \theta denotes the vector of all the parameters and L_i(\theta) is the manifest probability p(x_{i1}, \ldots, x_{iT}) evaluated at \theta. L_i(\theta) may be computed through the well-known recursions in the hidden Markov literature (see Levinson et al., 1983, and MacDonald and Zucchini, 1997, Sec. 2.2); l(\theta) is maximized with the EM algorithm, which is based on the log-likelihood of the complete data, l^*(\theta).
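The recursions referred to amount to a forward pass with scaling, so that each log L_i(\theta) is accumulated without numerical underflow; a sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def log_lik(seqs, pi, Pi, Phi):
    """l(theta) = sum_i log L_i(theta) via the scaled forward recursion.

    seqs : iterable of observation sequences with codes 0..m-1
    pi   : (k,) initial probabilities; Pi : (k, k) transitions;
    Phi  : (m, k) conditional probabilities Phi[x, c] = phi_{x|c}
    """
    ll = 0.0
    for x in seqs:
        alpha = pi * Phi[x[0], :]
        for xt in x[1:]:
            s = alpha.sum()
            ll += np.log(s)                       # accumulate the scaling factor
            alpha = (alpha / s @ Pi) * Phi[xt, :]  # rescaled forward step
        ll += np.log(alpha.sum())
    return float(ll)
```

The sum of the logged scaling factors reproduces log L_i(\theta) exactly, while keeping alpha at unit scale at every step.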

8 The complete-data log-likelihood may be expressed as

l^*(\theta) = \sum_{c} v_{1c} \log \pi_c + \sum_{c_1} \sum_{c_2} u_{c_1 c_2} \log \pi_{c_1 c_2} + \sum_{i} \sum_{t} \sum_{c} \sum_{j} v_{itc} \{ x_{itj} \log \lambda_{j|c} + (1 - x_{itj}) \log(1 - \lambda_{j|c}) \},

where v_{itc} is a dummy variable, referred to the i-th subject, equal to 1 if C_t = c and to 0 otherwise, v_{tc} = \sum_i v_{itc} and u_{c_1 c_2} is the number of transitions from state c_1 to state c_2.

9 EM algorithm. E-step: compute the conditional expected value of l^*(\theta), given the observed data and the current value of the parameters. M-step: update the parameter estimates by maximizing the expected value of l^*(\theta) computed above. When the model is constrained (unidimensional or multidimensional Rasch), the parameters \alpha_{cd} and \beta_j are estimated by fitting a logistic model to the data, with a suitable design matrix Z defined according to the model of interest.
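For the unconstrained model both EM steps have closed forms, with the expected counts of the E-step obtained by forward-backward recursions. A compact Baum-Welch sketch under simplifying assumptions (univariate discrete observed variable, random start; not the authors' code):

```python
import numpy as np

def em_latent_markov(X, k, n_iter=20, seed=0):
    """EM for a homogeneous latent Markov chain with a discrete observed
    variable; X is an (n, T) int array of outcome codes 0..m-1."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    m = int(X.max()) + 1
    pi = np.full(k, 1.0 / k)
    Pi = rng.dirichlet(np.ones(k), size=k)        # rows sum to 1
    Phi = rng.dirichlet(np.ones(m), size=k).T     # Phi[x, c], columns sum to 1
    for _ in range(n_iter):
        v1 = np.zeros(k); u = np.zeros((k, k)); e = np.zeros((m, k))
        ll = 0.0
        for x in X:
            # scaled forward recursion
            a = np.zeros((T, k)); b = np.zeros((T, k)); cs = np.zeros(T)
            a[0] = pi * Phi[x[0]]
            cs[0] = a[0].sum(); a[0] /= cs[0]
            for t in range(1, T):
                a[t] = (a[t - 1] @ Pi) * Phi[x[t]]
                cs[t] = a[t].sum(); a[t] /= cs[t]
            ll += np.log(cs).sum()                 # log-lik at current parameters
            # scaled backward recursion
            b[T - 1] = 1.0
            for t in range(T - 2, -1, -1):
                b[t] = Pi @ (Phi[x[t + 1]] * b[t + 1]) / cs[t + 1]
            g = a * b                              # posterior p(C_t = c | x)
            v1 += g[0]
            for t in range(T - 1):                 # expected transition counts
                u += (a[t][:, None] * Pi) * (Phi[x[t + 1]] * b[t + 1])[None, :] / cs[t + 1]
            for t in range(T):                     # expected emission counts
                e[x[t]] += g[t]
        # M-step: closed-form updates of the multinomial parameters
        pi = v1 / n
        Pi = u / u.sum(axis=1, keepdims=True)
        Phi = e / e.sum(axis=0, keepdims=True)
    return pi, Pi, Phi, ll
```

The returned ll is the log-likelihood at the parameters used in the last E-step; under the Rasch constraints the Phi update would instead be replaced by a weighted logistic fit, as described above.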

10 Choice of the number of classes (k). The optimal number of latent classes can be chosen through the likelihood ratio statistic between the model with k states and that with k + 1 states,

D_k = -2(\hat{l}_k - \hat{l}_{k+1}),

for increasing values of k; or using the Bayesian Information Criterion (Kass and Raftery, 1995), defined as

BIC_k = -2\hat{l}_k + r_k \log(n),

where r_k is the number of parameters of the model with k states. According to this strategy, the optimal number of states is the one for which BIC_k is minimum.
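The BIC comparison is mechanical once the maximized log-likelihoods are available. A sketch using the parameter count of the unconstrained multivariate model, r_k = (k - 1) + k(k - 1) + kJ; the numeric values in the test are hypothetical, not the application's:

```python
import numpy as np

def n_params(k, J):
    """r_k for the unconstrained model: (k - 1) initial probabilities,
    k(k - 1) free transition probabilities and kJ conditional probabilities."""
    return (k - 1) + k * (k - 1) + k * J

def choose_k(logliks, J, n):
    """logliks[i] = maximized log-likelihood with k = i + 1 states.
    Returns (best k, BIC list), minimizing BIC_k = -2 l_k + r_k log(n)."""
    bic = [-2.0 * l + n_params(i + 1, J) * np.log(n)
           for i, l in enumerate(logliks)]
    return int(np.argmin(bic)) + 1, bic
```

For a constrained (Rasch) model, only n_params would change, since the \lambda_{j|c}'s are then functions of the smaller set of \alpha's and \beta's.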

11 Choice of the number of latent traits. The crimes are clustered using a hierarchical algorithm. At each step the algorithm aggregates the two clusters of crimes that are closest in terms of the deviance between the model fitted at the previous step and the multidimensional Rasch model fitted after the aggregation of the two clusters. The steps are iterated until all the items are grouped together; the aggregation retained is the one whose BIC is lowest, provided it is lower than that of the unconstrained model.

12 An application. We applied the model to a sample of n = 5,470 males taken from the dataset illustrated above; we used the estimated number of live births in the cohort year 1953 as reported by Prime et al. (2001). For a number of classes k between 1 and 7 we computed \hat{l}_k, r_k and BIC_k. [Table of \hat{l}_k, r_k and BIC_k for k = 1, ..., 7.] We choose k = 5 states, which gives the smallest BIC.

13 Choice of the clusters. Using the hierarchical algorithm, the best fit (BIC = 35,433) was obtained for the following cluster aggregation of the 10 offence categories, together with the estimated \beta_j's. [Table of latent-trait subgroup memberships and \hat{\beta}_j estimates by offence category (j).] Offence categories: Violence against the person; Sexual offences; Burglary; Robbery; Theft and handling stolen goods; Fraud and forgery; Criminal damage; Drug offences; Motoring offences; Other offences (\hat{\beta}_j = 7.493).

14 Estimated \alpha parameters. Values of the estimated tendencies \hat{\alpha}_{cd} of the subjects in each latent state c for every subgroup d. [Table of \hat{\alpha}_{cd} by latent state c and subgroup d.]

15 Estimates of \pi and \Pi. Initial probabilities \pi_c, c = 1, ..., 5. [Table of \hat{\pi}_1, ..., \hat{\pi}_5.] Transition probabilities \pi_{c_1 c_2} of the Markov chain. [5 x 5 table of estimated transition probabilities.]

16 Advantages of the proposed methodology. We achieve a parsimonious description of the dynamic process underlying the data; the approach is based on a general population sample and not on an offender-based sample as in other studies; it allows us to estimate a wide choice of models, ranging from the simple latent class model to the constrained model with subgroups, and to choose the best one; it can provide important information for policy, such as incarceration or incapacitation policies against offenders.

17 Future extensions. Constrain the probabilities \lambda_{j|c} to be equal to 0 for one latent class, so that this class may be identified as that of the non-offenders; consider models in which the transition probabilities may vary with age (non-homogeneous Markov chain); consider restricted models in which the transition matrix has a particular structure (e.g. triangular, symmetric); include explanatory variables, such as gender or race, in the model.

18 References

Bijleveld, C. J. H. and Mooijaart, A. (2003). Latent Markov modelling of recidivism data. Statistica Neerlandica, 57, 3.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39.

Feng, Z. and McCulloch, C. E. (1996). Using bootstrap likelihood ratios in finite mixture models. Journal of the Royal Statistical Society, Series B, 58.

Francis, B., Soothill, K. and Fligelstone, R. (2004). Identifying patterns and pathways of offending behaviour: a new approach to typologies of crime. European Journal of Criminology, 1.

Home Office (1998). Offenders Index Codebook. London: Home Office. Available from the Research, Development and Statistics Directorate.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430).

Lazarsfeld, P. F. and Henry, N. W. (1968). Latent Structure Analysis. Boston: Houghton Mifflin.

Levinson, S. E., Rabiner, L. R. and Sondhi, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal, 62.

Lindsay, B., Clogg, C. and Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 86.

MacDonald, I. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. London: Chapman & Hall.

McCutcheon, A. L. and Thomas, G. (1995). Patterns of drug use among white institutionalized delinquents in Georgia: evidence from a latent class analysis. Journal of Drug Education, 25.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.

Prime, J., White, S., Liriano, S. and Patel, K. (2001). Criminal careers of those born between 1953 and … Statistical Bulletin 4/01. London: Home Office.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4.

Wiggins, L. M. (1973). Panel Analysis: Latent Probability Models for Attitudes and Behavior Processes. Amsterdam: Elsevier.


More information

Missing Data Problems in Machine Learning

Missing Data Problems in Machine Learning Missing Data Problems in Machine Learning Senate Thesis Defense Ben Marlin Machine Learning Group Department of Computer Science University of Toronto April 8, 008 Contents: Overview Notation Theory Of

More information

Semi-supervised learning

Semi-supervised learning Semi-supervised learning Learning from both labeled and unlabeled data Semi-supervised learning Learning from both labeled and unlabeled data Motivation: labeled data may be hard/expensive to get, but

More information

Introduction to Mixture Modeling

Introduction to Mixture Modeling research methodology series Introduction to Mixture Modeling Kevin A. Kupzyk, MA Methodological Consultant, CYFS SRM Unit Originally presented on 12/11/09 by the Statistics & Research Methodology Unit

More information

Preserving Class Discriminatory Information by. Context-sensitive Intra-class Clustering Algorithm

Preserving Class Discriminatory Information by. Context-sensitive Intra-class Clustering Algorithm Preserving Class Discriminatory Information by Context-sensitive Intra-class Clustering Algorithm Yingwei Yu, Ricardo Gutierrez-Osuna, and Yoonsuck Choe Department of Computer Science Texas A&M University

More information

Approximate Inference

Approximate Inference Approximate Inference IPAM Summer School Ruslan Salakhutdinov BCS, MIT Deprtment of Statistics, University of Toronto 1 Plan 1. Introduction/Notation. 2. Illustrative Examples. 3. Laplace Approximation.

More information

2005 JSM Presentation. Bayesian Models for Adjusting Response Bias in Survey Data: An Example in Estimating Rape and Domestic Violence from the NCVS

2005 JSM Presentation. Bayesian Models for Adjusting Response Bias in Survey Data: An Example in Estimating Rape and Domestic Violence from the NCVS Bayesian Models for Adjusting Response Bias in Survey Data: An Example in Estimating Rape and Domestic Violence from the NCVS Qingzhao Yu Elizabeth A. Stasny Statistics Department The Ohio State University

More information

Optimal Hedging of Interest Rate Exposure Given Credit Correlation

Optimal Hedging of Interest Rate Exposure Given Credit Correlation Spring 11 Optimal Hedging of Interest Rate Exposure Given Credit Correlation Ray Chen, Abhay Subramanian, Xiao Tang, Michael Turrin Stanford University, MS&E 444 1 1. Introduction Interest rate risk arises

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

An introduction to Hidden Markov Models

An introduction to Hidden Markov Models An introduction to Hidden Markov Models Christian Kohlschein Abstract Hidden Markov Models (HMM) are commonly defined as stochastic finite state machines. Formally a HMM can be described as a 5-tuple Ω

More information

Christfried Webers. Canberra February June 2015

Christfried Webers. Canberra February June 2015 c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

On Multi-dimensional Markov Chain Models

On Multi-dimensional Markov Chain Models On Multi-dimensional Markov Chain Models Wai-Ki Ching Shu-Qin Zhang Advanced Modeling and Applied Computing Laboratory Department of Mathematics The University of Hong Kong Pokfulam Road, Hong Kong E-mail:

More information

Probabilistic trust models in network security

Probabilistic trust models in network security UNIVERSITY OF SOUTHAMPTON Probabilistic trust models in network security by Ehab M. ElSalamouny A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the Faculty of Engineering

More information

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

F].SLR0E IVX] To make a grammar probabilistic, we need to assign a probability to each context-free rewrite

F].SLR0E IVX] To make a grammar probabilistic, we need to assign a probability to each context-free rewrite Notes on the Inside-Outside Algorithm F].SLR0E IVX] To make a grammar probabilistic, we need to assign a probability to each context-free rewrite rule. But how should these probabilities be chosen? It

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision

UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision D.B. Grimes A.P. Shon R.P.N. Rao Dept. of Computer Science and Engineering University of Washington Seattle, WA

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

10-601: Machine Learning Midterm Exam November 3, Solutions

10-601: Machine Learning Midterm Exam November 3, Solutions 10-601: Machine Learning Midterm Exam November 3, 2010 Solutions Instructions: Make sure that your exam has 16 pages (not including this cover sheet) and is not missing any sheets, then write your full

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Comparing Conditional and Marginal Direct Estimation of Subgroup Distributions

Comparing Conditional and Marginal Direct Estimation of Subgroup Distributions RESEARCH REPORT January 2003 RR-03-02 Comparing Conditional and Marginal Direct Estimation of Subgroup Distributions Matthias von Davier Research & Development Division Princeton, NJ 08541 Comparing Conditional

More information

Data a systematic approach

Data a systematic approach Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

The Exponential Family

The Exponential Family The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

More information

Class Notes: Week 3. proficient

Class Notes: Week 3. proficient Ronald Heck Class Notes: Week 3 1 Class Notes: Week 3 This week we will look a bit more into relationships between two variables using crosstabulation tables. Let s go back to the analysis of home language

More information

A crash course in probability and Naïve Bayes classification

A crash course in probability and Naïve Bayes classification Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

More information

Hidden Markov Models Fundamentals

Hidden Markov Models Fundamentals Hidden Markov Models Fundamentals Daniel Ramage CS229 Section Notes December, 2007 Abstract How can we apply machine learning to data that is represented as a sequence of observations over time? For instance,

More information

Lecture 4: More on Continuous Random Variables and Functions of Random Variables

Lecture 4: More on Continuous Random Variables and Functions of Random Variables Lecture 4: More on Continuous Random Variables and Functions of Random Variables ELE 525: Random Processes in Information Systems Hisashi Kobayashi Department of Electrical Engineering Princeton University

More information

6.891 Machine learning and neural networks

6.891 Machine learning and neural networks 6.89 Machine learning and neural networks Mid-term exam: SOLUTIONS October 3, 2 (2 points) Your name and MIT ID: No Body, MIT ID # Problem. (6 points) Consider a two-layer neural network with two inputs

More information