# A hidden Markov model for criminal behaviour classification


Francesco Bartolucci, Institute of Economic Sciences, Urbino University, Italy.

Fulvia Pennoni, Department of Statistics, University of Florence, Italy.

RSS2004, p. 1/19

Background

- Analysis of criminal behaviour: we want to model offending patterns while taking into account the nature of the offending and the sequence of offence types;
- criminal histories are recorded as official histories in the England and Wales Offenders Index, a court-based record of the criminal histories of all offenders in England and Wales from 1963 to the current day;
- general population sample of $n = 5{,}470$ individuals paroled from the cohort of those born in 1953 and followed through to 1993;
- offences are combined into $J = 10$ major categories described in the Offenders Index Codebook (1998);
- following Francis et al. (2004), we define $T = 6$ time windows or age strips: 10-15, 16-20, 21-25, 26-30, …

Univariate latent Markov model

Used by Bijleveld and Mooijaart (2003):

- the offending pattern of a subject within age strip $t$, $t = 1, \ldots, T$, is represented by a single discrete random variable $X_t$;
- $\{X_t\}$ depends only on a random process $\{C_t\}$;
- $\{C_t\}$ follows a first-order homogeneous Markov chain with $k$ states, initial probabilities $\pi_c$ and transition probabilities $\pi_{c_1 c_2}$ (from state $c_1$ to state $c_2$);
- the joint distribution of $\{X_t\}$ may be expressed as

$$p(X_1 = x_1, \ldots, X_T = x_T) = \sum_{c_1} \phi_{x_1|c_1} \pi_{c_1} \sum_{c_2} \phi_{x_2|c_2} \pi_{c_1 c_2} \cdots \sum_{c_T} \phi_{x_T|c_T} \pi_{c_{T-1} c_T},$$

where $\phi_{x|c} = p(X_t = x \mid C_t = c)$.

Multivariate extension

- $X_{tj}$ is a binary random variable equal to 1 if the subject is convicted of an offence of type $j$ within age strip $t$ and to 0 otherwise;
- we assume local independence, i.e. that for $t = 1, \ldots, T$ the variables $X_{t1}, \ldots, X_{tJ}$ are conditionally independent given $C_t$:

$$\phi_{x|c} = p(X_t = x \mid C_t = c) = \prod_{j=1}^{J} \lambda_{j|c}^{x_j} (1 - \lambda_{j|c})^{1 - x_j},$$

where $\lambda_{j|c} = p(X_{tj} = 1 \mid C_t = c)$, $X_t = (X_{t1}, \ldots, X_{tJ})$ and $x_j$ denotes the $j$-th element of the vector $x$.
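The product formula above can be illustrated with a minimal Python sketch. The function name `emission_prob` and the numerical values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from itertools import product
from math import prod


def emission_prob(x, lam_c):
    """p(X_t = x | C_t = c) under local independence.

    x     : sequence of 0/1 indicators, one per offence type j
    lam_c : sequence of lambda_{j|c} = p(X_tj = 1 | C_t = c)
    """
    if len(x) != len(lam_c):
        raise ValueError("x and lam_c must have the same length")
    # Each offence type contributes lambda if x_j = 1 and (1 - lambda) if x_j = 0.
    return prod(l if xj == 1 else 1.0 - l for xj, l in zip(x, lam_c))


# Hypothetical latent class with J = 3 offence types.
lam_c = [0.5, 0.2, 0.1]
p = emission_prob([1, 0, 0], lam_c)  # 0.5 * (1 - 0.2) * (1 - 0.1)

# The probabilities of all 2^J response patterns sum to one.
total = sum(emission_prob(x, lam_c) for x in product([0, 1], repeat=3))
```

With $J = 10$ offence categories there are $2^{10} = 1024$ possible response patterns per age strip, which is why a product form over items is convenient.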

Restricted version of the model (unidimensional Rasch)

We assume that for each type of offence

$$\operatorname{logit}(\lambda_{j|c}) = \alpha_c + \beta_j, \qquad (1)$$

where

- $\alpha_c$ is the tendency of a subject in latent class $c$ to commit crimes (i.e. an individual characteristic);
- $\beta_j$ is the easiness of committing a crime of type $j$.

With an appropriate labelling, the latent classes are ordered so that

$$\lambda_{j|1} \le \cdots \le \lambda_{j|k}, \qquad j = 1, \ldots, J.$$

This constraint is used to formulate a latent class version of the Rasch (1961) model, which is well known in the psychometric literature.

Restricted version of the model (multidimensional Rasch)

The previous model assumes that every type of offence depends on the same latent trait, which may be too restrictive. We consider instead that the crimes may be partitioned into $s$ homogeneous subgroups, so that

$$\operatorname{logit}(\lambda_{j|c}) = \sum_{d=1}^{s} \delta_{jd} \alpha_{cd} + \beta_j, \qquad (2)$$

where

- $\alpha_{cd}$ is the tendency of a subject in latent class $c$ to commit crimes in subgroup $d$;
- $\delta_{jd}$ is equal to 1 if crime $j$ is in subgroup $d$ and to 0 otherwise.

In this way we can classify the offences into groups such that crimes belonging to the same group share the same latent trait.
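As a rough illustration of the two Rasch parameterizations, the sketch below computes the conditional probabilities $\lambda_{j|c}$ from given $\alpha$ and $\beta$ values. The helper names and all numbers are hypothetical. The indicator $\delta_{jd}$ is encoded as a list `group` with `group[j] = d`, and the unidimensional model (1) corresponds to the special case where every offence is in subgroup 0.

```python
from math import exp


def inv_logit(z):
    """Inverse of the logit link: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def rasch_lambda(alpha, beta, group):
    """Return lam[c][j] with logit(lam[c][j]) = alpha[c][group[j]] + beta[j].

    alpha : alpha[c][d], tendency of latent class c in subgroup d
    beta  : beta[j], easiness of offence type j
    group : group[j] = d, the subgroup that contains offence j
            (equivalent to delta_{jd} = 1 only for d = group[j])
    """
    return [
        [inv_logit(alpha_c[group[j]] + beta[j]) for j in range(len(beta))]
        for alpha_c in alpha
    ]


# Hypothetical example: k = 2 classes, J = 3 offences, s = 2 subgroups.
alpha = [[-2.0, -3.0], [0.0, -1.0]]
beta = [0.0, 1.0, -1.0]
group = [0, 0, 1]
lam = rasch_lambda(alpha, beta, group)
```

Because class 0 has the lower tendency in both subgroups here, `lam[0][j] < lam[1][j]` for every offence `j`, which mirrors the ordering of the latent classes described for model (1).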

Likelihood inference

The log-likelihood of the model for an observed cohort of $n$ subjects is

$$\ell(\theta) = \sum_{i=1}^{n} \log[L_i(\theta)],$$

where $\theta$ denotes all the parameters and $L_i(\theta)$ is the joint probability $p(x_{i1}, \ldots, x_{iT})$ defined above, evaluated at $\theta$.

- $L_i(\theta)$ may be computed through the well-known recursions in the hidden Markov literature (see Levinson et al., 1983, and MacDonald and Zucchini, 1997, Sec. 2.2);
- $\ell(\theta)$ is maximized with the EM algorithm, which requires the log-likelihood of the complete data, $\ell^*(\theta)$.
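The recursion referred to above is commonly implemented as a forward pass. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the rescaling at each step (added to avoid numerical underflow) and the toy parameter values are all assumptions made for illustration.

```python
from itertools import product
from math import exp, log, prod


def emission(x, lam_c):
    """phi_{x|c}: product of Bernoulli terms under local independence."""
    return prod(l if xj else 1.0 - l for xj, l in zip(x, lam_c))


def subject_loglik(xs, init, trans, lam):
    """log L_i(theta) for one subject via a rescaled forward recursion.

    xs    : list of T binary vectors, one per age strip
    init  : init[c] = pi_c
    trans : trans[c1][c2] = pi_{c1 c2}
    lam   : lam[c][j] = lambda_{j|c}
    """
    k = len(init)
    # First step: joint probability of x_1 and C_1 = c.
    a = [init[c] * emission(xs[0], lam[c]) for c in range(k)]
    loglik = 0.0
    for x in xs[1:]:
        s = sum(a)
        loglik += log(s)          # accumulate the scaling constant
        a = [v / s for v in a]    # rescale so the vector sums to one
        a = [
            sum(a[c1] * trans[c1][c2] for c1 in range(k)) * emission(x, lam[c2])
            for c2 in range(k)
        ]
    return loglik + log(sum(a))


def loglik(data, init, trans, lam):
    """l(theta): sum of the individual log-likelihoods."""
    return sum(subject_loglik(xs, init, trans, lam) for xs in data)


# Hypothetical toy model: k = 2 states, J = 1 offence type, T = 3 age strips.
init = [0.6, 0.4]
trans = [[0.8, 0.2], [0.3, 0.7]]
lam = [[0.1], [0.7]]

# Sanity check: the probabilities of all 2^3 possible histories sum to one.
total = sum(
    exp(subject_loglik([[a], [b], [c]], init, trans, lam))
    for a, b, c in product([0, 1], repeat=3)
)
```

The cost of the recursion grows linearly in $T$ and quadratically in $k$, whereas the direct sum over all state sequences in the joint distribution has $k^T$ terms.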

The complete data log-likelihood may be expressed as

$$\ell^*(\theta) = \sum_{c} v_{1c} \log \pi_c + \sum_{c_1} \sum_{c_2} u_{c_1 c_2} \log \pi_{c_1 c_2} + \sum_{i} \sum_{t} \sum_{c} \sum_{j} v_{itc} \{ x_{itj} \log \lambda_{j|c} + (1 - x_{itj}) \log(1 - \lambda_{j|c}) \},$$

where $v_{itc}$ is a dummy variable, referring to the $i$-th subject, which is equal to 1 if $C_t = c$ and to 0 otherwise, $v_{tc} = \sum_i v_{itc}$, and $u_{c_1 c_2}$ is the number of transitions from the $c_1$-th to the $c_2$-th state.

EM algorithm

- E step: computes the conditional expected value of $\ell^*(\theta)$, given the observed data and the current value of the parameters.
- M step: updates the parameter estimates by maximizing the expected value of $\ell^*(\theta)$ computed above.

When the model is constrained (unidimensional or multidimensional Rasch), the parameters $\alpha_{cd}$ and $\beta_j$ are estimated by fitting to the data a logistic model with a suitable design matrix $Z$, defined according to the model of interest.
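One EM iteration for the unconstrained model can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the E step uses a standard forward-backward pass to obtain the expected values of $v_{itc}$ and of the transition counts, and the M step uses the usual closed-form updates (expected proportions). The constrained Rasch versions would instead need a logistic fit in the M step, which is not shown. All names and data are hypothetical.

```python
from math import prod


def emission(x, lam_c):
    """phi_{x|c} under local independence."""
    return prod(l if xj else 1.0 - l for xj, l in zip(x, lam_c))


def e_step_subject(xs, init, trans, lam):
    """Expected v_{itc} and expected transition counts for one subject."""
    k, T = len(init), len(xs)
    phi = [[emission(x, lam[c]) for c in range(k)] for x in xs]
    # Forward pass: f[t][c] = p(x_1..x_t, C_t = c)
    f = [[init[c] * phi[0][c] for c in range(k)]]
    for t in range(1, T):
        f.append([sum(f[t - 1][a] * trans[a][c] for a in range(k)) * phi[t][c]
                  for c in range(k)])
    # Backward pass: b[t][c] = p(x_{t+1}..x_T | C_t = c)
    b = [[1.0] * k for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(trans[c][d] * phi[t + 1][d] * b[t + 1][d] for d in range(k))
                for c in range(k)]
    lik = sum(f[T - 1])
    v = [[f[t][c] * b[t][c] / lik for c in range(k)] for t in range(T)]
    u = [[sum(f[t][a] * trans[a][c] * phi[t + 1][c] * b[t + 1][c]
              for t in range(T - 1)) / lik
          for c in range(k)] for a in range(k)]
    return v, u


def em_step(data, init, trans, lam):
    """One EM iteration for the unconstrained latent Markov model."""
    k, J = len(init), len(lam[0])
    v1 = [0.0] * k
    u = [[0.0] * k for _ in range(k)]
    num = [[0.0] * J for _ in range(k)]
    den = [0.0] * k
    for xs in data:
        v_i, u_i = e_step_subject(xs, init, trans, lam)
        for c in range(k):
            v1[c] += v_i[0][c]
            for d in range(k):
                u[c][d] += u_i[c][d]
            for t, x in enumerate(xs):
                den[c] += v_i[t][c]
                for j in range(J):
                    num[c][j] += v_i[t][c] * x[j]
    n = len(data)
    new_init = [v1[c] / n for c in range(k)]
    new_trans = [[u[c][d] / sum(u[c]) for d in range(k)] for c in range(k)]
    new_lam = [[num[c][j] / den[c] for j in range(J)] for c in range(k)]
    return new_init, new_trans, new_lam


# Hypothetical toy data: n = 3 subjects, T = 3 age strips, J = 2 offence types.
data = [
    [(1, 0), (0, 0), (1, 1)],
    [(0, 0), (0, 0), (0, 1)],
    [(1, 1), (1, 0), (1, 1)],
]
init = [0.6, 0.4]
trans = [[0.8, 0.2], [0.3, 0.7]]
lam = [[0.2, 0.1], [0.7, 0.6]]

v, u = e_step_subject(data[0], init, trans, lam)
new_init, new_trans, new_lam = em_step(data, init, trans, lam)
```

In practice `em_step` would be repeated until the log-likelihood stops increasing, and for long sequences the forward and backward vectors would need rescaling; with $T = 6$ age strips the unscaled version is adequate for a sketch.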

Choice of the number of classes ($k$)

The optimal number of latent classes can be chosen

- with the likelihood ratio between the model with $k$ states and that with $k + 1$ states,

$$D_k = -2(\hat{\ell}_k - \hat{\ell}_{k+1}),$$

  for increasing values of $k$; or
- using the Bayesian Information Criterion (Kass and Raftery, 1995), defined as

$$\mathrm{BIC}_k = -2\hat{\ell}_k + r_k \log(n),$$

  where $r_k$ is the number of parameters in the model with $k$ states.

According to the second strategy, the optimal number of states is the one for which $\mathrm{BIC}_k$ is minimum.
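The two criteria are simple to compute once the maximized log-likelihoods are available. In the sketch below, the log-likelihood values, the sample size and the helper names are invented for illustration. The parameter count for the unconstrained model, $(k - 1) + k(k - 1) + kJ$, is an assumption based on a homogeneous chain with free initial, transition and conditional probabilities; it is not stated in the slides.

```python
from math import log


def bic(loglik, n_params, n):
    """BIC_k = -2 * loglik + r_k * log(n)."""
    return -2.0 * loglik + n_params * log(n)


def lr_deviance(loglik_k, loglik_k1):
    """D_k = -2 * (loglik_k - loglik_{k+1})."""
    return -2.0 * (loglik_k - loglik_k1)


def n_params_unconstrained(k, J):
    """Assumed count: initial + transition + conditional probabilities."""
    return (k - 1) + k * (k - 1) + k * J


# Made-up maximized log-likelihoods for k = 1, 2, 3 (not results from the slides).
fits = {1: -1000.0, 2: -950.0, 3: -945.0}
n, J = 500, 10

bics = {k: bic(ll, n_params_unconstrained(k, J), n) for k, ll in fits.items()}
best_k = min(bics, key=bics.get)
d_1 = lr_deviance(fits[1], fits[2])
```

In this made-up example the gain in log-likelihood from two to three states is too small to offset the penalty for the extra parameters, so the BIC selects `best_k = 2`.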

Choice of the number of latent traits

The crimes are clustered using a hierarchical algorithm. At each step the algorithm aggregates the two clusters of crimes that are closest in terms of the deviance between the model fitted at the previous step and the multidimensional Rasch model fitted after the aggregation of the two clusters. The steps are iterated until the BIC of the resulting model is lower than that of the unconstrained model. The algorithm stops when all the items are grouped together.
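The agglomeration scheme described above can be sketched as a greedy loop. This is only an outline under assumptions: `loglik_of` is a hypothetical user-supplied callback that fits the multidimensional Rasch model for a given partition of the items and returns its maximized log-likelihood. Since the model from the previous step is the same for every candidate merge, choosing the merge with the smallest deviance is equivalent to choosing the one with the largest log-likelihood. The toy callback at the end is invented purely to exercise the loop.

```python
from itertools import combinations


def greedy_merge(items, loglik_of):
    """Return the sequence of partitions visited by the greedy aggregation.

    items     : iterable of item (offence type) labels
    loglik_of : hypothetical callback, partition -> maximized log-likelihood
    """
    partition = [frozenset([i]) for i in items]
    path = [list(partition)]
    while len(partition) > 1:
        best = None
        for a, b in combinations(partition, 2):
            # Candidate partition obtained by merging clusters a and b.
            merged = [g for g in partition if g not in (a, b)] + [a | b]
            ll = loglik_of(merged)
            if best is None or ll > best[0]:
                best = (ll, merged)
        partition = best[1]
        path.append(list(partition))
    return path


# Toy stand-in for the model fit: merging items with different labels is penalized.
labels = [0, 0, 1, 1]


def toy_loglik(partition):
    return -float(sum(
        1
        for g in partition
        for a, b in combinations(sorted(g), 2)
        if labels[a] != labels[b]
    ))


path = greedy_merge(range(4), toy_loglik)
```

The BIC of each partition in `path` could then be compared with that of the unconstrained model to pick the final clustering.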

An application

We applied the model to a sample of $n = 5{,}470$ males taken from the dataset illustrated above; we used the estimated number of live births in the cohort year 1953 as reported by Prime et al. (2001).

For a number of classes between 1 and 7 we obtain a table with columns $k$, $\hat{\ell}_k$, $r_k$ and $\mathrm{BIC}_k$ (the numerical entries are not legible in this transcription).

We choose $k = 5$ states, as this gives the smallest BIC.

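Choice of the clusters

Using the hierarchical algorithm, the best fit (BIC = 35,433) was obtained for a cluster aggregation of the 10 types of crime, reported in a table together with the estimates of the $\beta_j$'s. The table has one row per offence category $j$, with a mark for the latent trait to which it is assigned and the estimate of $\beta_j$:

1. Violence against the person
2. Sexual offences
3. Burglary
4. Robbery
5. Theft and handling stolen goods
6. Fraud and forgery
7. Criminal damage
8. Drug offences
9. Motoring offences
10. Other offences

(The latent-trait assignments and the $\beta_j$ estimates are not legible in this transcription; the only surviving value is 7.493, at the end of the table.)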

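Estimated $\alpha$ parameters

Values of the estimated tendencies of the subject for each latent state in every subgroup. (Table with one row per latent state $c$ and columns $\alpha_1$, $\alpha_2$, …; the numerical entries are not legible in this transcription.)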

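Estimate of $\pi$ and $\Pi$

Initial probabilities $\pi_c$: $\pi_1$, $\pi_2$, $\pi_3$, $\pi_4$, … (values not legible in this transcription).

The transition probabilities $\pi_{cd}$ of the Markov chain were reported in a table with one row per state $c$ (values not legible in this transcription).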

Advantages of the proposed methodology

- We achieve a parsimonious description of the dynamic process underlying the data;
- the approach is based on a general population sample and not on an offender-based sample, as in other studies;
- it allows a wide choice of models to be estimated and the best one to be chosen, ranging from the simple latent class model to the constrained model with subgroups;
- it can provide important information for policy, such as incarceration or incapacitation policy against offenders.

Future extensions

- Constrain the probabilities $\lambda_{j|c}$ to be equal to 0 for one latent class, so that this class may be identified as that of non-offenders;
- consider also models in which the transition probabilities may vary with age (non-homogeneous Markov chains);
- consider restricted models in which the transition matrix has a particular structure (e.g. triangular, symmetric);
- include explanatory variables, such as gender or race, in the model.

References

- Bijleveld, C. J. H. and Mooijaart, A. (2003). Latent Markov Modelling of Recidivism Data. Statistica Neerlandica, 57, 3.
- Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc., series B, 39.
- Feng, Z. and McCulloch, C. E. (1996). Using Bootstrap Likelihood Ratios in Finite Mixture Models. J. R. Statist. Soc., series B, 58.
- Francis, B., Soothill, K. and Fligelstone, R. (2004). Identifying Patterns and Pathways of Offending Behaviour: A New Approach to Typologies of Crime. European Journal of Criminology, 1.
- Kass, R. E. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90 (430).
- Lazarsfeld, P. F. and Henry, N. W. (1968). Latent Structure Analysis. Boston: Houghton Mifflin.
- Levinson, S. E., Rabiner, L. R. and Sondhi, M. M. (1983). An introduction to an application of theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal, 62.
- Lindsay, B., Clogg, C. and Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 86.
- McCutcheon, A. L. and Thomas, G. (1995). Patterns of drug use among white institutionalized delinquents in Georgia. Evidence from a latent class analysis. Journal of Drug Education, 25.
- MacDonald, I. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. London: Chapman & Hall.
- McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley.
- Research Development and Statistics Directorate (1998). Offenders Index Codebook. London: Home Office.
- Prime, J., White, S., Liriano, S. and Patel, K. (2001). Criminal careers of those born between 1953 and […]. Statistical Bulletin 4/01. London: Home Office.
- Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4.
- Wiggins, L. M. (1973). Panel Analysis: Latent Probability Models for Attitudes and Behavior Processes. Amsterdam: Elsevier.

### Item selection by latent class-based methods: an application to nursing homes evaluation

Item selection by latent class-based methods: an application to nursing homes evaluation Francesco Bartolucci, Giorgio E. Montanari, Silvia Pandolfi 1 Department of Economics, Finance and Statistics University

### Introduction to latent variable models

Introduction to latent variable models lecture 1 Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia, IT bart@stat.unipg.it Outline [2/24] Latent variables and their

### Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### Clustering - example. Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering

Clustering - example Graph Mining and Graph Kernels Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering x!!!!8!!! 8 x 1 1 Clustering - example

### The Start of a Criminal Career: Does the Type of Debut Offence Predict Future Offending? Research Report 77. Natalie Owen & Christine Cooper

The Start of a Criminal Career: Does the Type of Debut Offence Predict Future Offending? Research Report 77 Natalie Owen & Christine Cooper November 2013 Contents Executive Summary... 3 Introduction...

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu)

Paper Author (s) Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu) Lei Zhang, University of Maryland, College Park (lei@umd.edu) Paper Title & Number Dynamic Travel

### Lecture 10: Sequential Data Models

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Crime Location Crime Type Month Year Betting Shop Criminal Damage April 2010 Betting Shop Theft April 2010 Betting Shop Assault April 2010

Crime Location Crime Type Month Year Betting Shop Theft April 2010 Betting Shop Assault April 2010 Betting Shop Theft April 2010 Betting Shop Theft April 2010 Betting Shop Assault April 2010 Betting Shop

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### A general statistical framework for assessing Granger causality

A general statistical framework for assessing Granger causality The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Item Response Theory in R using Package ltm

Item Response Theory in R using Package ltm Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl Department of Statistics and Mathematics

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### Model-Based Cluster Analysis for Web Users Sessions

Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr

### Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

### Package MixGHD. June 26, 2015

Type Package Package MixGHD June 26, 2015 Title Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions Version 1.7 Date 2015-6-15 Author

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Conditional Random Fields: An Introduction

Conditional Random Fields: An Introduction Hanna M. Wallach February 24, 2004 1 Labeling Sequential Data The task of assigning label sequences to a set of observation sequences arises in many fields, including

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Classifying Galaxies using a data-driven approach

Classifying Galaxies using a data-driven approach Supervisor : Prof. David van Dyk Department of Mathematics Imperial College London London, April 2015 Outline The Classification Problem 1 The Classification

### Curriculum Vitae of Francesco Bartolucci

Curriculum Vitae of Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia Via A. Pascoli, 20 06123 Perugia (IT) email: bart@stat.unipg.it http://www.stat.unipg.it/bartolucci

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### ASC 076 INTRODUCTION TO SOCIAL AND CRIMINAL PSYCHOLOGY

DIPLOMA IN CRIME MANAGEMENT AND PREVENTION COURSES DESCRIPTION ASC 075 INTRODUCTION TO SOCIOLOGY AND ANTHROPOLOGY Defining Sociology and Anthropology, Emergence of Sociology, subject matter and subdisciplines.

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Structural Equation Models: Mixture Models

Structural Equation Models: Mixture Models Jeroen K. Vermunt Department of Methodology and Statistics Tilburg University Jay Magidson Statistical Innovations Inc. 1 Introduction This article discusses

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

### Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

### Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

### Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

### Automated Hierarchical Mixtures of Probabilistic Principal Component Analyzers

Automated Hierarchical Mixtures of Probabilistic Principal Component Analyzers Ting Su tsu@ece.neu.edu Jennifer G. Dy jdy@ece.neu.edu Department of Electrical and Computer Engineering, Northeastern University,

### Data a systematic approach

Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

### Probabilistic trust models in network security

UNIVERSITY OF SOUTHAMPTON Probabilistic trust models in network security by Ehab M. ElSalamouny A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the Faculty of Engineering

### An introduction to Hidden Markov Models

An introduction to Hidden Markov Models Christian Kohlschein Abstract Hidden Markov Models (HMM) are commonly defined as stochastic finite state machines. Formally a HMM can be described as a 5-tuple Ω

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests

Hypothesis Testing 1 Introduction This document is a simple tutorial on hypothesis testing. It presents the basic concepts and definitions as well as some frequently asked questions associated with hypothesis

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

### These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

### Female offenders and child dependents. Ministry of Justice

Female offenders and child dependents Ministry of Justice 08 October 2015 Previous estimates of the proportion of female offenders who have child dependents at the time of their disposal have been based

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand

### Health Status Monitoring Through Analysis of Behavioral Patterns

Health Status Monitoring Through Analysis of Behavioral Patterns Tracy Barger 1, Donald Brown 1, and Majd Alwan 2 1 University of Virginia, Systems and Information Engineering, Charlottesville, VA 2 University

### APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### QDquaderni. UP-DRES User Profiling for a Dynamic REcommendation System E. Messina, D. Toscani, F. Archetti. university of milano bicocca

A01 084/01 university of milano bicocca QDquaderni department of informatics, systems and communication UP-DRES User Profiling for a Dynamic REcommendation System E. Messina, D. Toscani, F. Archetti research

### A tutorial on Bayesian model selection. and on the BMSL Laplace approximation

A tutorial on Bayesian model selection and on the BMSL Laplace approximation Jean-Luc (schwartz@icp.inpg.fr) Institut de la Communication Parlée, CNRS UMR 5009, INPG-Université Stendhal INPG, 46 Av. Félix

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

### A Bayesian Antidote Against Strategy Sprawl

A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)

### UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision

UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision D.B. Grimes A.P. Shon R.P.N. Rao Dept. of Computer Science and Engineering University of Washington Seattle, WA

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Latent Class (Finite Mixture) Segments How to find them and what to do with them

Latent Class (Finite Mixture) Segments How to find them and what to do with them Jay Magidson Statistical Innovations Inc. Belmont, MA USA www.statisticalinnovations.com Sensometrics 2010, Rotterdam Overview

### An Outcome Analysis of Connecticut s Halfway House Programs

An Outcome Analysis of Connecticut s Halfway House Programs Stephen M. Cox, Ph.D. Professor Department of Criminology and Criminal Justice Central Connecticut State University Study Impetus and Purpose

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### Language Modeling. Chapter 1. 1.1 Introduction

Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

### Hypothesis testing and the error of the third kind

Psychological Test and Assessment Modeling, Volume 54, 22 (), 9-99 Hypothesis testing and the error of the third kind Dieter Rasch Abstract In this note it is shown that the concept of an error of the

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949-9962 Abstract In non-linear regression models, such as the heteroskedastic

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs2750/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Package EstCRM. July 13, 2015

Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

### A mixture model for random graphs

A mixture model for random graphs J-J Daudin, F. Picard, S. Robin robin@inapg.inra.fr UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Examples of networks. Social: Biological:

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright © 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR'S PERMISSION* This is a

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Central Statistics Office (CSO) Recorded Crime Statistics Frequently Asked Questions

Central Statistics Office (CSO) Recorded Crime Statistics Frequently Asked Questions 26th June 2014 Introduction. The purpose of this document is to address some commonly asked questions about CSO recorded

### Social Media Mining. Data Mining Essentials

Introduction The data production rate has increased dramatically (Big Data), and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers

### METHOD OF MOMENTS LEARNING FOR LEFT-TO-RIGHT HIDDEN MARKOV MODELS

METHOD OF MOMENTS LEARNING FOR LEFT-TO-RIGHT HIDDEN MARKOV MODELS Y. Cem Subakan, Johannes Traa, Paris Smaragdis, Daniel Hsu. UIUC Computer Science Department, Columbia University Computer

### 6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

### Questionnaire: Domestic (Gender and Family) Violence Interventions

Questionnaire: Domestic (Gender and Family) Violence Interventions STRENGTHENING TRANSNATIONAL APPROACHES TO REDUCING REOFFENDING (STARR) On behalf of The Institute of Criminology STRENGTHENING TRANSNATIONAL

### An Extension of the CHAID Tree-based Segmentation Algorithm to Multiple Dependent Variables

An Extension of the CHAID Tree-based Segmentation Algorithm to Multiple Dependent Variables Jay Magidson 1 and Jeroen K. Vermunt 2 1 Statistical Innovations Inc., 375 Concord Avenue, Belmont, MA 02478,