Package MixGHD. June 26, 2015



Type: Package
Title: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions
Version: 1.7
Date: 2015-06-15
Author: Cristina Tortora, Ryan P. Browne, Brian C. Franczak and Paul D. McNicholas
Maintainer: Cristina Tortora <ctortora@math.mcmaster.ca>
Description: Carries out model-based clustering, classification and discriminant analysis using five different models, all based on the generalized hyperbolic distribution. The first model, MGHD, is the classical mixture of generalized hyperbolic distributions. MGHFA is the mixture of generalized hyperbolic factor analyzers, for high-dimensional data sets. MSGHD is the mixture of multiple scaled generalized hyperbolic distributions. cMSGHD is an MSGHD with convex contours. MCGHD, the mixture of coalesced generalized hyperbolic distributions, is a new, more flexible model.
Imports: Bessel, stats, mvtnorm, ghyp, numDeriv, mixture, e1071
Depends: MASS, R (>= 3.1.3)
NeedsCompilation: no
License: GPL (>= 2)
Repository: CRAN
Date/Publication: 2015-06-26 00:50:42

R topics documented:
  ARI
  banknote
  bankruptcy
  cMSGHD
  DA
  MCGHD
  MGHD
  MGHFA
  MSGHD
  Selection
  sonar

ARI                     Adjusted Rand Index

Description:

  Compares two classifications using the adjusted Rand index (ARI).

Usage:

  ARI(x=NULL, y=NULL)

Arguments:

  x: A n dimensional vector of class labels.
  y: A n dimensional vector of class labels.

Details:

  The ARI has expected value 0 in the case of a random partition and is equal to 1 in the case of perfect agreement.

Value:

  The adjusted Rand index value.

Author(s):

  Cristina Tortora
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  L. Hubert and P. Arabie (1985). Comparing Partitions. Journal of Classification 2:193-218.

Examples:

  ## loading banknote data
  data(banknote)
  ## model estimation
  model=MGHD(data=banknote[,2:7], G=2)
  ## result
  ARI(model$model$map, banknote[,1])
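The index described above (expected value 0 under a random partition, 1 under perfect agreement) is the Hubert and Arabie formula, which can be sketched in base R without the package. The function name `ari` below is illustrative, not the package's `ARI`:

```r
## Adjusted Rand index from two label vectors (Hubert & Arabie, 1985).
## Base-R sketch for illustration; the package's ARI() is the supported interface.
ari <- function(x, y) {
  n <- length(x)
  tab <- table(x, y)                        # contingency table of the two partitions
  sum.ij <- sum(choose(tab, 2))             # pairs placed together in both partitions
  sum.i  <- sum(choose(rowSums(tab), 2))    # pairs placed together in x
  sum.j  <- sum(choose(colSums(tab), 2))    # pairs placed together in y
  expected <- sum.i * sum.j / choose(n, 2)  # expected index under a random partition
  maximum  <- (sum.i + sum.j) / 2
  (sum.ij - expected) / (maximum - expected)
}

## Label permutations do not matter: these two partitions agree perfectly.
ari(c(1,1,2,2), c(2,2,1,1))  # 1
```

Note that the labels only enter through the contingency table, which is why relabelled but otherwise identical partitions score 1.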

banknote                Swiss banknote data

Description:

  The data set contains six measures on 100 genuine and 100 counterfeit Swiss franc banknotes.

Usage:

  data(banknote)

Format:

  A data frame with the following variables:

  Status: the status of the banknote: genuine or counterfeit
  Length: length of bill (mm)
  Left: width of left edge (mm)
  Right: width of right edge (mm)
  Bottom: bottom margin width (mm)
  Top: top margin width (mm)
  Diagonal: length of diagonal (mm)

References:

  Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. London: Chapman & Hall, Tables 1.1 and 1.2, pp. 5-8.

bankruptcy              Bankruptcy data

Description:

  The data set contains the ratio of retained earnings (RE) to total assets, and the ratio of earnings before interest and taxes (EBIT) to total assets, of 66 American firms. Half of the selected firms had filed for bankruptcy.

Usage:

  data(bankruptcy)

Format:

  A data frame with the following variables:

  Y: the status of the firm: 0 bankruptcy or 1 financially sound
  RE: RE ratio
  EBIT: EBIT ratio

References:

  Altman, E.I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J Finance 23(4): 589-609.

cMSGHD                  Convex mixture of multiple scaled generalized hyperbolic distributions (cMSGHD)

Description:

  Carries out model-based clustering using the convex mixture of multiple scaled generalized hyperbolic distributions. The cMSGHD only allows convex level sets.

Usage:

  cMSGHD(data=NULL,gpar0=NULL,G=2,max.iter=100,label=NULL,eps=1e-2,
         method="km",scale=TRUE)

Arguments:

  data: A n x p matrix or data frame such that rows correspond to observations and columns correspond to variables.
  gpar0: (optional) A list containing the initial parameters of the mixture model. See the Details section.
  G: The range of values for the number of clusters.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  label: (optional) A n dimensional vector; if label[i]=k then observation i belongs to group k, if NULL then the data has no known groups.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  method: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased").
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.
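The Aitken-based stopping rule referred to in the description of eps can be sketched in base R. This is a generic illustration of one common form of the rule (the helper name `aitken_converged` is hypothetical), not the package's internal code: given three successive log-likelihoods, the Aitken acceleration a = (l(k) - l(k-1)) / (l(k-1) - l(k-2)) yields an asymptotic estimate l(inf) = l(k-1) + (l(k) - l(k-1)) / (1 - a), and the EM stops when l(inf) - l(k) falls below eps.

```r
## Aitken acceleration stopping rule for an EM log-likelihood sequence.
## Illustrative sketch of the criterion described in the eps argument.
aitken_converged <- function(loglik, eps = 1e-2) {
  k <- length(loglik)
  if (k < 3) return(FALSE)  # three successive values are needed
  a <- (loglik[k] - loglik[k-1]) / (loglik[k-1] - loglik[k-2])  # Aitken acceleration
  l.inf <- loglik[k-1] + (loglik[k] - loglik[k-1]) / (1 - a)    # asymptotic estimate
  (l.inf - loglik[k]) < eps  # stop when the estimated remaining gain is small
}

aitken_converged(c(-500, -400, -350))             # FALSE: log-likelihood still rising fast
aitken_converged(c(-300.01, -300.005, -300.0025)) # TRUE: estimated remaining gain < eps
```

The advantage over a plain difference l(k) - l(k-1) is that the Aitken estimate extrapolates the geometric tail of the EM sequence, so the algorithm stops when it is close to the limiting value rather than merely moving slowly.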

Details:

  The argument gpar0, if specified, is a list structure containing at least one p dimensional vector for each of mu, alpha, and phi, a pxp matrix gamma, and a px2 matrix cpl containing the vector omega and the vector lambda.

Value:

  A list with components:

  BIC: Bayesian information criterion value for each G.
  model: A list containing the following elements for the selected model:
    BIC: Bayesian information criterion value.
    ICL: ICL index.
    gpar: A list of the model parameters.
    loglik: The log-likelihood values.
    map: A vector of integers indicating the maximum a posteriori classifications for the best model.
    z: A matrix giving the raw values upon which map is based.

Author(s):

  Cristina Tortora, Ryan P. Browne, and Paul D. McNicholas.
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2014). A Mixture of Coalesced Generalized Hyperbolic Distributions. arXiv preprint arXiv:1403.2332.

See Also:

  MGHD, MSGHD

Examples:

  ## Generate random data
  set.seed(3)
  mu1 <- mu2 <- c(0,0)
  Sigma1 <- matrix(c(1,0.85,0.85,1),2,2)
  Sigma2 <- matrix(c(1,-0.85,-0.85,1),2,2)
  X1 <- mvrnorm(n=150,mu=mu1,Sigma=Sigma1)
  X2 <- mvrnorm(n=150,mu=mu2,Sigma=Sigma2)
  X <- rbind(X1,X2)
  ## model estimation

  em=cMSGHD(X,G=2,max.iter=50,method="random")
  ## result
  plot(X,col=em$model$map)

DA                      Discriminant analysis using the mixture of generalized hyperbolic distributions

Description:

  Carries out model-based discriminant analysis using five different models: the mixture of generalized hyperbolic distributions (MGHD), the mixture of generalized hyperbolic factor analyzers (MGHFA), the mixture of multiple scaled generalized hyperbolic distributions (MSGHD), the mixture of convex multiple scaled generalized hyperbolic distributions (cMSGHD), and the mixture of coalesced generalized hyperbolic distributions (MCGHD).

Usage:

  DA(train,trainL,test,testL,method="GHD",starting="km",max.iter=10,
     eps=1e-2,q=2,scale=TRUE)

Arguments:

  train: A n1 x p matrix or data frame such that rows correspond to observations and columns correspond to variables of the training data set.
  trainL: A n1 dimensional vector of memberships for the units of the training set. If trainL[i]=k then observation i belongs to group k.
  test: A n2 x p matrix or data frame such that rows correspond to observations and columns correspond to variables of the test data set.
  testL: A n2 dimensional vector of memberships for the units of the test set. If testL[i]=k then observation i belongs to group k.
  method: (optional) A string indicating the method to be used for discriminant analysis; if not specified, GHD is used. Alternative methods are: MGHFA, MSGHD, cMSGHD, MCGHD.
  starting: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased") clustering.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.

  q: (optional) Used only if the MGHFA method is selected. A numerical parameter giving the number of factors.
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Value:

  A list with components:

  model: A list with the model parameters.
  testMembership: A vector of integers indicating the membership of the units in the test set.
  ARItest: A value indicating the adjusted Rand index for the test set.

Author(s):

  Cristina Tortora
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  R.P. Browne and P.D. McNicholas (2013). A Mixture of Generalized Hyperbolic Distributions. arXiv preprint arXiv:1305.1036.
  C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2014). A Mixture of Coalesced Generalized Hyperbolic Distributions. arXiv preprint arXiv:1403.2332.
  C. Tortora, P.D. McNicholas, and R.P. Browne (2014). A Mixture of Generalized Hyperbolic Factor Analyzers. arXiv preprint arXiv:1311.6530.

See Also:

  MGHD, MGHFA, MSGHD, cMSGHD, MCGHD

Examples:

  ## loading crabs data
  data(crabs)
  ## divide the data in training set and test set
  lab=rbind(matrix(1,100,1),matrix(2,100,1))
  train=crabs[c(1:90,110:200),4:8]
  trainL=lab[c(1:90,110:200),1]
  test=crabs[91:109,4:8]
  testL=lab[91:109,1]
  ## model estimation
  model=DA(train,trainL,test,testL,method="MSGHD",max.iter=20,starting="hierarchical")
  ## result
  model$ARItest

MCGHD                   Mixture of coalesced generalized hyperbolic distributions (MCGHD)

Description:

  Carries out model-based clustering using the mixture of coalesced generalized hyperbolic distributions.

Usage:

  MCGHD(data=NULL,gpar0=NULL,G=2,max.iter=100,eps=1e-2,label=NULL,
        method="km",scale=TRUE)

Arguments:

  data: A n x p matrix or data frame such that rows correspond to observations and columns correspond to variables.
  gpar0: (optional) A list containing the initial parameters of the mixture model. See the Details section.
  G: The range of values for the number of clusters.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  label: (optional) A n dimensional vector; if label[i]=k then observation i belongs to group k, if NULL then the data has no known groups.
  method: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased").
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Details:

  The argument gpar0, if specified, has to be a list structure containing as many elements as the number of components G. Each element must include the following parameters: one p dimensional vector for each of mu, alpha, and phi, a pxp matrix gamma, a px2 matrix cpl containing the vectors omega and lambda, and a 2 dimensional vector containing omega0 and lambda0.

Value:

  A list with components:

  BIC: Bayesian information criterion value for each G.
  model: A list containing the following elements for the selected model:
    BIC: Bayesian information criterion value.
    ICL: ICL index.
    gpar: A list of the model parameters in the rotated space.
    loglik: The log-likelihood values.
    map: A vector of integers indicating the maximum a posteriori classifications for the best model.
    par: A list of the model parameters.
    z: A matrix giving the raw values upon which map is based.

Author(s):

  Cristina Tortora, Ryan P. Browne, Brian C. Franczak and Paul D. McNicholas.
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2014). A Mixture of Coalesced Generalized Hyperbolic Distributions. arXiv preprint arXiv:1403.2332v5.

See Also:

  MGHD, MSGHD

Examples:

  ## loading banknote data
  data(banknote)
  ## model estimation
  model=MCGHD(banknote[,2:7],G=2,max.iter=20)
  ## result
  table(banknote[,1],model$model$map)
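The map and z components returned above are related by the usual maximum a posteriori rule: map[i] is the index of the largest entry in row i of z. A small base-R illustration, using a hypothetical 3 x 2 matrix of posterior group probabilities:

```r
## Maximum a posteriori classification from a matrix z of posterior
## probabilities (rows = observations, columns = groups). Illustrative only;
## the fitted models return map already computed this way.
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4), ncol = 2, byrow = TRUE)
map <- apply(z, 1, which.max)  # pick the most probable group per observation
map  # 1 2 1
```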

MGHD                    Mixture of generalized hyperbolic distributions (MGHD)

Description:

  Carries out model-based clustering and classification using the mixture of generalized hyperbolic distributions.

Usage:

  MGHD(data=NULL,gpar0=NULL,G=2,max.iter=100,label=NULL,eps=1e-2,
       method="kmeans",scale=TRUE)

Arguments:

  data: A n x p matrix or data frame such that rows correspond to observations and columns correspond to variables.
  gpar0: (optional) A list containing the initial parameters of the mixture model. See the Details section.
  G: The range of values for the number of clusters.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  label: (optional) A n dimensional vector; if label[i]=k then observation i belongs to group k, if label[i]=0 then observation i has no known group, if NULL then the data has no known groups.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  method: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased") clustering.
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Details:

  The argument gpar0, if specified, is a list structure containing at least one p dimensional vector for each of mu and alpha, a pxp matrix sigma, and a 2 dimensional vector containing omega and lambda.

Value:

  A list with components:

  BIC: Bayesian information criterion value for each G.
  model: A list containing the following elements for the selected model:
    BIC: Bayesian information criterion value.
    ICL: ICL index.
    gpar: A list of the model parameters.
    loglik: The log-likelihood values.
    map: A vector of integers indicating the maximum a posteriori classifications for the best model.
    z: A matrix giving the raw values upon which map is based.

Author(s):

  Ryan P. Browne, Cristina Tortora
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  R.P. Browne and P.D. McNicholas (2015). A Mixture of Generalized Hyperbolic Distributions. Canadian Journal of Statistics.

Examples:

  ## Clustering
  ## loading crabs data
  data(crabs)
  ## model estimation
  model=MGHD(data=crabs[,4:8], G=2)
  ## result
  plot(model$model$loglik)
  table(model$model$map, crabs[,2])

  ## Classification
  ## loading bankruptcy data
  data(bankruptcy)
  ## 70% belong to the training set
  label=bankruptcy[,1]
  ## for classification purposes the label cannot be 0
  label[1:33]=2
  a=round(runif(20)*65+1)
  label[a]=0
  ## model estimation
  model=MGHD(data=bankruptcy[,2:3], G=2, label=label)
  ## result
  table(model$model$map,bankruptcy[,1])
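As a concrete illustration of how the BIC values returned for each G are compared, here is a generic base-R sketch. It assumes the larger-is-better convention BIC = 2*loglik - npar*log(n), which is common in model-based clustering software; the log-likelihoods and parameter counts below are hypothetical, not output of the package:

```r
## Bayesian information criterion in the larger-is-better convention.
## Generic sketch: npar depends on the specific mixture model and is assumed given.
bic <- function(loglik, npar, n) 2 * loglik - npar * log(n)

## Pick the number of groups G with the highest BIC among candidate fits.
logliks <- c(-1200, -1100, -1095)  # hypothetical maximized log-likelihoods for G = 1, 2, 3
npars   <- c(5, 11, 17)            # hypothetical parameter counts
n       <- 200                     # sample size
which.max(bic(logliks, npars, n))  # 2
```

Here G = 2 wins: the small log-likelihood gain from G = 3 does not justify six extra parameters once the log(n) penalty is applied.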

MGHFA                   Mixture of generalized hyperbolic factor analyzers (MGHFA)

Description:

  Carries out model-based clustering and classification using the mixture of generalized hyperbolic factor analyzers.

Usage:

  MGHFA(data=NULL,gpar0=NULL,G=2,max.iter=100,label=NULL,q=2,eps=1e-2,
        method="kmeans",scale=TRUE)

Arguments:

  data: A matrix or data frame such that rows correspond to observations and columns correspond to variables.
  gpar0: (optional) A list containing the initial parameters of the mixture model. See the Details section.
  G: The range of values for the number of clusters.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  label: (optional) A n dimensional vector; if label[i]=k then observation i belongs to group k, if label[i]=0 then observation i has no known group, if NULL then the data has no known groups.
  q: The range of values for the number of factors.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  method: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical") and model based ("modelbased") clustering.
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Details:

  The argument gpar0, if specified, is a list structure containing at least one p dimensional vector for each of mu, alpha, and phi, a pxp matrix gamma, and a 2 dimensional vector cpl containing omega and lambda.

Value:

  A list with components:

  BIC: Bayesian information criterion value for each combination of G and q.
  model: A list containing the following elements for the selected model:
    BIC: Bayesian information criterion value.
    gpar: A list of the model parameters.
    loglik: The log-likelihood values.
    map: A vector of integers indicating the maximum a posteriori classifications for the best model.
    z: A matrix giving the raw values upon which map is based.

Author(s):

  Cristina Tortora, Ryan P. Browne, and Paul D. McNicholas.
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  C. Tortora, P.D. McNicholas, and R.P. Browne (2015). A Mixture of Generalized Hyperbolic Factor Analyzers. Advances in Data Analysis and Classification.

Examples:

  ## Classification
  ## 70% belong to the training set
  data(sonar)
  label=sonar[,61]
  set.seed(135)
  a=round(runif(62)*207+1)
  label[a]=0
  ## model estimation
  model=MGHFA(data=sonar[,1:60], G=2, max.iter=25, q=2, label=label)
  ## result
  table(model$model$map,sonar[,61])

MSGHD                   Mixture of multiple scaled generalized hyperbolic distributions (MSGHD)

Description:

  Carries out model-based clustering using the mixture of multiple scaled generalized hyperbolic distributions.

Usage:

  MSGHD(data=NULL,gpar0=NULL,G=2,max.iter=100,label=NULL,eps=1e-2,
        method="km",scale=TRUE)

Arguments:

  data: A n x p matrix or data frame such that rows correspond to observations and columns correspond to variables.
  gpar0: (optional) A list containing the initial parameters of the mixture model. See the Details section.
  G: The range of values for the number of clusters.
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  label: (optional) A n dimensional vector; if label[i]=k then observation i belongs to group k, if NULL then the data has no known groups.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  method: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased") clustering.
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Details:

  The argument gpar0, if specified, is a list structure containing at least one p dimensional vector for each of mu, alpha, and phi, a pxp matrix gamma, and a px2 matrix cpl containing the vector omega and the vector lambda.

Value:

  A list with components:

  BIC: Bayesian information criterion value for each G.
  model: A list containing the following elements for the selected model:
    BIC: Bayesian information criterion value.
    ICL: ICL index.
    gpar: A list of the model parameters.
    loglik: The log-likelihood values.
    map: A vector of integers indicating the maximum a posteriori classifications for the best model.
    z: A matrix giving the raw values upon which map is based.

Author(s):

  Cristina Tortora, Ryan P. Browne, Brian C. Franczak and Paul D. McNicholas.
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2014). A Mixture of Multiple Scaled Generalized Hyperbolic Distributions. arXiv preprint arXiv:1403.2332v2.

See Also:

  MGHD

Examples:

  ## loading banknote data
  data(banknote)
  ## model estimation
  model=MSGHD(banknote[,2:7],G=2,max.iter=30)
  ## result
  table(banknote[,1],model$model$map)

Selection               Select the best result using the mixture of generalized hyperbolic distributions

Description:

  Replicates a routine n times and returns the best solution. The available routines are: the mixture of generalized hyperbolic distributions (MGHD), the mixture of generalized hyperbolic factor analyzers (MGHFA), the mixture of multiple scaled generalized hyperbolic distributions (MSGHD), the mixture of convex multiple scaled generalized hyperbolic distributions (cMSGHD), and the mixture of coalesced generalized hyperbolic distributions (MCGHD).

Usage:

  Selection(data,label,G=2,niter=50,method="GHD",starting="kmeans",
            max.iter=10,eps=1e-2,labelcl=NULL,q=2,scale=TRUE,criterion="ARI")

Arguments:

  data: A n x p matrix or data frame such that rows correspond to observations and columns correspond to variables.
  label: A n dimensional vector of known memberships; if label[i]=k, unit i belongs to cluster k. It is used to select the solution.

  G: A numerical parameter giving the number of clusters.
  niter: A numerical parameter giving the number of replications.
  method: (optional) A string indicating the method to be used; if not specified, GHD is used. Alternative methods are: MGHFA, MSGHD, cMSGHD, MCGHD.
  starting: (optional) A string indicating the initialization criterion; if not specified, k-means clustering is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), and model based ("modelbased").
  max.iter: (optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.
  eps: (optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
  labelcl: (optional) A n dimensional vector of memberships; if labelcl[i]=k then observation i belongs to group k, if NULL then the data has no known groups.
  q: (optional) Used only if the MGHFA method is selected. A numerical parameter giving the number of factors.
  scale: (optional) A logical value indicating whether or not the data should be scaled; TRUE by default.
  criterion: (optional) A string indicating the selection criterion; if not specified, the adjusted Rand index (ARI) is used. The alternative criteria are the BIC and the ICL.

Details:

  Selects the best model according to the adjusted Rand index, the BIC, or the ICL.

Value:

  A list with components:

  model: A list with the model parameters.
  ARI: A value indicating the adjusted Rand index.

Author(s):

  Cristina Tortora
  Maintainer: Cristina Tortora <ctortora@mcmaster.ca>

References:

  R.P. Browne and P.D. McNicholas (2015). A Mixture of Generalized Hyperbolic Distributions. Canadian Journal of Statistics.
  C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2014). A Mixture of Coalesced Generalized Hyperbolic Distributions. arXiv preprint arXiv:1403.2332.
  C. Tortora, P.D. McNicholas, and R.P. Browne (2014). A Mixture of Generalized Hyperbolic Factor Analyzers. Advances in Data Analysis and Classification.
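The replicate-and-keep-best logic that Selection performs can be sketched generically in base R. The names `select_best` and `run_once` are hypothetical stand-ins (run_once represents one randomly initialized EM fit), and the criterion here is any larger-is-better score such as the ARI or the BIC:

```r
## Run a stochastic fitting routine niter times and keep the best result
## according to a criterion. Generic sketch of the Selection idea only.
select_best <- function(run_once, niter = 50,
                        criterion = function(fit) fit$BIC) {
  best <- NULL
  for (i in seq_len(niter)) {
    fit <- run_once()  # one randomly initialized fit
    if (is.null(best) || criterion(fit) > criterion(best)) best <- fit
  }
  best
}

## Toy stand-in: each "fit" just draws a random score.
set.seed(1)
best <- select_best(function() list(BIC = rnorm(1)), niter = 20)
```

Replicating and selecting in this way guards against the sensitivity of EM to its starting values: a single run may stop in a poor local maximum, while the best of many runs rarely does.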

See Also:

  MGHD, MGHFA, MSGHD, cMSGHD, MCGHD

Examples:

  ## loading bankruptcy data
  #data(bankruptcy)
  ## model estimation
  #mod=Selection(bankruptcy[,2:3],bankruptcy[,1],method="MCGHD",eps=0.001,max.iter=1000,niter=10)
  ## result
  #mod$ARI

sonar                   Sonar data

Description:

  The data report the patterns obtained by bouncing sonar signals at various angles and under various conditions. There are 208 patterns in all, 111 obtained by bouncing sonar signals off a metal cylinder and 97 obtained by bouncing signals off rocks. Each pattern is a set of 60 numbers (variables) taking values between 0 and 1.

Usage:

  data(sonar)

Format:

  A data frame with 208 observations and 61 columns. The first 60 columns contain the variables. The 61st column gives the material: 1 rock, 2 metal.

Source:

  UCI Machine Learning Repository.

References:

  R.P. Gorman and T.J. Sejnowski (1988). Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1: 75-89.

Index

Topic Classification:
  MGHD, MGHFA

Topic Clustering:
  cMSGHD, DA, MCGHD, MGHD, MGHFA, MSGHD, Selection

Topic Generalized hyperbolic distribution:
  cMSGHD, DA, MCGHD, MGHD, MGHFA, MSGHD, Selection

Topic data sets:
  banknote, bankruptcy, sonar

ARI, banknote, bankruptcy, cMSGHD, DA, MCGHD, MGHD, MGHFA, MSGHD, Selection, sonar