Package funFEM. March 13, 2015


Type: Package
Title: Clustering in the Discriminative Functional Subspace
Version: 1.1
Date: 2015-03-05
Author: Charles Bouveyron
Depends: R (>= 2.10), MASS, fda, elasticnet
Maintainer: Charles Bouveyron <charles.bouveyron@parisdescartes.fr>
Description: The funFEM algorithm (Bouveyron et al., 2014) clusters functional data by modeling the curves within a common and discriminative functional subspace.
License: GPL-2
LazyLoad: yes
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-03-13 11:47:30

R topics documented:

  funFEM-package
  funFEM
  velib
  velov
  Index

funFEM-package    Model-based clustering in the discriminative functional subspaces with the funFEM algorithm

Description

The package provides the funFEM algorithm (Bouveyron et al., 2014), which clusters functional data by modeling the curves within a common and discriminative functional subspace.

Details

  Package:  funFEM
  Type:     Package
  Version:  1.0
  Date:     2014-09-06
  License:  GPL-2

Author(s)

Charles Bouveyron

Maintainer: <charles.bouveyron@parisdescartes.fr>

References

C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.

Examples

    # Clustering the well-known "Canadian temperature" data (Ramsay & Silverman)
    library(funFEM)   # also attaches fda (CanadianWeather, day.5)
    basis <- create.bspline.basis(c(0, 365), nbasis = 21, norder = 4)
    fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[, , "Temperature.C"], basis,
                          fdnames = list("day", "Station", "Deg C"))$fd
    res <- funFEM(fdobj, K = 4)

    # Visualization of the partition and the group means
    par(mfrow = c(1, 2))
    plot(fdobj, col = res$cls, lwd = 2, lty = 1)
    fdmeans <- fdobj; fdmeans$coefs <- t(res$prms$my)
    plot(fdmeans, col = 1:max(res$cls), lwd = 2)
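In the example above, `res$prms$my` is transposed before being stored in `fdmeans$coefs`, which implies that it holds one row of basis coefficients per group. A minimal base-R sketch of that layout follows, using a toy coefficient matrix and a hypothetical hard partition. The names `coefs`, `cls` and `group_means` are illustrative, and the plain per-group averages are only a stand-in for the model-based estimates returned by the package.

```r
# Toy stand-in for fdobj$coefs: 4 basis coefficients (rows) for 6 curves (columns).
set.seed(1)
coefs <- cbind(matrix(rnorm(12, mean = 0), nrow = 4),
               matrix(rnorm(12, mean = 5), nrow = 4))

# Hypothetical hard partition of the 6 curves, playing the role of res$cls.
cls <- rep(1:2, each = 3)

# Average the coefficient columns of each group.
# sapply() returns an nbasis x K matrix, so t() gives K x nbasis:
# one row of mean coefficients per group.
group_means <- t(sapply(sort(unique(cls)),
                        function(k) rowMeans(coefs[, cls == k, drop = FALSE])))

dim(group_means)   # K rows, nbasis columns
```

Transposing `group_means` gives an nbasis x K matrix, the shape that an `fd` object expects in its `coefs` component when each group mean is treated as one curve.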

funFEM    The funFEM algorithm for the clustering of functional data.

Description

The funFEM algorithm clusters time series or, more generally, functional data. It is based on a discriminative functional mixture model that clusters the data in a single discriminative functional subspace. The model is parsimonious and can therefore handle long time series.

Usage

    funFEM(fd, K = 2:6, model = "AkjBk", crit = "bic", init = "hclust",
           Tinit = c(), maxit = 50, eps = 1e-06, disp = FALSE, lambda = 0,
           graph = FALSE)

Arguments

  fd       a functional data object produced by the fda package.
  K        an integer vector specifying the numbers of mixture components (clusters) among which the model selection criterion will choose the most appropriate number of groups. Default is 2:6.
  model    a vector of discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". The option "all" runs the funFEM algorithm on the 12 models and selects the best model according to the maximum value of the model selection criterion.
  crit     the criterion to be used for model selection ("bic", "aic" or "icl"). "bic" is the default.
  init     the initialization type ("random", "kmeans" or "hclust"). "hclust" is the default.
  Tinit    an n x K matrix which contains posterior probabilities for initializing the algorithm (each row corresponds to an individual).
  maxit    the maximum number of iterations before the Fisher-EM algorithm stops.
  eps      the threshold on the likelihood differences used to stop the Fisher-EM algorithm.
  disp     if TRUE, some messages are printed during the clustering. Default is FALSE.
  lambda   the l0 penalty (between 0 and 1) for the sparse version. See (Bouveyron et al., 2014) for details. Default is 0.
  graph    if TRUE, it plots the evolution of the log-likelihood. Default is FALSE.

Value

A list is returned:

  model    the model name.
  K        the number of groups.
  cls      the group membership of each individual estimated by the Fisher-EM algorithm.
  P        the posterior probabilities of each individual for each group.
  prms     the model parameters.
  U        the orientation of the functional subspace according to the basis functions.
  aic      the value of the Akaike information criterion.
  bic      the value of the Bayesian information criterion.
  icl      the value of the integrated completed likelihood criterion.
  loglik   the log-likelihood values computed at each iteration of the FEM algorithm.
  ll       the log-likelihood value obtained at the last iteration of the FEM algorithm.
  nbprm    the number of free parameters in the model.
  call     the call of the function.
  plot     some information to pass to the plot.fem function.
  crit     the model selection criterion used.

Author(s)

Charles Bouveyron

References

C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.

Examples

    # Clustering the well-known "Canadian temperature" data (Ramsay & Silverman)
    basis <- create.bspline.basis(c(0, 365), nbasis = 21, norder = 4)
    fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[, , "Temperature.C"], basis,
                          fdnames = list("day", "Station", "Deg C"))$fd
    res <- funFEM(fdobj, K = 4)

    # Visualization of the partition and the group means
    par(mfrow = c(1, 2))
    plot(fdobj, col = res$cls, lwd = 2, lty = 1)
    fdmeans <- fdobj; fdmeans$coefs <- t(res$prms$my)
    plot(fdmeans, col = 1:max(res$cls), lwd = 2)

    # DO NOT RUN
    # Load the velib data and smoothing
    data(velib)
    basis <- create.fourier.basis(c(0, 181), nbasis = 25)
    fdobj <- smooth.basis(1:181, t(velib$data), basis)$fd

    # Clustering with funFEM
    res <- funFEM(fdobj, K = 6, model = "AkjBk", init = "kmeans", lambda = 0, disp = TRUE)

    # Visualization of group means
    fdmeans <- fdobj; fdmeans$coefs <- t(res$prms$my)
    plot(fdmeans, col = 1:res$K, xaxt = "n", lwd = 2)
    axis(1, at = seq(5, 181, 6), labels = velib$dates[seq(5, 181, 6)], las = 2)

    # Choice of K
    res <- funFEM(fdobj, K = 2:20, model = "AkjBk", init = "kmeans", lambda = 0, disp = TRUE)
    plot(2:20, res$plot$bic, type = "b", xlab = "K", main = "BIC")

    # Computation of the closest stations from the group means
    par(mfrow = c(3, 2))
    for (i in 1:res$K) {
      matplot(t(velib$data[which.max(res$P[, i]), ]), type = "l", lty = i, col = i,
              xaxt = "n", lwd = 2, ylim = c(0, 1))
      axis(1, at = seq(5, 181, 6), labels = velib$dates[seq(5, 181, 6)], las = 2)
      title(main = paste("Cluster", i, "-", velib$names[which.max(res$P[, i])]))
    }

    # Visualization in the discriminative subspace (projected scores)
    par(mfrow = c(1, 1))
    # plot(t(fdobj$coefs) ...   [call truncated]
    # text(t(fdobj$coefs) ...   [call truncated]

    # Spatial visualization of the clustering (with library ggmap)
    library(ggmap)
    Mymap <- get_map(location = "Paris", zoom = 12, maptype = "terrain")
    ggmap(Mymap) + geom_point(data = velib$position, aes(longitude, latitude),
                              colour = I(res$cls), size = I(3))

    # funFEM clustering with sparsity
    res2 <- funFEM(fdobj, K = res$K, model = "AkjBk", init = "user", Tinit = res$P,
                   lambda = 0.01, disp = TRUE)

    # Visualization of group means and the selected functional bases
    split.screen(c(2, 1))
    fdmeans <- fdobj; fdmeans$coefs <- t(res2$prms$my)
    screen(1); plot(fdmeans, col = 1:res2$K, xaxt = "n", lwd = 2)
    axis(1, at = seq(5, 181, 6), labels = velib$dates[seq(5, 181, 6)], las = 2)
    basis$dropind <- which(rowSums(abs(res2$U)) == 0)
    screen(2); plot(basis, col = 1, lty = 1, xaxt = "n", xlab = "Disc. basis functions")
    axis(1, at = seq(5, 181, 6), labels = velib$dates[seq(5, 181, 6)], las = 2)
    close.screen(all = TRUE)

velib    NA

Description

This data set contains data from the bike sharing system of Paris, called Vélib. The data are loading profiles of the bike stations over one week. The data were collected every hour during the period Sunday 1st Sept. - Sunday 7th Sept., 2014.

Usage

    data(velib)

Format

The format is:

  - data: the loading profiles (number of available bikes / number of bike docks) of the 1189 stations at 181 time points.
  - position: the longitude and latitude of the 1189 bike stations.
  - dates: the download dates.
  - bonus: indicates if the station is on a hill (bonus = 1).
  - names: the names of the stations.

Source

The real-time data are available at https://developer.jcdecaux.com/ (with an API key).

References

The data were first used in C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.

Examples

    data(velib)
    matplot(t(velib$data[1:5, ]), type = "l", lty = 1, col = 2:5, xaxt = "n",
            lwd = 2, ylim = c(0, 1))
    axis(1, at = seq(5, 181, 6), labels = velib$dates[seq(5, 181, 6)], las = 2)

velov    NA

Description

This data set contains data from the bike sharing system of Lyon, called Vélo'v. The data are loading profiles of the bike stations over one week. The data were collected every hour during the period Sunday 9th March - Sunday 16th March, 2014.

Usage

    data(velov)

Format

The format is:

  - data: the loading profiles (number of available bikes / number of bike docks) of the 345 stations at 181 time points.
  - position: the longitude and latitude of the 345 bike stations.
  - dates: the download dates.
  - bonus: indicates if the station is on a hill (bonus = 1).
  - names: the names of the stations.

Source

The real-time data are available at https://developer.jcdecaux.com/ (with an API key).

References

The data were first used in C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.

Examples

    data(velov)
    matplot(t(velov$data[1:5, ]), type = "l", lty = 1, col = 2:5, xaxt = "n",
            lwd = 2, ylim = c(0, 1))
    axis(1, at = seq(5, 181, 6), labels = velov$dates[seq(5, 181, 6)], las = 2)
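Both data sets store loading profiles, defined in their Format sections as the number of available bikes divided by the number of bike docks. The sketch below shows that ratio on made-up counts; `available`, `docks` and `loading` are hypothetical names and values, not objects shipped with the package.

```r
# Hypothetical raw counts: 3 stations (rows) observed at 4 time points (columns).
available <- matrix(c(2, 5, 10,  0,
                      8, 8,  4,  1,
                      0, 3,  6, 12), nrow = 3, byrow = TRUE)

# Hypothetical number of docks at each of the 3 stations.
docks <- c(10, 16, 12)

# Dividing a matrix by a vector of length nrow() recycles the vector down
# each column, so every row (station) is divided by its own dock count.
loading <- available / docks

range(loading)   # stays within [0, 1] when counts never exceed the docks
```

A ratio bounded by 0 and 1 is consistent with the `ylim = c(0, 1)` setting used in the plotting examples above.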

Index

  Topic clustering: funFEM
  Topic datasets: velib, velov
  Topic functional data: funFEM
  Topic package: funFEM-package

  funFEM
  funFEM-package
  velib
  velov