clusterMI-package: package overview help page from the R package clusterMI

clusterMI: Cluster Analysis with Missing Values by Multiple Imputation

clusterMI is an R package for cluster analysis of data with missing values, which it handles by multiple imputation. The package offers several multiple imputation methods designed for clustered individuals, as discussed in Audigier et al. (2021), arXiv:2106.04424. It also pools the results obtained across imputed data sets, in terms of both partition and instability, as proposed in Audigier and Niang (2022), doi:10.1007/s11634-022-00519-1. One application of this instability measure is choosing the number of clusters when values are missing.

Examples


data(wine, package = "clusterMI")

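require(parallel)
set.seed(123456)
ref <- wine$cult # reference partition, kept only for comparison
nb.clust <- 3 # number of clusters
wine.na <- wine
wine.na$cult <- NULL # drop the reference partition before clustering
wine.na <- prodna(wine.na) # introduce missing values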

# imputation
m <- 5 # number of imputed data sets. Should be larger in practice
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

# cluster analysis by kmeans and pooling
nnodes <- 2 # Number of CPU cores for parallel computing
res.pool <- clusterMI(res.imp, nnodes = nnodes)

res.pool$instability # pooled instability
table(ref, res.pool$part) # pooled partition against the reference partition

# choice of nb.clust

res.nbclust <- choosenbclust(res.pool)
res.nbclust$nb.clust
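
The motivation for pooling can be illustrated without the package. The following is a toy sketch in base R only, and it is not the algorithm implemented in clusterMI: the imputation step is a crude random draw from observed values, the data set is iris rather than wine, and the helpers impute_once and pair_disagreement are hypothetical names introduced for this illustration. It shows that partitions obtained from different imputed data sets generally disagree, which is why a pooled partition and a pooled instability are needed.

```r
# Toy illustration in base R only; NOT the clusterMI algorithm.
set.seed(1)
x <- scale(iris[, 1:4])                  # complete numeric data, standardized
x.na <- x
x.na[sample(length(x.na), 60)] <- NA     # remove 10% of the cells at random

# Hypothetical helper: crude stochastic imputation that draws each missing
# value from the observed values of the same column.
impute_once <- function(d) {
  for (j in seq_len(ncol(d))) {
    miss <- is.na(d[, j])
    d[miss, j] <- sample(d[!miss, j], sum(miss), replace = TRUE)
  }
  d
}

# Hypothetical helper: proportion of pairs of individuals on which two
# partitions disagree (one minus the Rand index).
pair_disagreement <- function(p1, p2) {
  same1 <- outer(p1, p1, "==")
  same2 <- outer(p2, p2, "==")
  mean(same1[upper.tri(same1)] != same2[upper.tri(same2)])
}

m <- 5  # number of imputed data sets
parts <- lapply(seq_len(m), function(i) {
  kmeans(impute_once(x.na), centers = 3, nstart = 10)$cluster
})

# Disagreement between every pair of imputation-specific partitions
pairs <- combn(m, 2)
between <- apply(pairs, 2, function(ij) {
  pair_disagreement(parts[[ij[1]]], parts[[ij[2]]])
})
mean(between)
```

A mean disagreement above zero means that the imputation-specific partitions do not coincide. The pair-based measure is label-invariant, so it is unaffected by kmeans numbering the clusters differently from one run to the next.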

References

Audigier, V. and Niang, N. (2022). Clustering with missing data: which equivalent for Rubin's rules? Advances in Data Analysis and Classification. doi:10.1007/s11634-022-00519-1.

Audigier, V., Niang, N., and Resche-Rigon, M. (2021). Clustering with missing data: which imputation model for which cluster analysis method? arXiv preprint arXiv:2106.04424.

Audigier, V. (2024). Clustering on incomplete data using clusterMI. 10emes Rencontres R, Vannes, France. hal preprint hal:04550305.

Bar-Hen, A. and Audigier, V. (2022). An ensemble learning method for variable selection: application to high dimensional data and missing values. Journal of Statistical Computation and Simulation. doi:10.1080/00949655.2022.2070621.

Fang, Y. and Wang, J. (2012). Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56, 468-477. doi:10.1016/j.csda.2011.09.003.

Maintainer: Vincent Audigier
License: GPL-2 | GPL-3
Last published: 2025-02-24