SparseClustering function

Sparse Clustering

Implements the sparse clustering methods of [Witten/Tibshirani, 2010].

SparseClustering(DataOrDistances, ClusterNo, Type="Hierarchical", PlotIt=F, Silent=FALSE, NoPerms=10, Wbounds, ...)

Arguments

  • DataOrDistances: Either a [1:n,1:d] matrix of the dataset to be clustered, consisting of n cases of d-dimensional data points (every case has d attributes, variables, or features), or a [1:n,1:n] symmetric distance matrix.

  • ClusterNo: Numeric giving the number of clusters to find in the tree (dendrogram) if Type="Hierarchical", or the number of clusters to use if Type="kmeans".

  • Type: (optional) Character string selecting the method, either "Hierarchical" or "kmeans". Default: "Hierarchical".

  • PlotIt: (optional) Boolean. Default = FALSE = No plotting performed.

  • Silent: (optional) Boolean controlling whether output is printed. Default: FALSE.

  • NoPerms: (optional) Numeric scalar, number of permutations. Default: 10.

  • Wbounds: (optional) numeric vector, range of tuning parameters to consider. This is the L1 bound on w, the feature weights [Witten/Tibshirani, 2010].

  • ...: Further arguments passed on to HierarchicalSparseCluster or KMeansSparseCluster of the sparcl package, depending on Type.
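
To illustrate the two accepted input forms and a possible Wbounds vector, the following base R sketch builds a [1:n,1:d] data matrix, derives a [1:n,1:n] symmetric distance matrix from it, and constructs a small grid of L1 bounds. The toy data, the variable names, and the grid between 1.1 and sqrt(d) are assumptions made for illustration (the grid reflects the constraint in [Witten/Tibshirani, 2010] that the L1 bound lies between 1 and sqrt(d)); they are not defaults of SparseClustering.

```r
# Toy data: n = 20 cases with d = 5 features (hypothetical values).
set.seed(1)
n <- 20
d <- 5
Data <- matrix(rnorm(n * d), nrow = n, ncol = d)

# Alternative input form: a symmetric [1:n,1:n] Euclidean distance matrix,
# computed here with base R's dist().
DistanceMatrix <- as.matrix(dist(Data, method = "euclidean"))

# Sanity checks on both input forms.
stopifnot(all(dim(Data) == c(n, d)))
stopifnot(all(dim(DistanceMatrix) == c(n, n)))
stopifnot(isSymmetric(DistanceMatrix))
stopifnot(all(diag(DistanceMatrix) == 0))

# Assumed candidate L1 bounds on the feature weights w:
# a small grid strictly above 1 and at most sqrt(d).
Wbounds <- seq(1.1, sqrt(d), length.out = 5)
```

Either Data or DistanceMatrix could then be passed as DataOrDistances, and a vector such as Wbounds as the Wbounds argument.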

Returns

List of

  • Cls: [1:n] numerical vector with n numbers defining the classification as the main output of the clustering algorithm. It has k unique numbers representing the arbitrary labels of the clustering.

  • Object: Object defined by the clustering algorithm, returned as its other output.

  • Tree: The tree object, returned if Type="Hierarchical" is used.
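
Because the labels in Cls are arbitrary, two results can describe the same partition with different numbers. A minimal base R sketch of how such label vectors might be compared through a contingency table follows; the vectors ClsA and ClsB and the helper SamePartition are hypothetical stand-ins, not output or functions of SparseClustering.

```r
# Hypothetical label vectors for n = 6 cases; ClsB is ClsA with permuted labels.
ClsA <- c(1, 1, 2, 2, 3, 3)
ClsB <- c(3, 3, 1, 1, 2, 2)

# Cluster sizes of one labeling.
table(ClsA)

# Cross-tabulation of the two labelings.
table(ClsA, ClsB)

# Hypothetical helper: two labelings define the same partition if every row
# and every column of the contingency table has exactly one non-zero cell.
SamePartition <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

SamePartition(ClsA, ClsB)                   # TRUE: same partition, other labels
SamePartition(ClsA, c(1, 1, 1, 2, 3, 3))    # FALSE: third case moved to cluster 1
```

Comparing partitions rather than raw label values avoids mistaking a relabeling for a different clustering.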

References

[Witten/Tibshirani, 2010] Witten, D. and Tibshirani, R.: A Framework for Feature Selection in Clustering. Journal of the American Statistical Association, Vol. 105(490), pp. 713-726, 2010.

Author(s)

Quirin Stier, Michael Thrun

Note

For sparse hierarchical clustering, the quality of the results differs depending on whether the data or a distance matrix is given as input.

Examples

# Hepta
data("Hepta")
Data = Hepta$Data

V1 = SparseClustering(Data, ClusterNo=7, Type="kmeans")
Cls1 = V1$Cls

V2 = SparseClustering(Data, ClusterNo=7, Type="Hierarchical")
Cls2 = V2$Cls

InputDistances = parallelDist::parDist(Data, method="euclidean")
DistanceMatrix = as.matrix(InputDistances)
V3 = SparseClustering(DistanceMatrix, ClusterNo=7, Type="Hierarchical")
Cls3 = V3$Cls

## Not run:
set.seed(1)
Data = matrix(rnorm(100*50),ncol=50)
y = c(rep(1,50),rep(2,50))
Data[y==1,1:25] = Data[y==1,1:25]+2
V1 = SparseClustering(Data, ClusterNo=2, Type="kmeans")
Cls1 = V1$Cls
## End(Not run)

  • Maintainer: Michael Thrun
  • License: GPL-3
  • Last published: 2023-10-19