tsne2clus function

t-Stochastic Neighbor Embedding to Clusters

Finds clusters on a two-dimensional map using density-based spatial clustering of applications with noise (DBSCAN; Ester et al. 1996).

tsne2clus(
  S.tsne,
  ann = NULL,
  labels,
  aest = NULL,
  eps_res = 100,
  eps_range = c(0, 4),
  min.clus.size = 10,
  group.names = "Groups",
  xlab = "x: tSNE(X)",
  ylab = "y: tSNE(X)",
  clus = TRUE
)

Arguments

  • S.tsne: Output of the function pca2tsne.
  • ann: Subjects' annotation data. An incidence matrix assigning subjects to classes of biological relevance. Used to tune cluster assignment via the Biological Homogeneity Index (BHI). If ann = NULL, the number of clusters is tuned with the Silhouette index instead of BHI. Defaults to NULL.
  • labels: Character vector with labels describing subjects. Meant to assign aesthetics to the visual display of clusters.
  • aest: Data frame containing point shapes and colors. Defaults to NULL.
  • eps_res: Number of eps values to explore within the specified range. Defaults to 100.
  • eps_range: Vector containing the minimum and maximum eps values to be explored. Defaults to c(0, 4).
  • min.clus.size: Minimum size for a cluster to appear in the visual display. Defaults to 10.
  • group.names: Title for the legend key if 'aest' is specified. Defaults to "Groups".
  • xlab: Label for the x-axis. Defaults to "x: tSNE(X)".
  • ylab: Label for the y-axis. Defaults to "y: tSNE(X)".
  • clus: Whether to perform clustering. Defaults to TRUE. If FALSE, only point aesthetics are applied.
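
As a rough illustration of how 'eps_range', 'eps_res', and 'ann' fit together, the sketch below builds a candidate eps grid and a small incidence matrix in base R. The evenly spaced grid and the removal of a zero radius are assumptions made for illustration; the package may construct its grid differently. The 'classes' vector is made-up data.

```r
# Values mirroring the defaults listed above.
eps_range <- c(0, 4)
eps_res <- 100

# Assumption: an evenly spaced grid between the two bounds.
eps_grid <- seq(eps_range[1], eps_range[2], length.out = eps_res)

# A radius of zero cannot group distinct points, so it is dropped here.
eps_grid <- eps_grid[eps_grid > 0]

# An incidence matrix for 'ann': one row per subject, one 0/1 column per class.
classes <- factor(c("a", "a", "b", "c", "b"))  # made-up class labels
ann <- model.matrix(~ -1 + classes)
```

Each row of 'ann' here has exactly one non-zero entry because every subject belongs to a single class; an incidence matrix with overlapping classes would simply have more than one 1 per row.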

Returns

  • A list with the results of the DBSCAN clustering and (if argument 'plot'=TRUE) the corresponding graphical displays.

  • dbscan.res: A list with the results of the DBSCAN clustering, containing:

    • cluster: Cluster partition.
    • eps: Optimal eps according to the Silhouette or Biological Homogeneity index criterion.
    • SIL: Maximum value of the Silhouette index across the explored eps values.
    • BHI: Maximum value of the Biological Homogeneity index across the explored eps values.
  • clusters.plot: A ggplot object with the clusters' graphical display.
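
To make the BHI value more concrete, here is a minimal base R sketch in the spirit of Datta and Datta (2006): for each cluster, it computes the fraction of pairs of annotated subjects that share at least one class, then averages over clusters. The helper 'bhi_sketch' is hypothetical and is not the routine used by the package; treating label 0 as noise and scoring clusters with fewer than two annotated members as 0 are assumptions.

```r
# Hypothetical helper, not a MOSS or clValid function.
bhi_sketch <- function(cluster, ann) {
  ann <- as.matrix(ann)
  # TRUE when two subjects share at least one annotated class.
  shared <- (ann %*% t(ann)) > 0
  annotated <- rowSums(ann) > 0
  # Assumption: label 0 marks noise points and is not scored.
  ids <- setdiff(unique(cluster), 0)
  if (length(ids) == 0) return(NA_real_)
  per_cluster <- vapply(ids, function(k) {
    members <- which(cluster == k & annotated)
    n_k <- length(members)
    # Assumption: clusters with fewer than two annotated members score 0.
    if (n_k < 2) return(0)
    block <- shared[members, members, drop = FALSE]
    # Ordered pairs of distinct members sharing a class, over all such pairs.
    (sum(block) - n_k) / (n_k * (n_k - 1))
  }, numeric(1))
  mean(per_cluster)
}

# Usage with the 'iris' species as annotation.
ann_iris <- model.matrix(~ -1 + iris[, 5])
bhi_perfect <- bhi_sketch(as.integer(iris[, 5]), ann_iris)
bhi_single <- bhi_sketch(rep(1L, nrow(iris)), ann_iris)
```

A partition that matches the species exactly gives the highest possible score, while lumping all subjects into one cluster gives a much lower one, which is why the peak of this index over the eps values can be used to pick a partition.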

Details

The function takes the outcome of pca2tsne (or a list containing any two-column matrix) and finds clusters via DBSCAN. It extends code from the MEREDITH (Taskesen et al. 2016) and clValid (Datta & Datta, 2018) R packages to tune DBSCAN parameters with the Silhouette or Biological Homogeneity index.
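
The tuning idea described above can be sketched as a simple grid search: run DBSCAN for each candidate eps, score the resulting partition with the mean silhouette width, and keep the eps with the highest score. This is a simplified sketch assuming the 'dbscan' and 'cluster' packages are installed; 'tune_eps_sketch' and its 'min_pts' argument are hypothetical, noise points are excluded from the score by assumption, and the package's own procedure may differ.

```r
library(dbscan)   # dbscan()
library(cluster)  # silhouette()

# Hypothetical helper, not part of MOSS.
tune_eps_sketch <- function(xy, eps_grid, min_pts = 5) {
  sil <- vapply(eps_grid, function(eps) {
    cl <- dbscan::dbscan(xy, eps = eps, minPts = min_pts)$cluster
    keep <- cl > 0  # DBSCAN labels noise points as 0; drop them (assumption)
    # The silhouette is only defined for at least two clusters.
    if (length(unique(cl[keep])) < 2) return(NA_real_)
    s <- cluster::silhouette(cl[keep], dist(xy[keep, , drop = FALSE]))
    if (!is.matrix(s)) return(NA_real_)
    mean(s[, "sil_width"])
  }, numeric(1))
  if (all(is.na(sil))) stop("no eps value produced at least two clusters")
  best <- which.max(sil)  # which.max() skips NA entries
  list(
    eps = eps_grid[best],
    SIL = sil[best],
    cluster = dbscan::dbscan(xy, eps = eps_grid[best], minPts = min_pts)$cluster
  )
}

# Usage on two well-separated simulated groups of points.
set.seed(1)
xy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2)
)
fit <- tune_eps_sketch(xy, eps_grid = seq(0.1, 2, length.out = 20))
table(fit$cluster)
```

Replacing the silhouette score with a homogeneity score computed against an annotation matrix would give the BHI-based variant of the same search.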

Examples

library(MOSS)
library(viridis)
library(cluster)
library(annotate)

# Using the 'iris' data to show cluster definition via the BHI criterion.
set.seed(42)
data(iris)

# Scaling columns.
X <- scale(iris[, -5])

# Calling pca2tsne to map the four variables onto a 2-D map.
Z <- pca2tsne(X, perp = 30, n.samples = 1, n.iter = 1000)

# Using 'species' as prior knowledge to identify clusters.
ann <- model.matrix(~ -1 + iris[, 5])

# Getting clusters.
tsne2clus(Z,
  ann = ann,
  labels = iris[, 5],
  aest = aest.f(iris[, 5]),
  group.names = "Species",
  eps_range = c(0, 3)
)

# Example of usage within moss.
set.seed(43)
sim_blocks <- simulate_data()$sim_blocks
out <- moss(sim_blocks[-4],
  tSNE = TRUE,
  cluster = list(eps_range = c(0, 4), eps_res = 100, min_clus_size = 1),
  plot = TRUE
)
out$clus_plot
out$clusters_vs_PCs

References

  • Ester, Martin, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu. 1996. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," 226-231.
  • Hahsler, Michael, and Matthew Piekenbrock. 2017. "Dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms." https://cran.r-project.org/package=dbscan.
  • Datta, Susmita, and Somnath Datta. 2006. "Methods for Evaluating Clustering Algorithms for Gene Expression Data Using a Reference Set of Functional Classes." BMC Bioinformatics 7 (1). BioMed Central:397.
  • Taskesen, Erdogan, Sjoerd M. H. Huisman, Ahmed Mahfouz, Jesse H. Krijthe, Jeroen de Ridder, Anja van de Stolpe, Erik van den Akker, Wim Verhaegh, and Marcel J. T. Reinders. 2016. "Pan-Cancer Subtyping in a 2D-Map Shows Substructures That Are Driven by Specific Combinations of Molecular Characteristics." Scientific Reports 6 (1):24949.
  • Maintainer: Agustin Gonzalez-Reymundez
  • License: GPL-2
  • Last published: 2022-03-25