evaluate_sample_k function

Evaluate sample k

This function calculates three indices (Davies-Bouldin, Calinski-Harabasz and average Silhouette score) for each k. Calculations are made for a single sample, over a default range of k from 3 to 10.

evaluate_sample_k(data, sample_id, samples_col = "Sample", abundance_col = "Abundance", range = 3:10, with_plot = FALSE, ...)

Arguments

  • data: A data.frame with, at least, the classification, abundance and sample information for each phylogenetic unit.
  • sample_id: String with the name of the sample to which the function is applied.
  • samples_col: String with the name of the column with sample names. Default is "Sample".
  • abundance_col: String with the name of the column with abundance values. Default is "Abundance".
  • range: The range of values of k to test. Default is 3:10.
  • with_plot: If FALSE (default), returns a data.frame; if TRUE, returns a plot with the scores.
  • ...: Extra arguments.

Returns

A data.frame (or plot) with several indices for each number of clusters.

Details

Note: To get the indices for all samples, use evaluate_k() instead.

Data input

This function takes a data.frame with, at minimum, a column for samples and a column for abundance; any number of other columns may be present. It then filters the table for the sample you want to analyze. You can also pre-filter for that sample, but you still need to provide the sample ID (sample_id), and the table always needs a sample column and an abundance column (indicate their names with the arguments samples_col and abundance_col).
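
As a minimal sketch of the expected input, the toy table below uses hypothetical column names (SampleID, Counts) and invented values. The base R filter only illustrates the kind of single-sample subsetting described above; it is not taken from the function's source.

```r
# Toy table: one row per phylogenetic unit per sample (all values invented).
toy <- data.frame(
  OTU      = c("OTU_1", "OTU_2", "OTU_3", "OTU_1", "OTU_2"),
  SampleID = c("S1", "S1", "S1", "S2", "S2"),
  Counts   = c(120, 15, 3, 80, 7),
  stringsAsFactors = FALSE
)

# Hypothetical names, standing in for the values that would be passed
# as samples_col, abundance_col and sample_id.
samples_col   <- "SampleID"
abundance_col <- "Counts"
sample_id     <- "S1"

# Keep only the rows that belong to the chosen sample.
one_sample <- toy[toy[[samples_col]] == sample_id, , drop = FALSE]

# Abundance values of that sample.
one_sample[[abundance_col]]
```

With a table like this, the non-default column names would be passed as samples_col = "SampleID" and abundance_col = "Counts". A real sample would need more phylogenetic units than the largest k in range, since k clusters cannot be formed from fewer than k values.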

Output options

The default option returns a data.frame with Davies-Bouldin, Calinski-Harabasz and average Silhouette scores for each k. This simple output can then be used in other analyses. Alternatively, set with_plot = TRUE to show a plot of the scores instead.

Three indices are calculated by this function:

  • Davies-Bouldin with check_DB();
  • Calinski-Harabasz with check_CH();
  • average Silhouette score with check_avgSil().
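
To illustrate what scoring "for each k" involves, here is a sketch that computes only the average Silhouette score over the default range. It assumes abundances are clustered in one dimension with k-medoids (cluster::pam); the clustering method, the toy abundance vector and the output column names are assumptions made for illustration, since this page does not state how check_avgSil() is implemented.

```r
library(cluster)  # recommended package, included in standard R installations

# Invented abundance values for a single sample (30 phylogenetic units).
abundance <- c(950, 870, 640, 410, 380, 220, 150, 140, 95, 80,
               61, 55, 40, 33, 27, 21, 18, 14, 11, 9,
               8, 7, 6, 5, 4, 3, 2, 2, 1, 1)

k_range <- 3:10

# For each k, cluster with k-medoids and keep the average silhouette width.
avg_sil <- vapply(k_range, function(k) {
  fit <- pam(matrix(abundance, ncol = 1), k = k)
  fit$silinfo$avg.width
}, numeric(1))

# One row per k, similar in spirit to a table of indices per number of clusters.
scores <- data.frame(k = k_range, avg_silhouette = avg_sil)
scores

# A higher average silhouette indicates better-separated clusters.
best_k <- scores$k[which.max(scores$avg_silhouette)]
```

The other two indices would be added as further columns in the same way. Note that a higher value is better for the Silhouette and Calinski-Harabasz scores, while a lower value is better for Davies-Bouldin.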

Examples

library(dplyr)

evaluate_sample_k(nice_tidy, sample_id = "ERR2044662")

# To change range
evaluate_sample_k(nice_tidy, sample_id = "ERR2044662", range = 4:11)

# To make simple plot
evaluate_sample_k(nice_tidy, sample_id = "ERR2044662", range = 4:11, with_plot = TRUE)

See Also

check_CH(), check_DB(), check_avgSil(), suggest_k(), evaluate_k()

  • Maintainer: Francisco Pascoal
  • License: GPL (>= 3)
  • Last published: 2025-04-07