data: a data.frame with, at least, the classification, abundance and sample information for each phylogenetic unit.
sample_id: string with name of selected sample.
taxa_col: string with name of column with phylogenetic units. Usually OTU or ASV.
plot_all: If TRUE, will make a plot for all samples with mean and standard deviation. If FALSE (default), then the plot will illustrate a single sample, that you have to specifiy in sample_id argument.
silhouette_score: string with column name with silhouette score values. Default is "Silhouette_scores"
classification_col: string with name of column with classification for each row. Default value is "Classification".
abundance_col: string with name of column with abundance values. Default is "Abundance".
log_scaled: if TRUE then abundance scores will be shown in Log10 scale. Default to FALSE.
colors: vector with colors. Should have the same lenght as the number of classifications.
...: other arguments
Returns
A grid of ggplot objects with clustering results and silhouette plot obtained from define_rb().
Details
This function combined plot_ulrb_clustering() and plot_ulrb_silhouette(). The plots can be done for a single sample or for all samples.
The results from the main function of ulrb package, define_rb(), will include the classification of each taxa and the silhouette score obtained for each observation. Thus, to access the clustering results, there are two main plots to check:
the rank abundance curve obtained after ulrb classification;
and the silhouette plot.
Interpretation of Silhouette plot
Based on chapter 2 of "Finding Groups in Data: An Introduction to Cluster Analysis." (Kaufman and Rousseeuw, 1991); a possible interpretation of the clustering structure based on the Silhouette plot is:
0.71-1.00 (A strong structure has been found);
0.51-0.70 (A reasonable structure has been found);
0.26-0.50 (The structure is weak and could be artificial);
<0.26 (No structure has been found).
Examples
classified_species <- define_rb(nice_tidy)# Default parameters for a single sample ERR2044669plot_ulrb(classified_species, sample_id ="ERR2044669", taxa_col ="OTU", abundance_col ="Abundance")# All samples in a datasetplot_ulrb(classified_species, taxa_col ="OTU", abundance_col ="Abundance", plot_all =TRUE)# All samples with a log scaleplot_ulrb(classified_species, taxa_col ="OTU", abundance_col ="Abundance", plot_all =TRUE, log_scaled =TRUE)