metasnf R package [Documentation]

add_columns

Add columns to a dataframe

add_settings_matrix_rows

Add settings matrix rows

adjusted_rand_index_heatmap

Heatmap of pairwise adjusted rand indices between solutions

alluvial_cluster_plot

Alluvial plot of patients across cluster counts and important features

arrange_dl

Given a data_list object, sort data elements by subjectkey

assemble_data

Collapse a dataframe and/or a data_list into a single dataframe

assoc_pval_heatmap

Heatmap of pairwise associations between features

auto_plot

Automatically plot features across clusters

bar_plot

Bar plot separating a feature by cluster

batch_nmi

Calculate feature NMIs for a data_list and a derived solutions_matrix

batch_row_closure

Generate closure function to run batch_snf in an apply-friendly format

batch_snf_subsamples

Run SNF clustering pipeline on a list of subsampled data lists.

batch_snf

Run variations of SNF.

calc_aris

Meta-cluster calculations

calc_assoc_pval_matrix

Calculate p-values for all pairwise associations of features in a data...

calc_assoc_pval

Calculate p-values based on feature vectors and their types

calculate_coclustering

Calculate coclustering data.

calculate_db_indices

Calculate Davies-Bouldin indices

calculate_dunn_indices

Calculate Dunn indices

calculate_silhouettes

Calculate silhouette scores

cell_significance_fn

Place significance stars on ComplexHeatmap cells.

char_to_fac

Convert character-type columns of a dataframe to factor-type

check_dataless_annotations

Helper function to stop annotation building when no data was provided

check_hm_dependencies

Check for ComplexHeatmap and circlize dependencies

check_similarity_matrices

Check validity of similarity matrices

chi_squared_pval

Chi-squared test p-value (generic)

cocluster_density

Density plot of coclustering stability across subsampled data.

cocluster_heatmap

Heatmap of observation co-clustering across resampled data.

coclustering_coverage_check

Coclustering coverage check

collapse_dl

Collapse a data_list into a single dataframe

colour_scale

Return a colour ramp for a given vector

convert_uids

Convert unique identifiers of data_list to 'subjectkey'

discretisation_evec_data

Internal function for estimate_nclust_given_graph

discretisation

Internal function for estimate_nclust_given_graph

dl_has_duplicates

Check if data list contains any duplicate features

dl_uid_first_col

Make the subjectkey UID columns of a data_list first

dl_variable_summary

Variable-level summary of a data_list

domain_merge

SNF scheme: Domain merge

domains

Domains

drop_inputs

Execute inclusion

esm_manhattan_plot

Manhattan plot of feature-cluster association p-values

estimate_nclust_given_graph

Estimate number of clusters for a similarity matrix

euclidean_distance

Distance metric: Euclidean distance

extend_solutions

Extend a solutions matrix to include outcome evaluations

fisher_exact_pval

Fisher exact test p-value

generate_annotations_list

Generate annotations list

generate_clust_algs_list

Generate a list of custom clustering algorithms

generate_data_list

Generate a data_list

generate_distance_metrics_list

Generate a list of distance metrics

generate_settings_matrix

Build a settings matrix

generate_weights_matrix

Generate a matrix to store feature weights

get_cluster_df

Extract cluster membership information from one solutions matrix row

get_cluster_solutions

Extract cluster membership information from a solutions_matrix

get_clusters

Extract cluster membership vector from one solutions matrix row

get_complete_uids

Pull complete-data UIDs from a list of dataframes

get_dist_matrix

Calculate distance matrices

get_dl_subjects

Extract subjects from a data_list

get_heatmap_order

Return the row or column ordering present in a heatmap

get_matrix_order

Return the hierarchical clustering order of a matrix

get_mean_pval

Get mean p-value

get_min_pval

Get minimum p-value

get_pvals

Get p-values from an extended solutions matrix

get_representative_solutions

Extract representative solutions from a matrix of ARIs

gower_distance

Distance metric: Gower distance

hamming_distance

Distance metric: Hamming distance

individual

SNF Scheme: Individual

jitter_plot

Jitter plot separating a feature by cluster

label_prop

Label propagation

label_splits

Convert a vector of partition indices into meta cluster labels

linear_adjust

Linearly correct data_list by features with unwanted signal

linear_model_pval

Linear model p-value (generic)

list_remove

Remove items from a data_list

lp_solutions_matrix

Label propagate cluster solutions to unclustered subjects

mc_manhattan_plot

Manhattan plot of feature-meta cluster association p-values

merge_data_lists

Horizontally merge compatible data lists

merge_df_list

Merge list of dataframes

no_subs

Select all columns of a dataframe not starting with the 'subject_' pre...

numcol_to_numeric

Convert dataframe columns to numeric type

ord_reg_pval

Ordinal regression p-value

parallel_batch_snf

Parallel processing form of batch_snf

prefix_dl_sk

Add "subject_" prefix to all UID values in subjectkey column

pval_heatmap

Heatmap of p-values

random_removal

Generate random removal sequence

reduce_dl_to_common

Reduce data_list to common subjects

remove_dl_na

Remove NAs from a data_list object

rename_dl

Rename features in a data_list

reorder_dl_subs

Reorder the subjects in a data_list

resample

Helper resample function found in ?sample

save_heatmap

Save a heatmap object to a file

scale_diagonals

Adjust the diagonals of a matrix

settings_matrix_heatmap

Heatmap for visualizing a settings matrix

sew_euclidean_distance

Squared (excluding weights) Euclidean distance

shiny_annotator

Launch shiny app to identify meta cluster boundaries

similarity_matrix_heatmap

Plot heatmap of similarity matrix

similarity_matrix_path

Generate a complete path and filename to store a similarity matrix

siw_euclidean_distance

Squared (including weights) Euclidean distance

sn_euclidean_distance

Distance metric: Standard normalization then Euclidean

snf_step

Convert a data list to a similarity matrix through a variety of SNF sc...

spectral_eigen_classic

Clustering algorithm: Spectral clustering with eigen-gap heuristic

spectral_eigen

Clustering algorithm: Spectral clustering with eigen-gap heuristic

spectral_eight

Clustering algorithm: Spectral clustering for an eight cluster solution

spectral_five

Clustering algorithm: Spectral clustering for a five cluster solution

spectral_four

Clustering algorithm: Spectral clustering for a four cluster solution

spectral_nine

Clustering algorithm: Spectral clustering for a nine cluster solution

spectral_rot_classic

Clustering algorithm: Spectral clustering with rotation cost heuristic

spectral_rot

Clustering algorithm: Spectral clustering with rotation cost heuristic

spectral_seven

Clustering algorithm: Spectral clustering for a seven cluster solution

spectral_six

Clustering algorithm: Spectral clustering for a six cluster solution

spectral_ten

Clustering algorithm: Spectral clustering for a ten cluster solution

spectral_three

Clustering algorithm: Spectral clustering for a three cluster solution

spectral_two

Clustering algorithm: Spectral clustering for a two cluster solution

split_parser

Helper function to determine which row and columns to split on

subs

Select all columns of a dataframe starting with a given string prefix.

subsample_data_list

Create subsamples of a data_list

subsample_pairwise_aris

Calculate pairwise adjusted Rand indices across subsamples of data

summarize_clust_algs_list

Summarize a clust_algs_list object

summarize_dl

Summarize a data list

summarize_dml

Summarize metrics contained in a distance_metrics_list

summarize_pvals

Summarize p-value columns of an extended solutions matrix

train_test_assign

Training and testing split

two_step_merge

Two step SNF

var_manhattan_plot

Manhattan plot of feature-feature association p-values


Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source.

The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context.

This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
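
The meta clustering idea described above can be illustrated with a small base R sketch. It deliberately does not call any metasnf function: the toy data, the `adjusted_rand_index()` helper, and the use of k-means in place of SNF followed by spectral clustering are all assumptions made for brevity. The sketch generates several cluster solutions by varying a hyperparameter, measures agreement between solutions with the adjusted Rand index, clusters the solutions themselves, and picks one representative solution per meta cluster.

```r
# Hypothetical helper: adjusted Rand index between two label vectors,
# computed from the contingency table of the two partitions.
adjusted_rand_index <- function(a, b) {
    tab <- table(a, b)
    n <- sum(tab)
    sum_cells <- sum(choose(tab, 2))
    sum_rows <- sum(choose(rowSums(tab), 2))
    sum_cols <- sum(choose(colSums(tab), 2))
    expected <- sum_rows * sum_cols / choose(n, 2)
    max_index <- (sum_rows + sum_cols) / 2
    # Degenerate case (e.g. both partitions put everything in one cluster).
    if (max_index == expected) {
        return(1)
    }
    (sum_cells - expected) / (max_index - expected)
}

set.seed(1)

# Toy data (assumption): 60 observations, 2 features, three loose groups.
toy <- rbind(
    matrix(rnorm(40, mean = 0), ncol = 2),
    matrix(rnorm(40, mean = 4), ncol = 2),
    matrix(rnorm(40, mean = 8), ncol = 2)
)

# Step 1: generate a range of cluster solutions by varying a hyperparameter.
# k-means stands in for the SNF + clustering pipeline here.
settings <- expand.grid(k = 2:5, rep = 1:3)
solutions <- lapply(seq_len(nrow(settings)), function(i) {
    kmeans(toy, centers = settings$k[i], nstart = 5)$cluster
})

# Step 2: pairwise adjusted Rand indices between all solutions.
n_sol <- length(solutions)
aris <- matrix(1, nrow = n_sol, ncol = n_sol)
for (i in seq_len(n_sol)) {
    for (j in seq_len(n_sol)) {
        aris[i, j] <- adjusted_rand_index(solutions[[i]], solutions[[j]])
    }
}

# Step 3: cluster the solutions themselves, using 1 - ARI as a dissimilarity.
meta_tree <- hclust(as.dist(1 - aris), method = "average")
meta_clusters <- cutree(meta_tree, k = 3)

# Step 4: within each meta cluster, keep the solution that agrees most
# (highest mean ARI) with the other members as its representative.
representatives <- sapply(
    split(seq_len(n_sol), meta_clusters),
    function(idx) idx[which.max(rowMeans(aris[idx, idx, drop = FALSE]))]
)
```

In this sketch, `representatives` holds the indices of the solutions that would be carried forward for characterization. The number of meta clusters (three here) is an arbitrary choice for the illustration.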

Maintainer: Prashanth S Velayudhan
License: GPL (>= 3)
Last published: 2024-11-08

Package version: metasnf 1.1.2