Meta Clustering with Similarity Network Fusion
Add columns to a dataframe
Add settings matrix rows
Heatmap of pairwise adjusted Rand indices between solutions
Alluvial plot of patients across cluster counts and important features
Given a data_list object, sort data elements by subjectkey
Collapse a dataframe and/or a data_list into a single dataframe
Heatmap of pairwise associations between features
Automatically plot features across clusters
Bar plot separating a feature by cluster
Calculate feature NMIs for a data_list and a derived solutions_matrix
Generate closure function to run batch_snf in an apply-friendly format
Run SNF clustering pipeline on a list of subsampled data lists.
Run variations of SNF.
Meta-cluster calculations
Calculate p-values for all pairwise associations of features in a data...
Calculate p-values based on feature vectors and their types
Calculate coclustering data.
Calculate Davies-Bouldin indices
Calculate Dunn indices
Calculate silhouette scores
Place significance stars on ComplexHeatmap cells.
Convert character-type columns of a dataframe to factor-type
Helper function to stop annotation building when no data was provided
Check for ComplexHeatmap and circlize dependencies
Check validity of similarity matrices
Chi-squared test p-value (generic)
Density plot coclustering stability across subsampled data.
Heatmap of observation co-clustering across resampled data.
Coclustering coverage check
Collapse a data_list into a single dataframe
Return a colour ramp for a given vector
Convert unique identifiers of data_list to 'subjectkey'
Internal function for estimate_nclust_given_graph
Internal function for estimate_nclust_given_graph
Check if data list contains any duplicate features
Make the subjectkey UID columns of a data_list first
Variable-level summary of a data_list
SNF scheme: Domain merge
Domains
Execute inclusion
Manhattan plot of feature-cluster association p-values
Estimate number of clusters for a similarity matrix
Distance metric: Euclidean distance
Extend a solutions matrix to include outcome evaluations
Fisher exact test p-value
Generate annotations list
Generate a list of custom clustering algorithms
Generate a data_list
Generate a list of distance metrics
Build a settings matrix
Generate a matrix to store feature weights
Extract cluster membership information from one solutions matrix row
Extract cluster membership information from a solutions_matrix
Extract cluster membership vector from one solutions matrix row
Pull complete-data UIDs from a list of dataframes
Calculate distance matrices
Extract subjects from a data_list
Return the row or column ordering present in a heatmap
Return the hierarchical clustering order of a matrix
Get mean p-value
Get minimum p-value
Get p-values from an extended solutions matrix
Extract representative solutions from a matrix of ARIs
Distance metric: Gower distance
Distance metric: Hamming distance
SNF Scheme: Individual
Jitter plot separating a feature by cluster
Label propagation
Convert a vector of partition indices into meta cluster labels
Linearly correct data_list by features with unwanted signal
Linear model p-value (generic)
Remove items from a data_list
Label propagate cluster solutions to unclustered subjects
Manhattan plot of feature-meta cluster association p-values
Horizontally merge compatible data lists
Merge list of dataframes
Select all columns of a dataframe not starting with the 'subject_' pre...
Convert dataframe columns to numeric type
Ordinal regression p-value
Parallel processing form of batch_snf
Add "subject_" prefix to all UID values in subjectkey column
Heatmap of p-values
Generate random removal sequence
Reduce data_list to common subjects
Remove NAs from a data_list object
Rename features in a data_list
Reorder the subjects in a data_list
Helper resample function found in ?sample
Save a heatmap object to a file
Adjust the diagonals of a matrix
Heatmap for visualizing a settings matrix
Squared (excluding weights) Euclidean distance
Launch shiny app to identify meta cluster boundaries
Plot heatmap of similarity matrix
Generate a complete path and filename to store a similarity matrix
Squared (including weights) Euclidean distance
Distance metric: Standard normalization then Euclidean
Convert a data list to a similarity matrix through a variety of SNF sc...
Clustering algorithm: Spectral clustering with eigen-gap heuristic
Clustering algorithm: Spectral clustering with eigen-gap heuristic
Clustering algorithm: Spectral clustering for an eight cluster solution
Clustering algorithm: Spectral clustering for a five cluster solution
Clustering algorithm: Spectral clustering for a four cluster solution
Clustering algorithm: Spectral clustering for a nine cluster solution
Clustering algorithm: Spectral clustering with rotation cost heuristic
Clustering algorithm: Spectral clustering with rotation cost heuristic
Clustering algorithm: Spectral clustering for a seven cluster solution
Clustering algorithm: Spectral clustering for a six cluster solution
Clustering algorithm: Spectral clustering for a ten cluster solution
Clustering algorithm: Spectral clustering for a three cluster solution
Clustering algorithm: Spectral clustering for a two cluster solution
Helper function to determine which row and columns to split on
Select all columns of a dataframe starting with a given string prefix.
Create subsamples of a data_list
Calculate pairwise adjusted Rand indices across subsamples of data
Summarize a clust_algs_list object
Summarize a data list
Summarize metrics contained in a distance_metrics_list
Summarize p-value columns of an extended solutions matrix
Training and testing split
Two step SNF
Manhattan plot of feature-feature association p-values
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source.

The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context.

This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
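The step of "clustering the solutions themselves" needs a measure of agreement between cluster solutions, and the function index above points to pairwise adjusted Rand indices (ARIs) for this. The following is a minimal Python sketch of that idea only. It is not the package's R API: the function names `adjusted_rand_index` and `pairwise_ari` and the toy label vectors are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster label vectors (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Pairs of observations placed together in both, in the first, and in the second solution.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2          # largest attainable agreement
    if maximum == expected:
        # Only happens when both solutions are all-singletons or both are a single cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def pairwise_ari(solutions):
    """Symmetric matrix (list of lists) of ARIs between every pair of solutions."""
    k = len(solutions)
    matrix = [[1.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        matrix[i][j] = matrix[j][i] = adjusted_rand_index(solutions[i], solutions[j])
    return matrix


# Toy cluster solutions for six observations (illustrative data, not from the package).
solutions = [
    [1, 1, 1, 2, 2, 2],
    [2, 2, 2, 1, 1, 1],  # same partition as the first, with labels swapped
    [1, 1, 2, 2, 3, 3],
]
ari_matrix = pairwise_ari(solutions)
for row in ari_matrix:
    print([round(value, 3) for value in row])
```

Because the ARI ignores label names, the first two toy solutions score 1 against each other, while the third scores lower against both. A matrix like this one could then be passed to a hierarchical clustering routine or drawn as a heatmap to group qualitatively similar solutions, which is the role the description assigns to meta clustering.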