Machine Learning for Integrating Partially Overlapped Genetic Datasets
Procrustes alignment and mapping back to distances
Run BESMI imputation for a list of dataset paths
Create masked matrices for BESMI
Impute a single dataset from masked matrix path
Iterative imputation with MICE (tails-chain)
KNN imputation sweep (uses VIM::kNN)
Prepare full GDM dataset from CSV or RData
Convert coordinate matrix to distance matrix
Create a heatmap of genetic distances (ggplot2)
Create MDS plot of genetic distances
Distance metrics
Determine bootstrap sample count for a given k
Initialize matrix by column means
Double-center a distance matrix
Export a simulated GDM to CSV
Perform MDS on a pair of distance matrices
Run simulation with predefined biological scenarios
Run a high-level genetic simulation with configurable model
Simulate genetic distances using realistic population structure
Create plotting handles for simulation results
Tools to simulate genetic distance matrices, align and compare them via multidimensional scaling (MDS) and Procrustes, and evaluate imputation with the Bootstrapping Evaluation for Structural Missingness Imputation (BESMI) framework. Methods align with Zhu et al. (2025) <doi:10.3389/fpls.2025.1543956> and the associated software resource Zhu (2025) <doi:10.26188/28602953>.
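As a rough illustration of two of the helper steps listed above (converting a coordinate matrix to a distance matrix, and double-centering a distance matrix ahead of classical MDS), here is a minimal sketch in plain Python. It is not the package's own code or API: the function names `coords_to_distances` and `double_center` are hypothetical, Euclidean distance is an assumed metric, and the double-centering formula B = -1/2 · J D² J is the textbook classical-MDS one, which may differ in detail from what the package does.

```python
import math


def coords_to_distances(coords):
    """Pairwise Euclidean distances between the rows of a coordinate matrix.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def double_center(dist):
    """Double-center a square distance matrix.

    Computes B = -1/2 * J D^2 J with J = I - (1/n) 1 1^T, written out
    element-wise: subtract row and column means of the squared distances
    and add back the grand mean.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


# Usage: three points forming a 3-4-5 right triangle.
points = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
D = coords_to_distances(points)
B = double_center(D)
```

For Euclidean input, the matrix `B` is the Gram matrix of the centered coordinates, so each of its rows sums to zero; an eigendecomposition of `B` would then give the MDS coordinates.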