The Self-Organizing Maps with Built-in Missing Data Imputation.
imputeSOM extends the online algorithm of the 'kohonen' package so that missing data are imputed while the map is trained. All missing values are first replaced with initial values, such as the mean of the observed values of each variable.
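The initial fill described above can be sketched in base R. This is an illustration only, assuming column means are used as the initial values; the helper name mean_impute is hypothetical and is not part of the package.

```r
# Hypothetical helper: replace each missing entry by the mean of the
# observed values in the same column (one possible initial imputation).
mean_impute <- function(X) {
  X <- as.matrix(X)
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])            # TRUE for both NA and NaN
    X[miss, j] <- mean(X[!miss, j])  # mean of the observed entries only
  }
  X
}

# Toy data: the matrix is filled column by column
X <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)
X_init <- mean_impute(X)
```

A column with no observed value at all would stay missing under this sketch, since the mean of an empty vector is NaN.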
data: a matrix or data.frame of continuous variables containing the observations to be mapped onto the grid by the Kohonen algorithm; the observations may be incomplete.
grid: a grid for the codebook vectors: see somgrid.
rlen: the number of times the complete data set will be presented to the network.
alpha: learning rate, a vector of two numbers indicating the amount of change. Default is to decline linearly from 0.05 to 0.01 over rlen updates.
radius: the radius of the neighbourhood, either given as a single number or a vector (start, stop). If it is given as a single number the radius will change linearly from radius to zero; as soon as the neighbourhood gets smaller than one only the winning unit will be updated. Note that the default before version 3.0 was to run from radius to -radius. If nothing is supplied, the default is to start with a value that covers 2/3 of all unit-to-unit distances.
maxNA.fraction: the maximal fraction of NA values a column may contain without being removed.
keep.data: if TRUE, return original data and mapping information. If FALSE, only return the trained map (in essence the codebook vectors).
dist.fcts: distance function to be used for the data. Admissible values currently are "sumofsquares", "euclidean" and "manhattan". Default is to use "sumofsquares".
init: a matrix or data.frame containing the initial values for the codebook vectors. It should have the same number of variables (columns) as the data, and its number of rows should equal the number of units in the map.
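Some of the defaults described in the arguments above can be made concrete with base R. The sketch below is not the package's internal code: the variable names are illustrative, the 5 x 5 rectangular layout is an arbitrary choice, and the non-strict comparison used for maxNA.fraction is an assumption.

```r
rlen <- 100

# alpha: linear decline from 0.05 to 0.01 over rlen updates
alpha <- c(0.05, 0.01)
alpha_schedule <- seq(alpha[1], alpha[2], length.out = rlen)

# radius: start value covering 2/3 of all unit-to-unit distances,
# illustrated on the coordinates of a 5 x 5 rectangular layout
pts <- as.matrix(expand.grid(x = 1:5, y = 1:5))
unit_dists <- as.matrix(dist(pts))
radius_start <- unname(quantile(unit_dists, 2 / 3))
# a single radius value declines linearly from radius to zero
radius_schedule <- seq(radius_start, 0, length.out = rlen)

# maxNA.fraction: keep the columns whose fraction of NA values does not
# exceed the threshold (assumption: non-strict comparison)
maxNA.fraction <- 0.5
X <- cbind(a = c(1, NA, 3, 4),
           b = c(NA, NA, NA, 4),
           c = c(1, 2, 3, 4))
keep <- colMeans(is.na(X)) <= maxNA.fraction
X_kept <- X[, keep, drop = FALSE]
```

In this toy matrix, column b has three of its four values missing and is the only column dropped.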
Returns
An object of class "missSOM" with components:
data: Data matrix, only returned if keep.data == TRUE.
ximp: Imputed data matrix.
unit.classif: Winning units for data objects, only returned if keep.data == TRUE.
distances: Distances of objects to their corresponding winning unit, only returned if keep.data == TRUE.
grid: The grid, an object of class somgrid.
codes: A list of matrices containing codebook vectors.
alpha, radius: Input arguments presented to the function.
maxNA.fraction: The maximal fraction of NA values a column may contain without being removed.
dist.fcts: The distance function used for the data.
Examples
data(wines)

## Data with no missing values
som.wines <- imputeSOM(scale(wines), grid = somgrid(5, 5, "hexagonal"))
summary(som.wines)
print(dim(som.wines$data))

## Data with missing values
X <- scale(wines)
missing_obs <- sample(1:nrow(wines), 10, replace = FALSE)
X[missing_obs, 1:2] <- NaN
som.wines <- imputeSOM(X, grid = somgrid(5, 5, "hexagonal"))
summary(som.wines)
print(dim(som.wines$ximp))
print(sum(is.na(som.wines$ximp)))