dataset: The original data.frame. All columns except the classAttr column must be numeric or coercible to numeric.
newSamples: A data.frame containing the samples to be filtered. Must have the same structure as dataset.
k: Integer. Number of nearest neighbours used by the KNN algorithm to rule out samples. By default, 3.
iterations: Integer. Number of iterations for the algorithm. By default, 100.
smoothFactor: A positive numeric. By default, 1.
classAttr: Character. Name of the class attribute in dataset and newSamples. It must exist in both.
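The preconditions listed above can be checked before calling the function. The following is a minimal sketch in base R, not code from the package: the helper name checkNeaterInputs is hypothetical, and treating "same structure" as "identical column names in the same order" is an assumption.

```r
# Hypothetical helper, not part of the package: checks the documented
# preconditions on dataset, newSamples and classAttr.
checkNeaterInputs <- function(dataset, newSamples, classAttr) {
  stopifnot(is.data.frame(dataset), is.data.frame(newSamples))
  stopifnot(is.character(classAttr), length(classAttr) == 1)
  # classAttr must exist in both data frames
  stopifnot(classAttr %in% names(dataset), classAttr %in% names(newSamples))
  # Assumption: "same structure" means identical column names, same order
  stopifnot(identical(names(dataset), names(newSamples)))
  # Every non-class column must be numeric or coercible to numeric
  features <- setdiff(names(dataset), classAttr)
  for (df in list(dataset, newSamples)) {
    for (col in features) {
      converted <- suppressWarnings(as.numeric(as.character(df[[col]])))
      # A value that was present but became NA could not be coerced
      if (any(is.na(converted) & !is.na(df[[col]]))) {
        stop("Column '", col, "' is not coercible to numeric")
      }
    }
  }
  invisible(TRUE)
}

# Toy data: column y holds numbers stored as text, which is still coercible
toy <- data.frame(x = c(1, 2, 3), y = c("4", "5", "6"), Class = c("a", "b", "a"))
checkNeaterInputs(toy, toy[1:2, ], classAttr = "Class")
```

A column containing a value such as "five" would make the helper stop with an error, since that value cannot be coerced to numeric.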
Returns
Filtered samples as a data.frame with the same structure as newSamples.
Details
Uses game theory and Nash equilibria to estimate, for each minority example, the probability that it truly belongs to the minority class. Examples that, at the final stage of the algorithm, are more likely to be majority examples than minority ones are discarded.
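The final discarding step can be illustrated in isolation. This is a sketch under stated assumptions, not the package's implementation: the vector probMinority is hypothetical and stands in for the probabilities the game produces after its last iteration, and keeping a sample on an exact tie is an assumption.

```r
# Toy stand-ins: in the real algorithm the probabilities come from the
# iterated game; here they are made up for illustration.
newSamples <- data.frame(x = c(0.1, 0.4, 0.9, 0.7),
                         Class = rep("positive", 4))
probMinority <- c(0.80, 0.35, 0.50, 0.10)  # hypothetical final probabilities

# Discard a sample only when being a majority example is more probable
# than being a minority one, i.e. when 1 - p > p (ties are kept: assumption)
keep <- probMinority >= 1 - probMinority
filtered <- newSamples[keep, , drop = FALSE]
```

Using drop = FALSE keeps the result a data.frame with the same columns as newSamples, matching the documented return value.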
Examples
data(iris0)
newSamples <- smotefamily::SMOTE(iris0[,-5], iris0[,5])$syn_data
# SMOTE overrides Class attr turning it into class
# and dataset must have same class attribute as newSamples
names(newSamples) <- c(names(newSamples)[-5], "Class")
neater(iris0, newSamples, k = 5, iterations = 100, smoothFactor = 1, classAttr = "Class")
References
Almogahed, B.A.; Kakadiaris, I.A. Neater: Filtering of Over-Sampled Data Using Non-Cooperative Game Theory. Soft Computing 19 (2014), Nr. 11, p. 3301–3322.