train: data.frame. An initial dataset used to generate the first model. All columns, except the classAttr one, have to be numeric or coercible to numeric.
validation: data.frame. A dataset used to compare the results of consecutive classifiers. Must have the same structure as train.
wrapper: An S3 object. A trainWrapper method must be implemented for the class of the object, and a predict method must be implemented for the class of the model returned by trainWrapper. Alternatively, it can be the name of one of the wrappers distributed with the package, "KNN" or "C5.0".
slideWin: Number of most recent sensitivities taken into account by the stopping criterion. By default, 10.
threshold: Threshold that the mean of the last slideWin sensitivities should reach. By default, 0.02.
classAttr: character. Name of the class attribute in train and validation. Must exist in both.
...: further arguments for wrapper.
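The wrapper contract described above relies on S3 dispatch. The following is a minimal, self-contained sketch of that mechanism, not the package's own code. It defines a stand-in trainWrapper generic so that it runs without the package. The class names CentroidWrapper and CentroidModel, the nearest-centroid rule, and the assumption that predict returns one class label per row are all illustrative.

```r
# Stand-in generic so the sketch runs on its own; with the package loaded,
# the package's trainWrapper generic would be used instead (assumption).
trainWrapper <- function(wrapper, train, trainClass, ...) {
  UseMethod("trainWrapper")
}

# Hypothetical wrapper object: an empty list tagged with an S3 class.
myWrapper <- structure(list(), class = "CentroidWrapper")

# Training method: store one centroid (column means) per class.
trainWrapper.CentroidWrapper <- function(wrapper, train, trainClass, ...) {
  trainClass <- as.factor(trainClass)
  centroids <- lapply(split(train, trainClass), colMeans)
  structure(list(centroids = centroids, levels = levels(trainClass)),
            class = "CentroidModel")
}

# Prediction method for the model class returned above: assign each row
# to the class whose centroid is nearest in squared Euclidean distance.
predict.CentroidModel <- function(object, newdata, ...) {
  dists <- sapply(object$centroids, function(center) {
    rowSums(sweep(as.matrix(newdata), 2, center)^2)
  })
  # Force an (rows x classes) matrix even when newdata has a single row.
  dists <- matrix(dists, nrow = nrow(newdata))
  factor(object$levels[apply(dists, 1, which.min)], levels = object$levels)
}

# Toy numeric data with a two-level class (illustrative only).
toy <- data.frame(x = c(0, 0.2, 0.1, 5, 5.1), y = c(0, 0.1, 0.3, 5, 4.9))
toyClass <- factor(c("neg", "neg", "neg", "pos", "pos"))

model <- trainWrapper(myWrapper, toy, toyClass)
predicted <- predict(model, data.frame(x = c(0.1, 4.8), y = c(0.2, 5.2)))
```

Because dispatch depends only on the class attribute, any learner can be plugged in this way, provided both methods exist for the relevant classes.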
Returns
A data.frame with the same structure as train, containing the generated synthetic examples.
Details
The initial model is built on the initial train dataset. The algorithm then keeps generating samples with a Gibbs sampler and adds to the train dataset those samples that are misclassified by the model built on the previous train dataset. It stops once the last slideWin executions of wrapper over the validation dataset reach a mean sensitivity lower than threshold.
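The stopping rule can be sketched in a few lines of base R. This reads the description above literally and is not taken from the package source, so the quantity the package actually tracks may be defined differently. The helper names sensitivity and shouldStop are hypothetical, and requiring at least slideWin recorded values before stopping is an assumption.

```r
# Sensitivity (true positive rate) of predictions for the class `positive`.
sensitivity <- function(actual, predicted, positive) {
  truePos <- sum(actual == positive & predicted == positive)
  falseNeg <- sum(actual == positive & predicted != positive)
  truePos / (truePos + falseNeg)
}

# Literal reading of the rule: stop once the mean of the last slideWin
# recorded values is below threshold. Waiting until slideWin values
# exist is an assumption of this sketch.
shouldStop <- function(sensitivities, slideWin = 10, threshold = 0.02) {
  if (length(sensitivities) < slideWin) {
    return(FALSE)
  }
  mean(utils::tail(sensitivities, slideWin)) < threshold
}

# Illustrative labels: two of the three "pos" cases are recovered.
actual <- factor(c("pos", "pos", "neg", "neg", "pos"))
predicted <- factor(c("pos", "neg", "neg", "pos", "pos"))
s <- sensitivity(actual, predicted, "pos")

tooFew <- shouldStop(c(0.5, 0.4), slideWin = 3)
keepGoing <- shouldStop(c(0.5, 0.4, 0.3), slideWin = 3)
finished <- shouldStop(c(0.5, 0.01, 0.01, 0.01), slideWin = 3)
```

With the default slideWin of 10, at least ten values have to be recorded before this sketch can signal a stop.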
Examples
data(haberman)

# Create train and validation partitions of haberman
trainFold <- sample(1:nrow(haberman), nrow(haberman)/2, FALSE)
trainSet <- haberman[trainFold, ]
validationSet <- haberman[-trainFold, ]

# Define our own wrapper with a C5.0 tree
myWrapper <- structure(list(), class = "TestWrapper")
trainWrapper.TestWrapper <- function(wrapper, train, trainClass){
  C50::C5.0(train, trainClass)
}

# Execute wRACOG with our own wrapper
newSamples <- wracog(trainSet, validationSet, myWrapper, classAttr = "Class")

# Execute wRACOG with the predefined wrappers "KNN" and "C5.0"
KNNSamples <- wracog(trainSet, validationSet, "KNN")
C50Samples <- wracog(trainSet, validationSet, "C5.0")
References
Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. RACOG and wRACOG: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27 (2015), no. 1, pp. 222–234.