wracog() R function from [imbalance]

Wrapper for rapidly converging Gibbs algorithm.

Generates synthetic minority examples by approximating their probability distribution until the sensitivity of wrapper over validation cannot be further improved. Works only on discrete numeric datasets.


wracog(
  train,
  validation,
  wrapper,
  slideWin = 10,
  threshold = 0.02,
  classAttr = "Class",
  ...
)

Arguments

train: data.frame. An initial dataset used to build the first model. All columns, except the classAttr one, have to be numeric or coercible to numeric.
validation: data.frame. A dataset on which the results of consecutive classifiers are compared. Must have the same structure as train.
wrapper: An S3 object. A trainWrapper method must exist for the class of the object, and a predict method must exist for the class of the model returned by trainWrapper. Alternatively, it can be the name of one of the wrappers distributed with the package, "KNN" or "C5.0".
slideWin: Number of last sensitivities to take into account to meet the stopping criteria. By default, 10.
threshold: Threshold that the mean of the last slideWin sensitivities should reach. By default, 0.02.
classAttr: character. Indicates the class attribute from train and validation. Must exist in them.
...: further arguments for wrapper.
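
The wrapper contract described above can be sketched with base R only. The class names MajorityWrapper and MajorityModel are hypothetical and used purely for illustration, and the classifier is deliberately trivial. The fallback generic is a stand-in so the snippet runs without imbalance attached; its signature is an assumption modelled on the call shown in the Examples, as is the idea that predict returns class labels.

```r
# Stand-in generic, defined only if no trainWrapper function is visible
# (an assumption so the sketch runs without the imbalance package attached).
if (!exists("trainWrapper", mode = "function")) {
  trainWrapper <- function(wrapper, train, trainClass, ...) {
    UseMethod("trainWrapper")
  }
}

# Hypothetical wrapper object: it carries no state, only a class.
majorityWrapper <- structure(list(), class = "MajorityWrapper")

# trainWrapper method: builds a "model" that remembers the most frequent class.
trainWrapper.MajorityWrapper <- function(wrapper, train, trainClass, ...) {
  counts <- table(trainClass)
  structure(
    list(label = names(counts)[which.max(counts)],
         levels = levels(factor(trainClass))),
    class = "MajorityModel"
  )
}

# predict method for the class of the model returned by trainWrapper.
predict.MajorityModel <- function(object, newdata, ...) {
  factor(rep(object$label, nrow(newdata)), levels = object$levels)
}

# Toy data to exercise both methods.
toyX <- data.frame(a = c(1, 2, 3, 1), b = c(0, 1, 0, 1))
toyY <- factor(c("negative", "negative", "negative", "positive"))
model <- trainWrapper(majorityWrapper, toyX, toyY)
preds <- predict(model, toyX)
```

A wrapper like this never predicts the minority class, so it would be of no practical use with wracog; it only shows which two methods have to exist.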

Returns

A data.frame with the same structure as train, containing the generated synthetic examples.
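
Because the result has the same structure as train, a natural next step is to append it to the original training data. The sketch below uses hand-made toy data: syntheticToy is a hypothetical stand-in for the value returned by wracog, not real output, so that the snippet runs without the package.

```r
# Toy training set with an imbalanced Class column (illustrative values only).
trainToy <- data.frame(
  Age = c(30, 42, 55, 61, 38, 47),
  Nodes = c(0, 1, 3, 0, 2, 8),
  Class = factor(c("negative", "negative", "negative",
                   "negative", "negative", "positive"))
)

# Hypothetical stand-in for the data.frame returned by wracog():
# same columns and same factor levels as the training set.
syntheticToy <- data.frame(
  Age = c(47, 55),
  Nodes = c(3, 8),
  Class = factor(c("positive", "positive"), levels = levels(trainToy$Class))
)

# Matching structure means rbind() can append the synthetic rows directly.
augmented <- rbind(trainToy, syntheticToy)

# Class counts before and after augmentation.
before <- table(trainToy$Class)
after <- table(augmented$Class)
```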

Details

The algorithm keeps generating samples with a Gibbs sampler and adds to the train dataset those samples that are misclassified by the model built on the previous train dataset. It stops once the last slideWin executions of wrapper over the validation dataset reach a mean sensitivity lower than threshold. The initial model is built on the initial train.
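
The two quantities in this description, sensitivity and a sliding-window mean, can be sketched in base R. The helper names sensitivity and shouldStop are hypothetical, and the check is a literal reading of the rule as worded above; the package's internal computation may differ.

```r
# Sensitivity (recall) of the minority class: TP / (TP + FN).
sensitivity <- function(predicted, actual, positive) {
  isPositive <- actual == positive
  if (!any(isPositive)) return(NA_real_)
  sum(predicted[isPositive] == positive) / sum(isPositive)
}

# Literal reading of the stopping rule: stop once at least slideWin values
# are available and the mean of the last slideWin of them is below threshold.
shouldStop <- function(values, slideWin = 10, threshold = 0.02) {
  if (length(values) < slideWin) return(FALSE)
  mean(utils::tail(values, slideWin)) < threshold
}

# Three positives in the validation labels, two of them predicted correctly.
actual <- c("positive", "positive", "negative", "positive", "negative")
predicted <- c("positive", "negative", "negative", "positive", "positive")
sens <- sensitivity(predicted, actual, positive = "positive")

# Window mean is 0.6, which is not below the threshold.
earlyCheck <- shouldStop(c(0.5, 0.6, 0.7), slideWin = 3, threshold = 0.02)

# Window covers the last three values, whose mean is 0.015.
lateCheck <- shouldStop(c(0.5, 0.01, 0.02, 0.015), slideWin = 3, threshold = 0.02)
```

With the defaults, at least 10 evaluations are needed before the check can trigger, so a smaller slideWin makes the procedure stop sooner at the cost of a noisier estimate.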

Examples


library(imbalance)
data(haberman)

# Create train and validation partitions of haberman
trainFold <- sample(1:nrow(haberman), nrow(haberman)/2, FALSE)
trainSet <- haberman[trainFold, ]
validationSet <- haberman[-trainFold, ]

# Define our own wrapper with a C5.0 tree
myWrapper <- structure(list(), class="TestWrapper")
trainWrapper.TestWrapper <- function(wrapper, train, trainClass){
  C50::C5.0(train, trainClass)
}

# Execute wRACOG with our own wrapper
newSamples <- wracog(trainSet, validationSet, myWrapper,
                     classAttr = "Class")

# Execute wRACOG with predefined wrappers for "KNN" or "C5.0"
KNNSamples <- wracog(trainSet, validationSet, "KNN")
C50Samples <- wracog(trainSet, validationSet, "C5.0")

References

Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. RACOG and wRACOG: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27 (2015), no. 1, pp. 222–234.

imbalance package

Maintainer: Ignacio Cordón
License: GPL (>= 2) | file LICENSE
Last published: 2020-04-07
