Treats imbalanced discrete numeric datasets by generating synthetic minority examples drawn from an approximation of the minority class probability distribution.
Usage
racog(dataset, numInstances, burnin = 100, lag = 20, classAttr = "Class")
Arguments
dataset: data.frame to treat. All columns except classAttr must be numeric or coercible to numeric.
numInstances: Integer. Number of new minority examples to generate.
burnin: Integer. Number of initial samples in the Markov chain of each minority example that are discarded before any sample is kept. By default, 100.
lag: Integer. Number of iterations between consecutive samples kept from the chain of each minority example. By default, 20.
classAttr: character. Name of the class attribute in dataset; it must exist in dataset. By default, "Class".
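To make the roles of burnin and lag concrete, here is a minimal R sketch of which iterations of a single chain would be kept. It assumes the first kept sample is the lag-th one after the burn-in period; the exact indexing inside racog is an assumption, and retained_iterations is a hypothetical helper, not part of the package.

```r
# Hypothetical helper, not part of the package: indices of the chain
# iterations whose samples are kept, ASSUMING the first `burnin` iterations
# are discarded and every `lag`-th iteration after that is retained.
retained_iterations <- function(iterations, burnin = 100, lag = 20) {
  if (iterations < burnin + lag) {
    return(integer(0))  # chain too short to yield any sample
  }
  as.integer(seq(from = burnin + lag, to = iterations, by = lag))
}

retained_iterations(160)                        # 120 140 160
retained_iterations(60, burnin = 20, lag = 10)  # 30 40 50 60
```

Under this assumption each chain yields floor((iterations − burnin)/lag) samples.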
Returns
A data.frame with the same structure as dataset, containing the generated synthetic examples.
Details
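Approximates the minority class distribution and draws samples from it with a Gibbs sampler. The dataset must be discretized and numeric. For each minority example, a Markov chain is started from that example, and each iteration of the chain produces a new sample. The first burnin iterations are discarded; from then on, every lag-th sample is accepted as a new minority example. In total, d·(iterations − burnin)/lag examples are generated, where d is the number of minority examples and iterations is the number of iterations run in each chain.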
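The procedure above can be illustrated with a small, self-contained R sketch. It is not the package's implementation: gibbs_oversample is a hypothetical name, and the minority distribution is approximated here by a Laplace-smoothed contingency table over all attributes rather than the Chow-Liu dependence tree used in the RACOG paper, so it is only practical for a handful of discrete attributes. What it does show is one Markov chain per minority example, attribute-by-attribute Gibbs updates from the full conditionals, and the burnin/lag selection that yields d·(iterations − burnin)/lag samples.

```r
# Minimal Gibbs-sampling oversampler for discrete minority data.
# ASSUMPTION: the joint distribution is a Laplace-smoothed contingency table
# over all attributes (NOT the Chow-Liu dependence tree used by RACOG).
# gibbs_oversample is a hypothetical name, not the package's racog().
gibbs_oversample <- function(minority, iterations, burnin, lag, alpha = 1) {
  # Observed values of each attribute, and each row coded as level indices
  vals <- lapply(minority, function(col) sort(unique(col)))
  codes <- matrix(
    vapply(seq_along(vals), function(j) match(minority[[j]], vals[[j]]),
           integer(nrow(minority))),
    nrow = nrow(minority))

  # Smoothed joint probability table with one dimension per attribute
  counts <- array(0, dim = unname(lengths(vals)))
  for (i in seq_len(nrow(codes))) {
    cell <- matrix(codes[i, ], nrow = 1)
    counts[cell] <- counts[cell] + 1
  }
  probs <- (counts + alpha) / sum(counts + alpha)

  # Draw attribute j from its full conditional given the other attributes
  sample_attr <- function(state, j) {
    idx <- as.list(state)
    idx[[j]] <- seq_len(dim(probs)[j])
    p <- do.call(`[`, c(list(probs), idx))
    sample.int(length(p), 1, prob = p)
  }

  kept <- list()
  for (i in seq_len(nrow(codes))) {
    state <- codes[i, ]  # each chain starts at a minority example
    for (it in seq_len(iterations)) {
      for (j in seq_along(state)) state[j] <- sample_attr(state, j)
      # Discard the burn-in, then keep every lag-th sample
      if (it > burnin && (it - burnin) %% lag == 0) {
        kept[[length(kept) + 1]] <- state
      }
    }
  }
  if (length(kept) == 0) return(minority[0, , drop = FALSE])

  # Map level indices back to the original discrete values
  idx_mat <- do.call(rbind, kept)
  out <- lapply(seq_along(vals), function(j) vals[[j]][idx_mat[, j]])
  names(out) <- names(minority)
  as.data.frame(out)
}

set.seed(1)
minority <- data.frame(a = c(1, 1, 2, 3, 3), b = c(0, 1, 1, 0, 1))
synth <- gibbs_oversample(minority, iterations = 50, burnin = 10, lag = 10)
nrow(synth)  # 20 = 5 * (50 - 10) / 10
```

Smoothing with alpha keeps every cell probability positive, so each full conditional is a valid distribution and the chain can also produce value combinations that do not occur in the original minority examples.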
Examples
data(iris0)
# Generates new minority examples
newSamples <- racog(iris0, numInstances = 40, burnin = 20, lag = 10, classAttr = "Class")
newSamples <- racog(iris0, numInstances = 100)
References
Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. RACOG and wRACOG: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27 (2015), no. 1, pp. 222–234.