pdfos function

Probability density function estimation based oversampling

Generates synthetic minority examples for a numerical dataset by approximating the multivariate Gaussian distribution that best fits the minority data.

pdfos(dataset, numInstances, classAttr = "Class")

Arguments

  • dataset: data.frame to oversample. All columns except the classAttr column must be numeric or coercible to numeric.
  • numInstances: integer. Number of new minority examples to generate.
  • classAttr: character. Name of the class attribute in dataset. It must be one of the columns of dataset.

Returns

A data.frame with the same structure as dataset, containing the generated synthetic examples.

Details

Each synthetic example is drawn from a normal distribution whose mean is an example belonging to the minority class and whose variance is the minority class variance multiplied by a constant. That constant is chosen to minimize the mean integrated squared error of a multivariate Gaussian kernel density estimate.
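The sampling step can be illustrated with a minimal sketch in base R. It is not the package's implementation: `sketch_pdfos` is a hypothetical helper, the built-in `iris` data stands in for a minority class, and the bandwidth `h` comes from a simple rule of thumb, `(4 / (n * (d + 2)))^(1 / (d + 4))`, used here as a stand-in for the error-minimizing constant that pdfos computes.

```r
# Hypothetical helper, not part of the package: draws synthetic examples
# from a Gaussian kernel density estimate centered on minority examples.
sketch_pdfos <- function(minority, numInstances, h = NULL) {
  X <- as.matrix(minority)
  n <- nrow(X)
  d <- ncol(X)

  # Assumption: rule-of-thumb bandwidth, NOT the constant pdfos optimizes.
  if (is.null(h)) {
    h <- (4 / (n * (d + 2)))^(1 / (d + 4))
  }

  # Upper-triangular factor R with t(R) %*% R equal to the minority covariance.
  R <- chol(cov(X))

  # Pick a random minority example as the mean of each new draw.
  centers <- X[sample.int(n, numInstances, replace = TRUE), , drop = FALSE]

  # Standard normal noise, transformed so each row has covariance cov(X).
  noise <- matrix(rnorm(numInstances * d), nrow = numInstances) %*% R

  # Each draw is N(center, h^2 * cov(X)).
  synthetic <- centers + h * noise
  dimnames(synthetic) <- list(NULL, colnames(X))
  as.data.frame(synthetic)
}

set.seed(42)
minority <- iris[iris$Species == "setosa", 1:4]
newSamples <- sketch_pdfos(minority, numInstances = 100)
```

Scaling the noise by `h` multiplies the covariance by `h^2`, which corresponds to the "variance multiplied by a constant" described above. A smaller `h` keeps synthetic points closer to existing minority examples.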

Examples

data(iris0)
newSamples <- pdfos(iris0, numInstances = 100, classAttr = "Class")

References

Gao, Ming; Hong, Xia; Chen, Sheng; Harris, Chris J.; Khalaf, Emad. PDFOS: PDF Estimation Based Oversampling for Imbalanced Two-Class Problems. Neurocomputing 138 (2014), p. 248–259.

Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall, 1986. – ISBN 0412246201

  • Maintainer: Ignacio Cordón
  • License: GPL (>= 2) | file LICENSE
  • Last published: 2020-04-07