Probability density function estimation based oversampling
Generates synthetic minority examples for a numerical dataset by approximating the multivariate Gaussian distribution that best fits the minority data.
pdfos(dataset, numInstances, classAttr = "Class")
Arguments
dataset: data.frame to process. All columns except the classAttr column must be numeric or coercible to numeric.
numInstances: Integer. Number of new minority examples to generate.
classAttr: character. Name of the class attribute in dataset. The column must exist in dataset.
Returns
A data.frame with the same structure as dataset, containing the generated synthetic examples.
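A minimal usage sketch follows. It assumes the imbalance package is installed and that its bundled iris0 dataset has a class column named "Class"; both are assumptions of this example and are not stated above. The call is wrapped in a guard so the snippet does nothing when the package is unavailable.

```r
# Assumption: the 'imbalance' package is installed and ships the 'iris0' dataset.
if (requireNamespace("imbalance", quietly = TRUE)) {
  library(imbalance)
  data(iris0)

  # Ask for 100 new minority examples; the class column is named "Class".
  newSamples <- pdfos(iris0, numInstances = 100, classAttr = "Class")

  # The result has the same columns as the input, so it can be appended to it.
  balanced <- rbind(iris0, newSamples)
  table(balanced$Class)
}
```

Because the returned data.frame has the same structure as dataset, rbind can append it directly to the original data.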
Details
Each synthetic example is drawn from a normal distribution whose mean is an example belonging to the minority class and whose variance is the minority class variance multiplied by a constant. That constant is chosen to minimize the mean integrated squared error of a multivariate Gaussian kernel function.
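The sampling step described above can be sketched in base R. This is an illustration, not the package's implementation: pdfos_sketch is a hypothetical helper, and instead of optimizing the constant it uses the normal-reference rule of thumb for a Gaussian kernel, h = (4 / ((d + 2) n))^(1 / (d + 4)), as a stand-in. In the multivariate case the "variance" is the minority covariance matrix S, so each new point is drawn from a Gaussian with covariance h^2 S centred on a randomly chosen minority example.

```r
# Hypothetical helper (not part of the package): illustrates the sampling step only.
pdfos_sketch <- function(minority, numInstances) {
  X <- as.matrix(minority)
  n <- nrow(X)
  d <- ncol(X)
  S <- cov(X)  # minority-class covariance matrix

  # ASSUMPTION: rule-of-thumb bandwidth used in place of the optimized constant.
  h <- (4 / ((d + 2) * n))^(1 / (d + 4))

  # chol() returns upper-triangular U with t(U) %*% U == S.
  U <- chol(S)

  # Pick a minority example (with replacement) as the centre of each new point.
  centres <- X[sample.int(n, numInstances, replace = TRUE), , drop = FALSE]

  # Standard normal noise; rows of (noise %*% U) have covariance S.
  noise <- matrix(rnorm(numInstances * d), nrow = numInstances, ncol = d)
  synthetic <- centres + h * noise %*% U

  rownames(synthetic) <- NULL
  colnames(synthetic) <- colnames(X)
  as.data.frame(synthetic)
}

# Usage on a built-in dataset, treating "setosa" as the minority class.
set.seed(42)
minority <- iris[iris$Species == "setosa", 1:4]
new_points <- pdfos_sketch(minority, numInstances = 10)
dim(new_points)
```

Scaling the noise by h shrinks each Gaussian relative to the overall spread of the minority class, so new points stay close to existing examples while still following the class's correlation structure.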
References
Gao, Ming; Hong, Xia; Chen, Sheng; Harris, Chris J.; Khalaf, Emad. PDFOS: PDF Estimation Based Oversampling for Imbalanced Two-Class Problems. Neurocomputing 138 (2014), p. 248–259.
Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall, 1986. ISBN 0412246201.