deleteNearZeroCoefficientOfVariation function

deleteNearZeroCoefficientOfVariation

deleteNearZeroCoefficientOfVariation

Filters out variables from a dataset that exhibit a coefficient of variation below a specified threshold, ensuring the retention of variables with meaningful variability.

deleteNearZeroCoefficientOfVariation(X, LIMIT = 0.1)

Arguments

  • X: Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transform into binary variables.
  • LIMIT: Numeric. Cutoff for minimum variation. If coefficient is lesser than the limit, the variables are removed because not vary enough (default: 0.1).

Returns

Return a list of two objects: X: The new data.frame X filtered. variablesDeleted: The variables that have been removed by the filter. coeff_variation: The coefficient variables per each variable tested.

Details

The deleteNearZeroCoefficientOfVariation function is a pivotal tool in data preprocessing, especially when dealing with high-dimensional datasets. The coefficient of variation (CoV) is a normalized measure of data dispersion, calculated as the ratio of the standard deviation to the mean. In many scientific investigations, variables with a low CoV might be considered as offering limited discriminative information, potentially leading to noise in subsequent statistical analyses. By setting a threshold through the LIMIT parameter, this function provides a systematic approach to identify and exclude variables that do not meet the desired variability criteria. The underlying rationale is that variables with a CoV below the set threshold might not contribute significantly to the variability of the dataset and could be redundant or even detrimental for certain analyses. The function returns a modified dataset, a list of deleted variables, and the computed coefficients of variation for each variable. This comprehensive output ensures that researchers are well-informed about the preprocessing steps and can make subsequent analytical decisions with confidence.

Examples

data("X_proteomic") X <- X_proteomic filter <- deleteNearZeroCoefficientOfVariation(X, LIMIT = 0.1)

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

  • Maintainer: Pedro Salguero García
  • License: CC BY 4.0
  • Last published: 2025-03-05