ttest_filter function

Univariate filters

A selection of simple univariate filters that rank predictors using a t-test, Wilcoxon test, one-way ANOVA or correlation (Pearson or Spearman). These filters are designed for speed: ttest_filter uses the Rfast package, wilcoxon_filter (Mann-Whitney test) uses matrixTests::row_wilcoxon_twosample, and anova_filter uses matrixTests::col_oneway_welch (Welch's F-test), both from the matrixTests package. The filters can be applied to all predictors or to a subset of them. For mixed datasets (combined continuous and categorical predictors), see stat_filter().
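To make the ranking step concrete, here is a minimal base-R sketch that scores each column of a predictor matrix with a per-column test and sorts by p-value. The helper univariate_pvals() is hypothetical and not part of nestedcv; it swaps in stats::t.test (Welch), stats::wilcox.test and stats::oneway.test (Welch's F-test) for the Rfast and matrixTests routines named above, so it is slower and its p-values may differ slightly from the package's.

```r
# Hypothetical helper, not part of nestedcv: rank predictors by per-column
# p-values using base-R tests in place of the Rfast/matrixTests routines.
univariate_pvals <- function(y, x, test = c("ttest", "wilcoxon", "anova")) {
  test <- match.arg(test)
  x <- data.matrix(x)  # coerce a data frame to a numeric matrix
  y <- factor(y)
  pvals <- apply(x, 2, function(col) {
    switch(test,
      # Welch two-sample t-test (var.equal = FALSE is the t.test default)
      ttest = stats::t.test(col ~ y)$p.value,
      # Wilcoxon rank-sum (Mann-Whitney) test, approximate p-value
      wilcoxon = stats::wilcox.test(col ~ y, exact = FALSE)$p.value,
      # Welch's one-way F-test, allowing unequal group variances
      anova = stats::oneway.test(col ~ y, var.equal = FALSE)$p.value
    )
  })
  sort(pvals)  # smallest p-value (highest ranked) first
}

data(iris)
# two-class outcome for the t-test and Wilcoxon test
two <- droplevels(iris[iris$Species != "setosa", ])
p_t <- univariate_pvals(two$Species, two[, 1:4], "ttest")
# three-class outcome for Welch's ANOVA
p_f <- univariate_pvals(iris$Species, iris[, 1:4], "anova")
names(p_f)[p_f < 0.05]
```

Looping over columns keeps the logic readable; the packages used by the real filters compute the same kinds of statistics across all columns at once, which is where their speed advantage comes from.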

ttest_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)

anova_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)

wilcoxon_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), exact = FALSE, keep_factors = TRUE, ...)

correl_filter(y, x, method = "pearson", force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)

Arguments

  • y: Response vector

  • x: Matrix or dataframe of predictors

  • force_vars: Vector of column names within x which are always retained in the model (i.e. not filtered). Default NULL means all predictors will be passed to filterFUN.

  • nfilter: Number of predictors to return. If NULL, all predictors with p-values < p_cutoff are returned.

  • p_cutoff: p-value cut-off.

  • rsq_cutoff: r^2 cut-off for removing predictors due to collinearity. Default NULL means no collinearity filtering. Predictors are ranked by the p-value of the filter's test (the t-test in ttest_filter). If 2 or more predictors are collinear, the highest-ranked predictor is retained, while the other collinear predictors are removed. See collinear().

  • type: Type of vector returned. Default "index" returns indices, "names" returns predictor names, "full" returns a matrix of p-values.

  • keep_factors: Logical affecting factors with 3 or more levels. Data frames are coerced to a matrix using data.matrix. Binary factors are converted to numeric values 0/1 and analysed as such. If keep_factors is TRUE (the default), factors with 3 or more levels are not filtered and are retained; if keep_factors is FALSE, they are removed.

  • ...: Optional arguments, including rsq_method passed to collinear(), or arguments passed to matrixTests::row_wilcoxon_twosample in wilcoxon_filter().

  • exact: Logical, whether an exact or approximate p-value is calculated in wilcoxon_filter(). Default is FALSE for speed.

  • method: Type of correlation used by correl_filter(), either "pearson" or "spearman".
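The arguments nfilter, p_cutoff, force_vars, rsq_cutoff and type interact when the final predictor set is chosen. The following base-R sketch shows one plausible way to combine them once every predictor has a p-value. The helper select_predictors() is hypothetical: the order of the steps, placing force_vars at the front of the output and using Pearson r^2 for the collinearity check are all assumptions rather than nestedcv's code (the package delegates collinearity to collinear(), whose rsq_method may differ).

```r
# Hypothetical sketch, not nestedcv's code: one way the selection arguments
# could combine once every predictor has a p-value.
select_predictors <- function(pvals, x, force_vars = NULL, nfilter = NULL,
                              p_cutoff = 0.05, rsq_cutoff = NULL,
                              type = c("index", "names")) {
  type <- match.arg(type)
  x <- data.matrix(x)
  stopifnot(length(pvals) == ncol(x))
  names(pvals) <- colnames(x)
  # 1. forced variables skip filtering; rank the rest by p-value
  cand <- setdiff(colnames(x), force_vars)
  cand <- cand[order(pvals[cand])]
  # 2. keep predictors below the p-value cut-off
  if (!is.null(p_cutoff)) cand <- cand[pvals[cand] < p_cutoff]
  # 3. greedy collinearity pass (assumed Pearson r^2): keep a predictor only
  #    if its r^2 with every higher-ranked kept predictor is below rsq_cutoff
  if (!is.null(rsq_cutoff) && length(cand) > 1) {
    rsq <- cor(x[, cand, drop = FALSE])^2
    kept <- character(0)
    for (v in cand) {
      if (all(rsq[v, kept] < rsq_cutoff)) kept <- c(kept, v)
    }
    cand <- kept
  }
  # 4. truncate to the nfilter best-ranked predictors
  if (!is.null(nfilter)) cand <- head(cand, nfilter)
  out <- c(force_vars, cand)  # assumption: forced variables listed first
  if (type == "names") out else match(out, colnames(x))
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, paste0("v", 1:4)))
x[, "v4"] <- x[, "v1"] + rnorm(50, sd = 0.01)  # v4 nearly duplicates v1
pv <- c(v1 = 0.001, v2 = 0.2, v3 = 0.01, v4 = 0.03)  # made-up p-values
select_predictors(pv, x, type = "names")                    # "v1" "v3" "v4"
select_predictors(pv, x, rsq_cutoff = 0.9, type = "names")  # "v1" "v3"
```

The greedy pass in step 3 is a direct reading of "the highest-ranked predictor is retained": each predictor is compared only with predictors already kept, so the lower-ranked near-duplicate v4 is dropped while its higher-ranked partner v1 survives.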

Returns

Integer vector of indices (type = "index") or character vector of names (type = "names") of the filtered predictors, in order of test p-value. If type is "full", the full results are returned; for ttest_filter this is the output from Rfast::ttests.

Examples

library(nestedcv)

## sigmoid function
sigmoid <- function(x) {1 / (1 + exp(-x))}

## load iris dataset and simulate a binary outcome
data(iris)
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <- as.data.frame(apply(dt, 2, scale))
y2 <- sigmoid(0.5 * dt$marker1 + 2 * dt$marker2) > runif(nrow(dt))
y2 <- factor(y2, labels = c("C1", "C2"))
ttest_filter(y2, dt)  # returns index of filtered predictors
ttest_filter(y2, dt, type = "names")  # shows names of predictors
ttest_filter(y2, dt, type = "full")  # full results table

data(iris)
dt <- iris[, 1:4]
y3 <- iris[, 5]
anova_filter(y3, dt)  # returns index of filtered predictors
anova_filter(y3, dt, type = "names")  # shows names of predictors
anova_filter(y3, dt, type = "full")  # full results table
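The examples above exercise ttest_filter and anova_filter only. For a continuous outcome, correl_filter ranks predictors by a correlation test instead, with method choosing Pearson or Spearman. A minimal base-R stand-in, using the hypothetical helper correl_pvals() built on stats::cor.test, might look like this; it is a sketch of the idea, and the package's own implementation may compute p-values differently.

```r
# Hypothetical base-R stand-in for the ranking step of correl_filter();
# not nestedcv's implementation.
correl_pvals <- function(y, x, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  x <- data.matrix(x)  # coerce a data frame to a numeric matrix
  p <- apply(x, 2, function(col) {
    # exact = FALSE uses the asymptotic Spearman p-value and avoids
    # warnings about ties; the argument is ignored for Pearson
    stats::cor.test(col, y, method = method, exact = FALSE)$p.value
  })
  sort(p)  # most significant predictor first
}

data(iris)
p_sp <- correl_pvals(iris$Sepal.Length, iris[, 2:4], method = "spearman")
names(p_sp)[p_sp < 0.05]  # predictors passing a 0.05 cut-off
```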

See Also

lm_filter(), stat_filter()

  • Maintainer: Myles Lewis
  • License: MIT + file LICENSE
  • Last published: 2025-03-10