A selection of simple univariate filters that rank predictors using a t-test, Wilcoxon test, one-way ANOVA or correlation (Pearson or Spearman). These filters are designed for speed: ttest_filter uses the Rfast package, wilcoxon_filter (Mann-Whitney test) uses matrixTests::row_wilcoxon_twosample, and anova_filter uses matrixTests::col_oneway_welch (Welch's F-test), both from the matrixTests package. The filters can be applied to all predictors or to a subset of them. For mixed datasets (combined continuous and categorical predictors) see stat_filter().
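To make the ranking concrete, the sketch below computes one p-value per predictor with base R tests and sorts them, smallest first. It is a minimal illustration only: the helper univariate_pvals() is hypothetical, it assumes a two-level factor y for the t-test and Wilcoxon test, a factor for ANOVA and a numeric y for correlation, and it uses slow column-by-column stats functions instead of the vectorised Rfast and matrixTests routines the filters actually call, whose exact test variants may differ.

```r
# Hypothetical helper (not part of the package): one p-value per column of x,
# computed with base R tests, then ranked from smallest to largest.
univariate_pvals <- function(y, x, test = c("ttest", "wilcoxon", "anova", "correl"),
                             method = "pearson") {
  test <- match.arg(test)
  x <- as.matrix(x)
  p <- apply(x, 2, function(col) {
    switch(test,
      ttest    = t.test(col ~ y)$p.value,                      # Welch two-sample t-test
      wilcoxon = wilcox.test(col ~ y, exact = FALSE)$p.value,  # Mann-Whitney, approximate p
      anova    = oneway.test(col ~ y)$p.value,                 # Welch's F-test (unequal variances)
      correl   = cor.test(col, y, method = method, exact = FALSE)$p.value
    )
  })
  sort(p)  # named vector, best-ranked predictor first
}

# Usage: rank the iris measurements for a setosa vs. other split
data(iris)
y_bin <- factor(iris$Species == "setosa", labels = c("other", "setosa"))
univariate_pvals(y_bin, iris[, 1:4], "ttest")
```

Sorting the named vector keeps the predictor names attached, so the ranking can be turned into either indices or names afterwards.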
Usage
ttest_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)
anova_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)
wilcoxon_filter(y, x, force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), exact = FALSE, keep_factors = TRUE, ...)
correl_filter(y, x, method = "pearson", force_vars = NULL, nfilter = NULL, p_cutoff = 0.05, rsq_cutoff = NULL, type = c("index", "names", "full"), keep_factors = TRUE, ...)
Arguments
y: Response vector
x: Matrix or dataframe of predictors
force_vars: Vector of column names within x which are always retained in the model (i.e. not filtered). Default NULL means all predictors are subject to filtering.
nfilter: Number of predictors to return. If NULL all predictors with p-values < p_cutoff are returned.
p_cutoff: p-value cut-off; only predictors with p-values below this value are retained.
rsq_cutoff: r^2 cut-off for removing predictors due to collinearity. Default NULL means no collinearity filtering. Predictors are first ranked by the filter's test (e.g. the t-test in ttest_filter). If 2 or more predictors are collinear, the highest-ranked predictor is retained, while the other collinear predictors are removed. See collinear().
type: Type of vector returned. Default "index" returns indices, "names" returns predictor names, "full" returns a matrix of p-values.
keep_factors: Logical affecting factors with 3 or more levels. Data frames are coerced to a matrix using data.matrix(). Binary factors are converted to numeric values 0/1 and analysed as such. If keep_factors is TRUE (the default), factors with 3 or more levels are not filtered and are retained. If keep_factors is FALSE, they are removed.
...: Optional arguments, including rsq_method passed to collinear(), or arguments passed to matrixTests::row_wilcoxon_twosample in wilcoxon_filter().
exact: Logical; whether the exact or approximate p-value is calculated (wilcoxon_filter only). Default is FALSE for speed.
method: Type of correlation, either "pearson" or "spearman" (correl_filter only).
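The way p_cutoff, nfilter, rsq_cutoff, force_vars and type combine can be sketched as a post-processing step on a named vector of p-values. This is an assumption-laden illustration, not the package's code: the helper select_predictors() is hypothetical, the order in which the cut-offs are applied is assumed, forced variables are simply prepended, and the collinearity step is a plain greedy pass on Pearson r^2 rather than a call to collinear().

```r
# Hypothetical helper (not the package's implementation): turn a named vector
# of p-values into a selection, mimicking the documented arguments.
select_predictors <- function(pvals, x, force_vars = NULL, nfilter = NULL,
                              p_cutoff = 0.05, rsq_cutoff = NULL,
                              type = c("index", "names")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  ranked <- names(sort(pvals))                  # smallest p-value first
  ranked <- setdiff(ranked, force_vars)         # forced variables bypass filtering
  if (!is.null(p_cutoff)) ranked <- ranked[pvals[ranked] < p_cutoff]
  if (!is.null(rsq_cutoff) && length(ranked) > 1) {
    keep <- character(0)
    for (v in ranked) {
      # Greedy pass (assumed): keep v only if its r^2 with every better-ranked,
      # already-kept predictor is below rsq_cutoff
      r2 <- if (length(keep)) cor(x[, v], x[, keep, drop = FALSE])^2 else 0
      if (all(r2 < rsq_cutoff)) keep <- c(keep, v)
    }
    ranked <- keep
  }
  if (!is.null(nfilter)) ranked <- head(ranked, nfilter)
  out <- c(force_vars, ranked)                  # assumed: forced variables listed first
  if (type == "names") out else match(out, colnames(x))
}

# Usage with t-test p-values on iris
data(iris)
x <- iris[, 1:4]
y_bin <- factor(iris$Species == "setosa", labels = c("other", "setosa"))
pv <- sapply(x, function(col) t.test(col ~ y_bin)$p.value)
select_predictors(pv, x, nfilter = 2, type = "names")
select_predictors(pv, x, rsq_cutoff = 0.9)
```

Because Petal.Length and Petal.Width are strongly correlated in iris (r^2 above 0.9), the second call keeps only whichever of the two ranks higher.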
Returns
Integer vector of indices (type = "index") or character vector of names (type = "names") of the filtered predictors, in order of p-value from the filter's test. If type is "full", the full output of the underlying test is returned (e.g. from Rfast::ttests for ttest_filter).
Examples
## sigmoid function
sigmoid <- function(x) {1 / (1 + exp(-x))}

## load iris dataset and simulate a binary outcome
data(iris)
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <- as.data.frame(apply(dt, 2, scale))
y2 <- sigmoid(0.5 * dt$marker1 + 2 * dt$marker2) > runif(nrow(dt))
y2 <- factor(y2, labels = c("C1", "C2"))
ttest_filter(y2, dt)  # returns index of filtered predictors
ttest_filter(y2, dt, type = "names")  # shows names of predictors
ttest_filter(y2, dt, type = "full")  # full results table

data(iris)
dt <- iris[, 1:4]
y3 <- iris[, 5]
anova_filter(y3, dt)  # returns index of filtered predictors
anova_filter(y3, dt, type = "names")  # shows names of predictors
anova_filter(y3, dt, type = "full")  # full results table
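The keep_factors behaviour rests on data.matrix(), which replaces each factor with its integer level codes (1, 2, ...). The sketch below, using a hypothetical helper coerce_predictors(), recodes binary factors to 0/1 and either flags factors with 3 or more levels for retention (keep_factors = TRUE) or drops them (keep_factors = FALSE). It illustrates the documented behaviour under these assumptions and is not the package's implementation.

```r
# Hypothetical helper (not part of the package): coerce a data frame of
# predictors to a numeric matrix as described under keep_factors.
coerce_predictors <- function(x, keep_factors = TRUE) {
  if (!is.data.frame(x)) return(list(x = as.matrix(x), factor_cols = character(0)))
  # number of levels per column (0 for non-factors)
  nlev <- vapply(x, function(col) if (is.factor(col)) nlevels(col) else 0L, integer(1))
  m <- data.matrix(x)                        # factors -> integer level codes 1, 2, ...
  bin <- names(nlev)[nlev == 2]
  m[, bin] <- m[, bin] - 1                   # binary factors recoded to 0/1
  multi <- names(nlev)[nlev >= 3]            # factors with 3 or more levels
  if (!keep_factors) m <- m[, setdiff(colnames(m), multi), drop = FALSE]
  # factor_cols: multi-level factors that would bypass the univariate test
  list(x = m, factor_cols = multi)
}

df <- data.frame(num = c(1.5, 2.0, 3.1, 0.7),
                 bin = factor(c("no", "yes", "yes", "no")),
                 tri = factor(c("a", "b", "c", "a")))
coerce_predictors(df)
coerce_predictors(df, keep_factors = FALSE)
```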