collinear 3.0.0 package

Automated Multicollinearity Management

case_weights

Generate sample weights for imbalanced responses

collinear_select

Dual multicollinearity filtering algorithm

collinear_stats

Compute summary statistics for correlation and VIF

collinear

Smart multicollinearity management

cor_clusters

Group predictors by hierarchical correlation clustering

cor_cramer

Quantify association between categorical variables

cor_df

Compute signed pairwise correlations dataframe

cor_matrix

Signed pairwise correlation matrix

cor_select

Multicollinearity filtering by pairwise correlation threshold

cor_stats

Compute summary statistics for absolute pairwise correlations

drop_geometry_column

Remove geometry column from sf data frames

f_auto_rules

Decision rules for f_auto()

f_auto

Automatic selection of predictor scoring method

f_binomial_gam

Area under the curve of binomial GAM predictions vs. observations

f_binomial_glm

Area under the curve of binomial GLM predictions vs. observations

f_binomial_rf

Area under the curve of binomial random forest predictions vs. observa...

f_categorical_rf

Cramer's V of Categorical Random Forest predictions vs. observations

f_count_gam

R-squared of Poisson GAM predictions vs. observations

f_count_glm

R-squared of Poisson GLM predictions vs. observations

f_count_rf

R-squared of Random Forest predictions vs. observations

f_functions

List predictor scoring functions

f_numeric_gam

R-squared of Gaussian GAM predictions vs. observations

f_numeric_glm

R-squared of Gaussian GLM predictions vs. observations

f_numeric_rf

R-squared of Random Forest predictions vs. observations

identify_categorical_variables

Find valid categorical variables in a dataframe

identify_logical_variables

Find logical variables in a dataframe

identify_numeric_variables

Find valid numeric variables in a dataframe

identify_response_type

Detect response variable type for model selection

identify_valid_variables

Find valid numeric, categorical, and logical variables in a dataframe

identify_zero_variance_variables

Find near-zero variance variables in a dataframe

model_formula

Build model formulas from response and predictors

preference_order

Rank predictors by importance or multicollinearity

print.collinear_output

Print all collinear selection results of collinear()

print.collinear_selection

Print single selection results from collinear

score_auc

Compute area under the ROC curve between binomial observations and pro...

score_cramer

Compute Cramer's V between categorical observations and predictions

score_r2

Compute R-squared between numeric observations and predictions

step_collinear

Tidymodels recipe step for multicollinearity filtering

summary.collinear_output

Summarize all results of collinear()

summary.collinear_selection

Summarize single response selection results of collinear

target_encoding_lab

Convert categorical predictors to numeric via target encoding

target_encoding_methods

Encode categories as response means

validate_arg_df_not_null

Ensure that argument df is not NULL

validate_arg_df

Check and prepare argument df

validate_arg_encoding_method

Check and validate argument encoding_method

validate_arg_f

Check and validate argument f

validate_arg_function_name

Build hierarchical function names for messages

validate_arg_max_cor

Check and constrain argument max_cor

validate_arg_max_vif

Check and constrain argument max_vif

validate_arg_predictors

Check and validate argument predictors

validate_arg_preference_order

Check and complete argument preference_order

validate_arg_quiet

Check and validate argument quiet

validate_arg_responses

Check and validate arguments response and responses

vif_df

Compute variance inflation factors dataframe

vif_select

Multicollinearity filtering by variance inflation factor threshold

vif_stats

VIF Statistics

vif

Compute variance inflation factors from a correlation matrix

Provides a comprehensive and automated workflow for managing multicollinearity in data frames with numeric and/or categorical variables. The package integrates five robust methods into a single function: (1) target encoding of categorical variables based on response values (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); (2) automated feature prioritization to preserve key predictors during filtering; (3 and 4) pairwise correlation and variance inflation factor (VIF) filtering across all variable types (numeric–numeric, numeric–categorical, and categorical–categorical); (5) adaptive correlation and VIF thresholds. Together, these methods enable reliable multicollinearity management in most use cases while maintaining model integrity. The package also supports parallel processing and progress tracking via the packages 'future' and 'progressr', and integrates with the 'tidymodels' ecosystem through a dedicated recipe step.

  • Maintainer: Blas M. Benito
  • License: MIT + file LICENSE
  • Last published: 2025-12-08
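
The index above lists a function that computes variance inflation factors from a correlation matrix. As a rough illustration of that idea, the sketch below uses base R only and does not call the package. It relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix, and checks it against the regression-based definition. The data frame `df` and its columns `a`, `b`, and `c` are made up for the example, and this should not be read as the package's actual implementation.

```r
# Base R only; none of this calls the collinear package.
# Toy data (hypothetical): c is nearly a linear combination of a and b.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
df <- data.frame(
  a = a,
  b = b,
  c = a + b + rnorm(n, sd = 0.1)
)

# Pairwise Pearson correlation matrix of the predictors
r <- cor(df)

# VIF of each predictor: diagonal of the inverse correlation matrix
vif_inverse <- diag(solve(r))

# Same quantity from the definition VIF_j = 1 / (1 - R2_j), where R2_j is
# the R-squared of regressing predictor j on the remaining predictors
vif_regression <- sapply(names(df), function(v) {
  fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_inverse, 2))

# The two approaches should agree up to numerical tolerance
print(all.equal(unname(vif_inverse), unname(vif_regression)))
```

The matrix route needs a single inversion rather than one regression per predictor, which is presumably why a correlation matrix is a convenient starting point. It does require the correlation matrix to be invertible, so perfectly collinear predictors have to be handled before this step.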