InvariantTargetPrediction function

Invariant target prediction.

Tests the null hypothesis that Y and E are independent given X.

InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = NULL, test = fTestTargetY,
  colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE, returnModel = FALSE)

Arguments

  • Y: An n-dimensional vector.
  • E: An n-dimensional vector, or a matrix or dataframe with n rows and q columns.
  • X: A matrix or dataframe with n rows and p columns.
  • alpha: Significance level. Defaults to 0.05.
  • verbose: If TRUE, intermediate output is provided. Defaults to FALSE.
  • fitWithGam: If TRUE, a GAM is used for the nonlinear regression, else a random forest is used. Defaults to TRUE.
  • trainTestSplitFunc: Function used to split the sample into training and test sets. Defaults to stratified sampling using caTools::sample.split, assuming E is a factor.
  • argsTrainTestSplitFunc: Arguments passed to the sample-splitting function. Defaults to NULL.
  • test: Unconditional independence test that tests whether the out-of-sample prediction accuracy is the same when using X only vs. X and E as predictors for Y. Defaults to fTestTargetY.
  • colNameNoSmooth: GAM parameter: Names of the variables that should enter the model linearly. Defaults to NULL.
  • mtry: Random forest parameter: Number of variables randomly sampled as candidates at each split. Defaults to sqrt(NCOL(X)).
  • ntree: Random forest parameter: Number of trees to grow. Defaults to 100.
  • nodesize: Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
  • maxnodes: Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.
  • permute: Random forest parameter: If TRUE, the model that uses only X to predict Y also includes a random permutation of E. Defaults to TRUE.
  • returnModel: If TRUE, the fitted models are returned. Defaults to FALSE.
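
The idea behind the test argument, comparing out-of-sample prediction accuracy for Y when using X only versus X and E, can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name sketchTargetPrediction, its trainFraction argument, the cubic polynomial regression standing in for the GAM or random forest, the unstratified random split, and the paired Wilcoxon test on squared errors are all assumptions made for illustration. The sketch also assumes that X is a single numeric column and E a numeric or binary vector.

```r
# Hypothetical helper, not part of the package: a base-R sketch of the
# "X only" versus "X and E" out-of-sample comparison.
sketchTargetPrediction <- function(Y, E, X, trainFraction = 0.5) {
  dat <- data.frame(Y = Y, E = E, X = X)
  n <- nrow(dat)

  # Assumption: plain random train/test split (no stratification on E).
  isTrain <- seq_len(n) %in% sample.int(n, size = floor(trainFraction * n))
  train <- dat[isTrain, ]
  test <- dat[!isTrain, ]

  # Assumption: a cubic polynomial in X stands in for the GAM / random forest.
  fitX <- lm(Y ~ poly(X, 3), data = train)
  fitXE <- lm(Y ~ poly(X, 3) + E, data = train)

  # Squared out-of-sample prediction errors of both models.
  errX <- (test$Y - predict(fitX, newdata = test))^2
  errXE <- (test$Y - predict(fitXE, newdata = test))^2

  # One-sided paired Wilcoxon test: does adding E reduce the test error?
  pvalue <- wilcox.test(errX, errXE, paired = TRUE,
                        alternative = "greater")$p.value
  list(pvalue = pvalue, mseX = mean(errX), mseXE = mean(errXE))
}

set.seed(1)
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)

# Y depends on E only through X, so the null hypothesis holds.
resInvariant <- sketchTargetPrediction(3 * X^2 + rnorm(n), E, X)

# Y depends on E directly, so the null hypothesis is violated.
resShifted <- sketchTargetPrediction(3 * E + rnorm(n), E, X)
```

In the second setting, adding E should noticeably lower the test error, so the sketch's p-value is expected to be small. In the first setting, E carries no information about Y beyond X, so the two models should predict about equally well.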

Returns

A list with the following entries:

  • pvalue: The p-value for the null hypothesis that Y and E are independent given X.
  • model: The fitted models if returnModel = TRUE.

Examples

# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)

# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, E, X)
InvariantTargetPrediction(Y, X, E)
InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)