powerRegAcc function

Functions for computing the region of acceptance

getAccRegion - Computes the region of acceptance based on quantiles for a specified level of significance and method.

getAccRegion_sampled - Computes a sampling-based region of acceptance for the given null model based on quantiles for a specified level of significance and method.

getAccRegion_exact - Computes the exact region of acceptance for the given null model based on quantiles for a specified level of significance and method. Currently, this is only implemented for null_model = "yule", "pda", or "etm", and n<=20.

computeAccRegion - Computes the bounds of the region of acceptance given the empirical distribution function (specified by the unique values and their probabilities under the null model) for specified cut-offs (e.g., 0.025 on both sides for a symmetric two-tailed test). For values strictly outside of the interval the null hypothesis is rejected.

This function also computes the probabilities of rejecting the null hypothesis if the value equals the lower or upper bound of the region of acceptance. These probabilities are 0 for correction method "none"; for "small-sample" they are chosen so that the probability of rejection corresponds exactly to the specified cut-offs.
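The logic described above can be sketched in base R for a discrete null distribution. This is an illustration only, not the package's implementation: the helper name sketch_acc_region and its internals are assumptions, and only the argument names mirror computeAccRegion. The sketch also assumes that every listed value has positive probability.

```r
# Illustrative sketch only; 'sketch_acc_region' is a hypothetical helper,
# not a function of the package.
sketch_acc_region <- function(unique_null_vals, unique_null_probs,
                              correction = c("small-sample", "none"),
                              cutoff_left, cutoff_right) {
  correction <- match.arg(correction)
  stopifnot(cutoff_left >= 0, cutoff_right >= 0,
            cutoff_left + cutoff_right < 1)

  # Sort the support so that cumulative sums run from small to large values.
  ord <- order(unique_null_vals)
  vals <- unique_null_vals[ord]
  probs <- unique_null_probs[ord]
  k <- length(vals)

  # P(X < v_i) and P(X > v_i) for every support point v_i.
  p_less <- c(0, cumsum(probs)[-k])
  p_greater <- c(rev(cumsum(rev(probs)))[-1], 0)

  # Lower bound: largest value with P(X < bound) <= cutoff_left.
  i_low <- max(which(p_less <= cutoff_left))
  # Upper bound: smallest value with P(X > bound) <= cutoff_right.
  i_up <- min(which(p_greater <= cutoff_right))

  # Rejection probabilities at the bounds: spend the part of each cut-off
  # that is not used up by the values strictly outside the interval.
  rej_low <- 0
  rej_up <- 0
  if (correction == "small-sample") {
    rej_low <- (cutoff_left - p_less[i_low]) / probs[i_low]
    rej_up <- (cutoff_right - p_greater[i_up]) / probs[i_up]
  }
  c(lower = vals[i_low], upper = vals[i_up],
    rej_prob_lower = rej_low, rej_prob_upper = rej_up)
}

res <- sketch_acc_region(
  unique_null_vals = c(1, 2, 3, 4, 5),
  unique_null_probs = c(0.1, 0.4, 0.1, 0.2, 0.2),
  correction = "small-sample", cutoff_left = 0.15, cutoff_right = 0.15
)
print(res)
```

For these inputs the sketch yields the bounds 2 and 5 with rejection probabilities 0.125 and 0.75 at the bounds, so the overall rejection probability under the null model is 0.1 + 0.125 * 0.4 + 0.75 * 0.2 = 0.3, the sum of the two cut-offs.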

Usage

getAccRegion(tss, null_model = "yule", n, distribs = "exact_if_possible",
             N_null = 10000L, N_alt = 1000L, N_intervals = 1000L,
             test_type = "two-tailed", correction = "small-sample",
             sig_lvl = 0.05)

getAccRegion_sampled(tss, null_model = "yule", n, N_null,
                     N_alt = 1000L, N_intervals = 1000L,
                     test_type = "two-tailed", correction = "small-sample",
                     sig_lvl = 0.05)

getAccRegion_exact(tss, null_model = "yule", n,
                   N_alt = 1000L, N_intervals = 1000L,
                   test_type = "two-tailed", correction = "small-sample",
                   sig_lvl = 0.05)

computeAccRegion(unique_null_vals, unique_null_probs, correction,
                 cutoff_left, cutoff_right)

Arguments

  • tss: Vector containing the names (as character) of the tree shape statistics (TSS) that should be compared. You may either use the short names provided in tssInfo to use the already included TSS, or use the name of a list object containing information similar to that of the entries in tssInfo. Example:

    Use "new_tss" as the name for the list object new_tss containing at least the function new_tss$func = function(tree){...}, and optionally also the information new_tss$short, new_tss$simple, new_tss$name, new_tss$type, new_tss$only_binary, and new_tss$safe_n.

  • null_model: The null model that is to be used to determine the power of the tree shape statistics. In general, it must be a function that produces rooted binary trees in phylo format.

    If the respective model is included in this package, then specify the model and its parameters by using a character or list. Available are all options listed under parameter tm in the documentation of function genTrees (type ?genTrees).

    If you want to include your own tree model, then use the name of a list object containing the function (with the two input parameters n and Ntrees). Example:

    Use "new_tm" for the list object new_tm <- list(func = function(n, Ntrees){...}).

  • n: Integer value that specifies the desired number of leaves, i.e., vertices with in-degree 1 and out-degree 0.

  • distribs: Determines how the distributions (and with that the bounds of the critical region) are computed. Available are:

    • "exact_if_possible" (default): Tries to compute the exact distribution under the null model if possible. Currently, this is only implemented for null_model = "yule", "pda", or "etm", and n<=20. In all other cases the distribution is approximated by sampling N_null many trees under the null model as in the option "sampled" below.
    • "sampled": N_null many trees are sampled under the null model to approximate the distribution.
  • N_null: Sample size (integer >=10) if distributions are sampled (default = 10000L).

  • N_alt: Sample size (integer >=10) for the alternative models to estimate the power (default = 1000L). Only needed here if the test_type is "two-tailed-unbiased".

  • N_intervals: Number (integer >=3, default = 1000L) of different quantile/cut-off pairs investigated as potential bounds of the region of acceptance. This parameter is only necessary if the test_type is "two-tailed-unbiased".

  • test_type: Determines the method. Available are:

    • "two-tailed" (default): The lower and upper bound of the region of acceptance are determined based on the (empirical) distribution function such that P(TSS < lower bound) <= sig_lvl/2 and P(TSS > upper bound) <= sig_lvl/2. See parameter correction for specifying how conservative the test should be: the null hypothesis can either be rejected only if the values are strictly outside of this region of acceptance (can be too conservative) or it can also be rejected (with certain probabilities) if the value equals the lower or upper bound.

    • "two-tailed-unbiased": Experimental - Use with caution!

      The region of acceptance is optimized to yield an unbiased test, i.e., a test that rejects the null hypothesis under non-null models with a probability of at least sig_lvl. The region of acceptance is determined similarly to the default method. However, it need not be symmetric, i.e., it does not necessarily cut off sig_lvl/2 on both sides. Also see parameter correction for specifying how conservative the test should be.

  • correction: Specifies the desired correction method. Available are:

    • "small-sample" (default): This method tries to ensure that the probability of the critical region, i.e., the range of values for which the null hypothesis is rejected, is as close to sig_lvl as possible (compared with "none" below, which can be too conservative). The idea is that the null hypothesis is also rejected with certain probabilities if the value matches a bound of the region of acceptance.
    • "none": No correction method is applied. The test might then be slightly too conservative, as the null hypothesis is retained whenever the value is >= the lower and <= the upper bound.
  • sig_lvl: Level of significance (default=0.05, must be >0 and <1).

  • unique_null_vals: Numeric vector containing all the unique values under the null model.

  • unique_null_probs: Numeric vector containing the corresponding probabilities of the unique values under the null model.

  • cutoff_left: Numeric value (>=0, <1) specifying the cut-off of the distribution for the lower bound of the region of acceptance. The sum of the two cut-offs must be <1.

  • cutoff_right: Numeric value (>=0, <1) specifying the cut-off of the distribution for the upper bound of the region of acceptance. The sum of the two cut-offs must be <1.
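To make the tss, unique_null_vals, and unique_null_probs arguments more concrete, here is a minimal base R sketch. The cherry count used as the statistic, the helper make_phylo, and the five-tree "sample" are all hypothetical and chosen only for illustration. The hand-built trees merely mimic the layout of "phylo" objects (leaves numbered 1 to n, root numbered n+1) and are not produced by the package. Only the func entry is set, since the argument description lists the other entries as optional.

```r
# Hypothetical custom statistic: the number of cherries, i.e., internal
# vertices whose children are all leaves.
new_tss <- list(
  func = function(tree) {
    n_leaves <- length(tree$tip.label)
    # In the "phylo" layout, the edge matrix holds parents in column 1 and
    # children in column 2; leaves carry the numbers 1..n_leaves.
    child_is_leaf <- tree$edge[, 2] <= n_leaves
    sum(tapply(child_is_leaf, tree$edge[, 1], all))
  }
)

# Hypothetical helper: build a minimal "phylo"-like object by hand so that
# the example needs no extra packages.
make_phylo <- function(edge, n_leaves) {
  storage.mode(edge) <- "integer"
  structure(list(edge = edge,
                 tip.label = paste0("t", seq_len(n_leaves)),
                 Nnode = n_leaves - 1L),
            class = "phylo")
}
caterpillar <- make_phylo(
  matrix(c(5, 6,  6, 7,  7, 1,  7, 2,  6, 3,  5, 4), ncol = 2, byrow = TRUE),
  n_leaves = 4L)
balanced <- make_phylo(
  matrix(c(5, 6,  6, 1,  6, 2,  5, 7,  7, 3,  7, 4), ncol = 2, byrow = TRUE),
  n_leaves = 4L)

# Stand-in for a null sample: evaluate the statistic on a few trees and
# tabulate the unique values with their relative frequencies.
null_sample <- list(caterpillar, balanced, balanced, caterpillar, balanced)
tss_vals <- sapply(null_sample, new_tss$func)
freq <- table(tss_vals)
unique_null_vals <- as.numeric(names(freq))
unique_null_probs <- as.numeric(freq) / length(tss_vals)

# Cut-offs of a symmetric two-tailed test at the default level.
sig_lvl <- 0.05
cutoff_left <- sig_lvl / 2
cutoff_right <- sig_lvl / 2
```

The resulting vectors have the shape that the descriptions of unique_null_vals and unique_null_probs call for: one entry per distinct value, with probabilities summing to 1.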

Returns

getAccRegion - Numeric matrix (one row per TSS) with four columns: The first two columns contain the interval limits of the region of acceptance, i.e., we reject the null hypothesis for values strictly outside of this interval. The third and fourth columns contain the probabilities of rejecting the null hypothesis if values equal the lower or upper bound, respectively.

getAccRegion_sampled - Numeric matrix (one row per TSS) with four columns, structured as for getAccRegion.

getAccRegion_exact - Numeric matrix (one row per TSS) with four columns, structured as for getAccRegion.

computeAccRegion - Numeric vector with four elements, in the same order as the columns returned by getAccRegion.
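The four returned values can be turned into a test decision along the following lines. This is a sketch under stated assumptions, not package code: sketch_reject is a hypothetical helper, the example row is made up for illustration, and adding both probabilities when the two bounds coincide is an assumption of this sketch.

```r
# Hypothetical helper, not part of the package: decide whether to reject the
# null hypothesis for one observed value, given one result row of the form
# (lower bound, upper bound, rejection prob. at lower, rejection prob. at upper).
sketch_reject <- function(value, acc_row, u = runif(1)) {
  lower <- acc_row[1]
  upper <- acc_row[2]

  # Values strictly outside of the region of acceptance are always rejected.
  if (value < lower || value > upper) {
    return(TRUE)
  }

  # On a bound, reject only with the corresponding probability.
  # Assumption of this sketch: if both bounds coincide, the two
  # probabilities are added.
  p_rej <- 0
  if (value == lower) p_rej <- p_rej + acc_row[3]
  if (value == upper) p_rej <- p_rej + acc_row[4]

  # Randomized decision: 'u' is a uniform random number in (0, 1).
  unname(u < p_rej)
}

# Made-up result row for illustration.
acc_row <- c(2, 5, 0.125, 0.75)
sketch_reject(1, acc_row)           # strictly below the lower bound
sketch_reject(3, acc_row)           # inside the region of acceptance
sketch_reject(5, acc_row, u = 0.5)  # on the upper bound, fixed random draw
```

Passing u explicitly makes the decision reproducible. With correction = "none" both probabilities are 0, so values on the bounds are never rejected.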

Examples

getAccRegion(tss = c("Sackin", "Colless", "B1I"), n = 6L)
getAccRegion(tss = c("Sackin", "Colless", "B1I"), n = 6L, null_model = "etm",
             N_null = 20L, correction = "none", distribs = "sampled")
getAccRegion(tss = c("Sackin", "Colless", "B1I"), n = 6L, N_null = 20L,
             test_type = "two-tailed-unbiased", N_intervals = 5L,
             N_alt = 10L)
getAccRegion_sampled(tss = c("Sackin", "Colless", "B1I"), n = 6L,
                     N_null = 20L, correction = "none")
getAccRegion_exact(tss = c("Sackin", "Colless", "B1I"),
                   null_model = "etm", n = 8L)
computeAccRegion(unique_null_vals = c(1,2,3,4,5),
                 unique_null_probs = c(0.1,0.4,0.1,0.2,0.2),
                 correction = "small-sample",
                 cutoff_left = 0.15, cutoff_right = 0.15)

  • Maintainer: Sophie Kersting
  • License: GPL (>= 3)
  • Last published: 2024-08-16
