capacity_logreg_testing function

Testing procedures for estimation of channel capacity

Diagnostic procedures for assessing the uncertainty of channel capacity estimates obtained with the SLEMI approach. Two main procedures are implemented: a bootstrap, which repeats the estimation using a fraction of the data, and an overfitting test, which divides the data into two parts, training and testing. Each procedure is repeated a specified number of times to obtain a distribution of the estimator. It is recommended to conduct estimation by calling capacity_logreg_main.R.
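The two resampling ideas described above can be sketched in base R. This is only an illustration on toy data, assuming rows are drawn without replacement; the package's internal sampling scheme may differ, and the object names (toy, boot_data, train_data, test_data) are hypothetical.

```r
# Illustrative sketch only: not the SLEMI implementation.
set.seed(1234)

# Toy data: a discrete signal and one continuous response.
toy <- data.frame(
  signal   = rep(c("0", "1", "2"), each = 20),
  response = rnorm(60)
)
n <- nrow(toy)

# Bootstrap-style repetition: keep a fraction (boot_prob) of the rows.
boot_prob <- 0.8
boot_idx  <- sample(n, size = round(boot_prob * n))
boot_data <- toy[boot_idx, ]

# Overfitting test: split the rows into training and testing parts.
partition_trainfrac <- 0.6
train_idx  <- sample(n, size = round(partition_trainfrac * n))
train_data <- toy[train_idx, ]
test_data  <- toy[-train_idx, ]
```

Repeating either step with fresh random indices yields one dataset per repetition, on which the capacity estimate would then be recomputed.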

capacity_logreg_testing(
  data,
  signal = "signal",
  response = "response",
  side_variables = NULL,
  cc_maxit = 100,
  lr_maxit = 1000,
  MaxNWts = 5000,
  formula_string = NULL,
  TestingSeed = 1234,
  testing_cores = 1,
  boot_num = 10,
  boot_prob = 0.8,
  sidevar_num = 10,
  traintest_num = 10,
  partition_trainfrac = 0.6
)

Arguments

  • data: must be a data.frame object. Cannot contain NA values.
  • signal: is a character object with names of columns of data to be treated as the channel's input.
  • response: is a character vector with names of columns of data to be treated as the channel's output.
  • side_variables: (optional) is a character vector that indicates the side variables' columns of data. If NULL, no side variables are included.
  • cc_maxit: is the number of iterations of the iterative optimisation algorithm used to estimate channel capacity. Default is 100.
  • lr_maxit: is the maximum number of iterations of the fitting algorithm of logistic regression. Default is 1000.
  • MaxNWts: is a maximum acceptable number of weights in logistic regression algorithm. Default is 5000.
  • formula_string: (optional) is a character object that includes a formula syntax to use in logistic regression model. If NULL, a standard additive model of response variables is assumed. Only for advanced users.
  • TestingSeed: is the seed for the random number generator used in testing procedures. Default is 1234.
  • testing_cores: is the number of cores to be used in parallel computing (via the doParallel package). Default is 1.
  • boot_num: is the number of bootstrap tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.
  • boot_prob: is the proportion of initial size of data to be used in bootstrap. Default is 0.8.
  • sidevar_num: is the number of re-shuffling tests of side variables to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.
  • traintest_num: is the number of overfitting tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.
  • partition_trainfrac: is the fraction of data to be used as a training dataset. Default is 0.6.
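The requirements on data listed above (a data.frame, no NA values, named columns present) can be checked before running the tests. The helper below is a hypothetical sketch written for this page; check_capacity_input is not a SLEMI function, and the package may perform its own validation differently.

```r
# Hypothetical helper, not part of SLEMI: checks the documented preconditions.
check_capacity_input <- function(data, signal = "signal",
                                 response = "response",
                                 side_variables = NULL) {
  stopifnot(is.data.frame(data))

  # All columns named in the arguments must exist in data.
  needed <- c(signal, response, side_variables)
  missing_cols <- setdiff(needed, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }

  # The documentation states that data cannot contain NA values.
  if (any(is.na(data))) {
    stop("data must not contain NA values")
  }
  invisible(TRUE)
}

toy <- data.frame(signal = c("a", "a", "b", "b"),
                  response = c(0.1, 0.3, 1.2, 1.5))
check_capacity_input(toy)
```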

Returns

a list with four elements:

  • output$bootstrap - results of the bootstrap tests
  • output$resamplingMorph - results of the side-variable reshuffling tests (only if side_variables is not NULL)
  • output$traintest - results of the overfitting (training/testing) tests
  • output$bootResampMorph - results of the reshuffling tests combined with bootstrap (only if side_variables is not NULL)

Each of the above is a list in which every element is the output of a single repetition of the channel capacity algorithm.
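Since each returned element is a list of repetitions, the spread of the estimator can be summarised by collecting one value per repetition. The sketch below uses mock data in place of a real result; the field name cc for the capacity estimate is an assumption and should be checked against the actual structure of the returned object (for example with str()).

```r
# Mock results standing in for a list of repetitions;
# the `cc` field name is an assumption, not confirmed by this page.
mock_bootstrap <- list(
  list(cc = 1.02),
  list(cc = 0.97),
  list(cc = 1.05),
  list(cc = 0.99)
)

# Collect the capacity estimate of each repetition into a numeric vector.
cc_values <- vapply(mock_bootstrap, function(x) x$cc, numeric(1))

# Mean and standard deviation describe the distribution of the estimator.
cc_mean <- mean(cc_values)
cc_sd   <- sd(cc_values)
```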

Details

If side variables are included in the analysis (side_variables is not NULL), two additional procedures are carried out: a reshuffling test and a reshuffling-with-bootstrap test, both based on permuting the values of the side variables within the dataset. The additional parameters lr_maxit and MaxNWts have the same meaning as in the multinom function from the nnet package. An alternative model formula (via the formula_string argument) should be provided if the data are not well described by the standard logistic regression model (recommended only for advanced users).
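The permutation idea behind the reshuffling tests can be illustrated in base R. This is a minimal sketch on toy data, assuming the side-variable columns are permuted jointly across rows; whether SLEMI permutes them jointly or column by column is not stated here, and the column name batch is hypothetical.

```r
# Illustrative sketch only: not the SLEMI implementation.
set.seed(1234)
toy <- data.frame(
  signal   = rep(c("0", "1"), each = 5),
  response = rnorm(10),
  batch    = rep(c("A", "B"), times = 5)  # hypothetical side variable
)
side_variables <- "batch"

# Permute the rows of the side-variable columns, leaving signal and
# response in place, so any link between side variables and the rest
# of the data is broken.
perm <- sample(nrow(toy))
shuffled <- toy
shuffled[side_variables] <- toy[perm, side_variables, drop = FALSE]
```

Comparing estimates on the original and the shuffled data indicates how much the side variables contribute to the result.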

References

[1] Jetka T, Nienaltowski K, Winarski T, Blonski S, Komorowski M, Information-theoretic analysis of multivariate single-cell signaling responses using SLEMI, PLoS Comput Biol, 15(7): e1007132, 2019, https://doi.org/10.1371/journal.pcbi.1007132.

Examples

## Please set boot_num and traintest_num with larger numbers
## for a more reliable testing
library(SLEMI)
tempdata=data_example1
outputCLR1_testing=capacity_logreg_testing(data=tempdata,
  signal="signal", response="response", cc_maxit=10,
  TestingSeed=11111, boot_num=1, boot_prob=0.8, testing_cores=1,
  traintest_num=1, partition_trainfrac=0.6)

  • Maintainer: Tomasz Jetka
  • License: GPL (>= 3)
  • Last published: 2023-11-19