integrate_output function

Integration for target variable

Integration for target variable

Calculates an estimator for a derived quantity by summing across multiple predictions. This can be used to approximate an integral when estimating area-expanded abundance, abundance-weighting a covariate to calculate distribution shifts, and/or weighting one model variable by another.

integrate_output( object, newdata, area, type = rep(1, nrow(newdata)), weighting_index, covariate, getsd = TRUE, bias.correct = TRUE, apply.epsilon = FALSE, intern = FALSE )

Arguments

  • object: Output from tinyVAST().

  • newdata: New data-frame of independent variables used to predict the response, where a total value is calculated by combining across these individual predictions. If these locations are randomly drawn from a specified spatial domain, then integrate_output applies midpoint integration to approximate the total over that area. If locations are drawn sysmatically from a domain, then integrate_output is applying a midpoint approximation to the integral.

  • area: vector of values used for area-weighted expansion of estimated density surface for each row of newdata

    with length of nrow(newdata).

  • type: Integer-vector indicating what type of expansion to apply to each row of newdata, with length of nrow(newdata).

    • type=1: Area-weighting: weight predictor by argument area
    • type=2: Abundance-weighted covariate: weight covariate by proportion of total in each row of newdata
    • type=3: Abundance-weighted variable: weight predictor by proportion of total in a prior row of newdata. This option is used to weight a prediction for one category based on predicted proportional density of another category, e.g., to calculate abundance-weighted condition in a bivariate model.
    • type=4: Abundance-expanded variable: weight predictor by density in a prior row of newdata. This option is used to weight a prediction for one category based on predicted density of another category, e.g., to calculate abundance-expanded consumption in a bivariate model.
    • type=0: Exclude from weighting: give weight of zero for a given row of newdata. Including a row of newdata with type=0 is useful, e.g., when calculating abundance at that location, where the eventual index uses abundance as weighting term but without otherwise using the predicted density in calculating a total value.
  • weighting_index: integer-vector used to indicate a previous row that is used to calculate a weighted average that is then applied to the given row of newdata. Only used for when type=3.

  • covariate: numeric-vector used to provide a covariate that is used in expansion, e.g., to provide positional coordinates when calculating the abundance-weighted centroid with respect to that coordinate. Only used for when type=2.

  • getsd: logical indicating whether to get the standard error, where getsd=FALSE is faster during initial exploration

  • bias.correct: logical indicating if bias correction should be applied using standard methods in TMB::sdreport()

  • apply.epsilon: Apply epsilon bias correction using a manual calculation rather than using the conventional method in TMB::sdreport ? See details for more information.

  • intern: Do Laplace approximation on C++ side? Passed to TMB::MakeADFun().

Returns

A vector containing the plug-in estimate, standard error, the epsilon bias-corrected estimate if available, and the standard error for the bias-corrected estimator. Depending upon settings, one or more of these will be NA values, and the function can be repeatedly called to get multiple estimators and/or statistics.

Details

Analysts will often want to calculate some value by combining the predicted response at multiple locations, and potentially from multiple variables in a multivariate analysis. This arises in a univariate model, e.g., when calculating the integral under a predicted density function, which is approximated using a midpoint or Monte Carlo approximation by calculating the linear predictors at each location newdata, applying the inverse-link-trainsformation, and calling this predicted response mu_g. Total abundance is then be approximated by multiplying mu_g by the area associated with each midpoint or Monte Carlo approximation point (supplied by argument area), and summing across these area-expanded values.

In more complicated cases, an analyst can then use covariate

to calculate the weighted average of a covariate for each midpoint location. For example, if the covariate is positional coordinates or depth/elevation, then type=2

measures shifts in the average habitat utilization with respect to that covariate. Alternatively, an analyst fitting a multivariate model might weight one variable based on another using weighting_index, e.g., to calculate abundance-weighted average condition, or predator-expanded stomach contents.

In practice, spatial integration in a multivariate model requires two passes through the rows of newdata when calculating a total value. In the following, we write equations using C++ indexing conventions such that indexing starts with 0, to match the way that integrate_output expects indices to be supplied. Given inverse-link-transformed predictor μg\mu_g, function argument type as typegtype_g

function argument area as aga_g, function argument covariate as xgx_g, function argument weighting_index as \eqn{ h_g }

function argument weighting_index as \eqn{ h_g }

the first pass calculates:

νg=μgag \nu_g = \mu_g a_g

where the total value from this first pass is calculated as:

ν=g=0G1νg \nu^* = \sum_{g=0}^{G-1} \nu_g

The second pass then applies a further weighting, which depends upon typegtype_g, and potentially upon xgx_g and hgh_g.

If typeg=0type_g = 0 then ϕg=0\phi_g = 0

If typeg=1type_g = 1 then ϕg=νg\phi_g = \nu_g

If typeg=2type_g = 2 then ϕg=xgνgν\phi_g = x_g \frac{\nu_g}{\nu^*}

If typeg=3type_g = 3 then ϕg=νhgνμg\phi_g = \frac{\nu_{h_g}}{\nu^*} \mu_g

If typeg=4type_g = 4 then ϕg=νhgμg\phi_g = \nu_{h_g} \mu_g

Finally, the total value from this second pass is calculated as:

ϕ=g=0G1ϕg \phi^* = \sum_{g=0}^{G-1} \phi_g

and ϕ\phi^* is outputted by integrate_output, along with a standard error and potentially using the epsilon bias-correction estimator to correct for skewness and retransformation bias.

Standard bias-correction using bias.correct=TRUE can be slow, and in some cases it might be faster to do apply.epsilon=TRUE and intern=TRUE. However, that option is somewhat experimental, and a user might want to confirm that the two options give identical results. Similarly, using bias.correct=TRUE

will still calculate the standard-error, whereas using apply.epsilon=TRUE and intern=TRUE will not.

  • Maintainer: James T. Thorson
  • License: GPL-3
  • Last published: 2025-03-21