Monte Carlo Sensitivity Analysis for dynaTree Models
A Monte Carlo sensitivity analysis using random Latin hypercube samples (LHSs) or bootstrap resamples for each particle to estimate main effects as well as first-order and total sensitivity indices.
Usage

## S3 method for class 'dynaTree'
sens(object, class = NULL, nns = 1000, nME = 100, span = 0.3,
     method = c("lhs", "boot"), lhs = NULL, categ = NULL, verb = 0)
Arguments
object: a "dynaTree"-class object built by dynaTree
class: only valid for object$model = "class", allows the user to specify the subset of class labels in unique(object$y) for which sensitivity indices are calculated. The implementation loops over the vector of labels provided. The default of NULL results in class = unique(object$y)
nns: A positive integer scalar indicating the size of each LHS or bootstrap resample drawn for use in the Monte Carlo integration scheme underlying the sensitivity analysis; the total number of locations is nns*(ncol(X)+2)
nME: A positive integer scalar indicating the number of grid points, in each input dimension, upon which main effects will be estimated
span: A positive real-valued scalar giving the smoothing parameter for main effects integration: the fraction of nns points that will be included in a moving average window that is used to estimate main effects at the nME locations in each input dimension
method: indicates whether LHS or bootstrap should be used
lhs: if method = "lhs" then this argument should be a list with entries rect, shape, and mode describing the marginal distributions of the Latin hypercube; specify NULL for a default specification when method = "boot". The fields should have the following format(s):
rect: Optional rectangle describing the domain of the uncertainty distribution with respect to which the sensitivity is to be determined. This defines the domain from which the LH sample is to be taken. The rectangle should be a matrix or data.frame with ncol(rect) = 2, and number of rows equal to the dimension of the domain. For 1-d data, a vector of length 2 is allowed. The default is the input data range of each column of object$X.
shape: Optional vector of shape parameters for Beta marginal distributions having length ncol(object$X) and elements > 1, i.e., concave Beta distributions. If specified, the uncertainty distribution (i.e., the LHS) is proportional to a joint pdf formed by independent Beta distributions in each dimension of the domain, scaled and shifted to have support defined by rect. If unspecified, the uncertainty distribution is uniform over rect. The specification shape[i] = 0 instructs sens to treat the i'th dimension as a binary variable. In this case, mode[i] is the probability parameter for a Bernoulli uncertainty distribution, and we must also have rect[i,] = c(0,1).
mode: Optional vector of mode values for the Beta uncertainty distribution. Vector of length equal to the dimension of the domain, with elements within the support defined by rect. If shape is specified, but this is not, then the scaled Beta distributions will be symmetric.
categ: A vector of logicals of length ncol(object$X) indicating which, if any, dimensions of the input space should be treated as categorical; this input is used to help set the default lhs$shape argument if it is not specified. The default categ argument is NULL, meaning that the categorical inputs are derived from object$X in a sensible way
verb: a nonnegative scalar integer indicating the number of predictive locations (iterations) after which a progress statement is printed to the console; the (default) value verb = 0 is quiet
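For illustration, a hand-built lhs specification for a hypothetical 3-d input space with one binary input might look as follows (the field names rect, shape, and mode are those documented above; the numeric values are arbitrary):

```r
## hypothetical 3-d domain: inputs 1-2 real-valued on [0,1], input 3 binary
rect  <- cbind(c(0, 0, 0), c(1, 1, 1))   ## one row per input: lower, upper
shape <- c(2, 3, 0)         ## concave Beta shapes; 0 flags the binary input
mode  <- c(0.5, 0.25, 0.9)  ## Beta modes; Bernoulli prob for the binary input
lhs   <- list(rect = rect, shape = shape, mode = mode)
## this list would then be passed as sens(object, method = "lhs", lhs = lhs)
```

Note that the binary input obeys the rect[i,] = c(0,1) requirement stated above.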
Details
Saltelli (2002) describes a Latin hypercube sampling based method for estimation of the Sobol sensitivity indices:
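For reference, the standard definitions of the first-order and total sensitivity indices for input j of a function f (stated here in generic notation, not copied from the package source) are:

```latex
S_j = \frac{\mathrm{Var}\big(\mathbb{E}[f \mid x_j]\big)}{\mathrm{Var}(f)},
\qquad
T_j = \frac{\mathbb{E}\big[\mathrm{Var}(f \mid x_{-j})\big]}{\mathrm{Var}(f)}
```

S_j measures the proportion of output variance explained by x_j alone, while T_j additionally accounts for all interactions involving x_j; hence T_j >= S_j.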
All moments are with respect to the appropriate marginals of the uncertainty distribution U -- that is, the probability distribution on the inputs with respect to which sensitivity is being investigated. Under this approach, the integrals involved are approximated through averages over properly chosen samples based on two LH samples proportional to U. If nns is the sample size for the Monte Carlo estimate, this scheme requires nns*(ncol(X)+2) function evaluations.
The sens.dynaTree function implements the method for unknown functions f, through prediction via one of the tgp regression models conditional on an observed set of X locations. For each particle, treated as a sample from the dynaTree model posterior, the nns*(ncol(X)+2) locations are drawn randomly from the LHS scheme and realizations of the sensitivity indices are calculated. Thus we obtain a posterior sample of the indices, incorporating variability from both the Monte Carlo estimation and uncertainty about the function output. Since a subset of the predictive locations are actually an LHS proportional to the uncertainty distribution, we can also estimate the main effects through simple non-parametric regression (a moving average).
See the Gramacy, Taddy, & Wild (2011) reference below for more details.
If method = "boot" is used then the LHS above is simply replaced with a bootstrap resample of the object$X locations.
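As a rough sketch of that alternative (plain base R, not the package internals), a bootstrap resample of a design matrix is just row sampling with replacement:

```r
set.seed(42)
X <- matrix(runif(20), ncol = 2)   ## stand-in design matrix (10 rows, 2 inputs)
## resample rows with replacement, keeping the same sample size
Xboot <- X[sample(nrow(X), nrow(X), replace = TRUE), , drop = FALSE]
```

Unlike the LHS option, the bootstrap requires no assumed uncertainty distribution; the empirical distribution of the observed inputs plays that role.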
As with prediction, the dynaTrees function enables repeated calls to sens.dynaTree.
Returns
The object returned is of class "dynaTree", which includes a copy of the list elements from the object passed in, with the following (sensitivity-analysis specific) additions.
MEgrid: An nME-by-ncol(object$X) matrix containing the main effects predictive grid at which the following MEmean, MEq1, and MEq2 quantities were obtained
MEmean: A matrix with ncol(object$X) columns and nME rows containing the mean main effects for each input dimension
MEq1: same as MEmean but containing the 5% quantiles
MEq2: same as MEmean but containing the 95% quantiles
S: An object$N-row, ncol(object$X)-column matrix containing the posterior samples of the first-order Sobol sensitivity indices
T: same as S but containing the Total Effect indices
In the case of object$model = "class" the entries listed above will themselves be lists, with an entry for each class specified on input (all classes by default).
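Because S and T are matrices of posterior samples, they can be summarized with ordinary column-wise statistics. A sketch on a stand-in matrix (in real use, substitute out$S or out$T):

```r
set.seed(1)
S <- matrix(runif(300), ncol = 3)   ## stand-in for out$S: 100 samples, 3 inputs
colnames(S) <- paste0("x", 1:3)
post.mean <- colMeans(S)            ## posterior mean first-order index per input
post.q <- apply(S, 2, quantile, probs = c(0.05, 0.95))  ## 90% credible bands
```

Boxplots of the columns (as in the Examples below) visualize the same posterior spread directly.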
References
Saltelli, A. (2002). Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145, 280-297.
Gramacy, R.B., Taddy, M.A., and S. Wild (2011). Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning.
Note
The quality of the sensitivity analysis depends on the size of the LHSs used for the integral approximations; as with any Monte Carlo integration scheme, the sample size (nns) must increase with the dimensionality of the problem. The total sensitivity indices T are not forced to be non-negative, and if negative values occur it is necessary to increase nns. Postprocessing replaces negative values with NA.
Examples

## friedman data
if(require("tgp")) {
  f <- friedman.1.data(1000)
  X <- f[,1:6]
  Z <- f$Y

  ## fit the model and do the sensitivity analysis
  N <- 100 ## use N >= 1000 for better results;
           ## small N is for fast CRAN checks
  out <- dynaTree(X=X, y=Z, N=N, ab=c(0.01, 2))
  ## also try with model="linear"

  ## gather relevance statistics
  out <- relevance(out)
  boxplot(out$relevance)
  abline(h=0, col=2, lty=2)
  ## relevance stats are not as useful when model="linear"
  ## since it will appear that x4 and x5 are not helpful; these
  ## interact linearly with the response

  ## full simulation-based sensitivity analysis; the dynaTree::
  ## prefix is only needed if the tgp package is loaded
  out <- dynaTree::sens(out, verb=100)

  ## plot the main effects
  r <- range(rbind(c(out$MEmean, out$MEq1, out$MEq2)))
  par(mfrow=c(1, ncol(out$X)), mar=c(5,3,2,1))
  plot(out$MEgrid[,1], out$MEmean[,1], type="l", ylim=r, lwd=2,
       ylab="", xlab=colnames(out$MEmean)[1])
  lines(out$MEgrid[,1], out$MEq1[,1], lty=2, lwd=2)
  lines(out$MEgrid[,1], out$MEq2[,1], lty=2, lwd=2)
  if(ncol(out$X) > 1) {
    for(d in 2:ncol(out$X)) {
      plot(out$MEgrid[,d], out$MEmean[,d], col=d, type="l",
           ylim=r, lwd=2, xlab=colnames(out$MEmean)[d], ylab="")
      lines(out$MEgrid[,d], out$MEq1[,d], col=d, lty=2)
      lines(out$MEgrid[,d], out$MEq2[,d], col=d, lty=2)
    }
  }

  ## Sobol indices
  par(mfrow=c(1,2), mar=c(5,4,4,2))
  boxplot(out$S, main="first order indices", xlab="inputs")
  boxplot(out$T, main="total indices", xlab="inputs")
  ## these look better when model="linear"

  ## clean up
  deletecloud(out)

  ## for a classification example using the sensitivity hooks
  ## in the dynaTrees function, see the class2d demo,
  ## i.e., demo("class2d")
}