formula: an object of class "Formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
data: the data frame to be modeled.
family: a character vector of length q specifying the distribution of each response. Bernoulli, binomial, Poisson and Gaussian are allowed.
K: number of components, default is one.
folds: number of folds, default is 10. Although folds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. folds can also be provided as a vector (same length as data) of fold identifiers.
type: loss function to use for cross-validation. Six options are currently available, depending on whether the responses all belong to the same distribution family. If all responses are Bernoulli distributed, prediction performance may be measured by the area under the ROC curve: type = "auc". In any other case, one can choose among the following five options: "likelihood", "aic", "aicc", "bic", "mspe".
size: specifies the number of trials of the binomial variables included in the model. An (n*qb) matrix is expected for qb binomial variables.
offset: used for the Poisson dependent variables. A vector or a matrix of size (number of observations * number of Poisson dependent variables) is expected.
na.action: a function which indicates what should happen when the data contain NAs. The default is na.omit.
crit: a list of two elements, maxit and tol, giving respectively the maximum number of iterations and the convergence tolerance for the Fisher scoring algorithm. Defaults are 50 and 10e-6 respectively.
method: Regularization criterion type. Object of class "method.SCGLR" built by methodSR for Structural Relevance.
nfolds: deprecated. Use the folds parameter instead.
mc.cores: deprecated.
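The folds and crit arguments described above take ordinary R objects. The following is a minimal sketch of how they might be built, assuming a hypothetical sample of 25 observations split into 5 folds. The names n_obs, n_folds and fold_id and all numeric values are illustrative and not part of the package.

```r
# Hypothetical sample size and number of folds (illustrative values only)
n_obs <- 25
n_folds <- 5

set.seed(1)  # make the random assignment reproducible
# One fold identifier per observation, balanced across folds, in random order
fold_id <- sample(rep(seq_len(n_folds), length.out = n_obs))

# Convergence settings for the Fisher scoring algorithm, as a two-element list
# (illustrative values, not necessarily the package defaults)
crit <- list(maxit = 50, tol = 1e-6)

table(fold_id)  # number of observations assigned to each fold
```

Objects like these would then be supplied as folds = fold_id and crit = crit in the call. The fold vector must have one entry per row of data.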
Returns
a matrix containing the criterion values for each response (rows) and each number of components (columns).
Examples
## Not run:
library(SCGLR)

# load sample data
data(genus)

# get variable names from dataset
n <- names(genus)
ny <- n[grep("^gen", n)]    # Y <- names that begin with "gen"
nx <- n[-grep("^gen", n)]   # X <- remaining names

# remove "geology" and "surface" from nx
# as surface is offset and we want to use geology as additional covariate
nx <- nx[!nx %in% c("geology", "surface")]

# build multivariate formula
# we also add "lat*lon" as computed covariate
form <- multivariateFormula(ny, c(nx, "I(lat*lon)"), A = c("geology"))

# define family
fam <- rep("poisson", length(ny))

# cross validation
genus.cv <- scglrCrossVal(formula = form, data = genus, family = fam, K = 12,
                          offset = genus$surface)

# find best K
mean.crit <- colMeans(log(genus.cv))

#plot(mean.crit, type="l")
## End(Not run)
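The "find best K" step of the example can be carried one step further by picking the column with the smallest mean criterion. The sketch below uses a small synthetic matrix in place of the real cross-validation output, so it runs without the package. The values, the response names y1 to y3 and the column labels K1 to K4 are made up for illustration, and the actual column layout of the returned matrix may differ.

```r
# Synthetic criterion matrix standing in for the output of scglrCrossVal():
# 3 responses (rows) x 4 candidate numbers of components (columns).
# The values are made up for illustration only.
cv <- matrix(
  c(4.0, 2.0, 1.0, 1.5,
    3.0, 2.5, 1.2, 1.4,
    5.0, 3.0, 2.0, 2.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("y1", "y2", "y3"), paste0("K", 1:4))
)

# Average the log-criterion over responses, as in the example above
mean.crit <- colMeans(log(cv))

# For a loss-type criterion (smaller is better), keep the column
# with the lowest mean value
best <- names(which.min(mean.crit))
best
```

Taking the minimum suits loss-type criteria such as "mspe". For type = "auc", where larger values indicate better prediction, which.max would presumably be the appropriate choice.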
References
Bry X., Trottier C., Verron T. and Mortier F. (2013) Supervised Component Generalized Linear Regression using a PLS-extension of the Fisher scoring algorithm. Journal of Multivariate Analysis, 119, 47-60.