x: a data matrix, a vector, or a sparse 'design matrix' (object of class Matrix provided by the Matrix package, of class matrix.csr provided by the SparseM package, or of class simple_triplet_matrix provided by the slam package).
y: a response vector with one label for each row/component of x. It can be either a factor (for classification tasks) or a numeric vector (for regression).
weights: the weight of each subject. It should have the same length as y.
cfun: character, type of convex cap (concave) function.
Valid options are:
"hcave"
"acave"
"bcave"
"ccave"
"dcave"
"ecave"
"gcave"
"tcave"
s: tuning parameter of cfun. s > 0, except that s = 0 is allowed for cfun="tcave". If s is too close to 0 for cfun="acave", "bcave" or "ccave", the computed weights can become 0 for all observations, which crashes the program.
type: irsvm can be used as a classification machine or as a regression machine. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively, but it may be overwritten by setting an explicit value.
Valid options are:
C-classification
nu-classification
eps-regression
nu-regression
kernel, gamma: the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
linear: $u^\top v$
polynomial: $(\gamma\, u^\top v + \mathrm{coef0})^{\mathrm{degree}}$
radial basis: $\exp(-\gamma \lVert u - v \rVert^2)$
sigmoid: $\tanh(\gamma\, u^\top v + \mathrm{coef0})$
cost: cost of constraint violation (default: 1). It is the C constant of the regularization term in the Lagrange formulation and is proportional to the inverse of lambda in irglmreg.
epsilon: epsilon in the epsilon-insensitive loss function (default: 0.1).
balance: for type="C-classification", "nu-classification" only
nfolds: number of folds (at least 3); default is 10.
foldid: an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing and will be ignored.
trim_ratio: a number between 0 and 1 for trimmed least squares, useful if type="eps-regression" or "nu-regression".
n.cores: the number of CPU cores to use. The cross-validation loop attempts to send different CV folds to different cores.
...: Other arguments that can be passed to irsvm.
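The default choice of type described under type above can be sketched as follows. `default_type` is a hypothetical helper written only to illustrate the documented rule (a factor y implies classification, anything else implies regression, and an explicit value wins); it is not a function exported by the package.

```r
# Hypothetical helper: mirrors the documented rule for the default 'type'.
default_type <- function(y, type = NULL) {
  if (!is.null(type)) {
    valid <- c("C-classification", "nu-classification",
               "eps-regression", "nu-regression")
    return(match.arg(type, valid))   # an explicit, valid value overrides
  }
  # factor response -> classification, otherwise regression
  if (is.factor(y)) "C-classification" else "eps-regression"
}

default_type(factor(c("a", "b", "a")))             # "C-classification"
default_type(c(1.2, 3.4, 5.6))                     # "eps-regression"
default_type(c(1.2, 3.4), type = "nu-regression")  # "nu-regression"
```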
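The four kernel formulas listed under kernel, gamma above can be evaluated directly for two numeric vectors u and v. This is a sketch for intuition only: `kernel_value` is a hypothetical name, and the default values of gamma, coef0 and degree below are illustrative assumptions, not the package's defaults.

```r
# Evaluate one of the listed kernels for two numeric vectors u and v.
# gamma, coef0 and degree defaults here are illustrative assumptions.
kernel_value <- function(u, v, kernel = c("linear", "polynomial",
                                          "radial", "sigmoid"),
                         gamma = 1, coef0 = 0, degree = 3) {
  kernel <- match.arg(kernel)
  uv <- sum(u * v)                       # inner product u'v
  switch(kernel,
         linear     = uv,
         polynomial = (gamma * uv + coef0)^degree,
         radial     = exp(-gamma * sum((u - v)^2)),  # squared Euclidean distance
         sigmoid    = tanh(gamma * uv + coef0))
}

u <- c(1, 2); v <- c(3, 4)
kernel_value(u, v, "linear")                                        # 11
kernel_value(u, v, "polynomial", gamma = 1, coef0 = 1, degree = 2)  # 144
kernel_value(u, v, "radial", gamma = 0.5)                           # equals exp(-4)
```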
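The remark under cost that C is proportional to the inverse of lambda follows from rescaling the standard soft-margin objective with slack variables $\xi_i$. A minimal derivation, assuming for comparison a penalty of the form $\frac{\lambda}{2}\lVert w\rVert^2$ on the same loss (how irglmreg scales its loss is not stated here):

$$
\min_{w,b}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\Longleftrightarrow\quad
\min_{w,b}\ \sum_{i=1}^{n}\xi_i + \frac{1}{2C}\lVert w\rVert^2 ,
$$

since dividing by $C > 0$ does not change the minimizer. Matching $\frac{1}{2C}$ with $\frac{\lambda}{2}$ gives $\lambda = 1/C$; if the loss is instead averaged over the $n$ observations, the same argument gives $\lambda = 1/(nC)$. In both cases $C \propto 1/\lambda$.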
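A foldid vector, as described under foldid above, can be built in one common way: repeat the labels 1..nfolds to length n and shuffle them, so fold sizes differ by at most one. This is a sketch under that assumption; `make_foldid` is a hypothetical name, and cv.irsvm may assign folds differently when foldid is not supplied.

```r
# Hypothetical helper: balanced random fold labels in 1..nfolds.
make_foldid <- function(n, nfolds = 10) {
  stopifnot(nfolds >= 3, n >= nfolds)
  sample(rep_len(seq_len(nfolds), n))   # shuffle 1,2,...,nfolds,1,2,...
}

set.seed(1)
foldid <- make_foldid(n = 25, nfolds = 5)
table(foldid)   # each of the 5 folds has 5 observations
```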
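To show why trim_ratio matters for regression, the sketch below computes a trimmed mean squared error that keeps only the smallest squared residuals. It assumes trim_ratio is the fraction of observations retained; the package may define or round it differently, and `trimmed_mse` is a hypothetical name.

```r
# Trimmed mean squared error.
# ASSUMPTION: trim_ratio = fraction of (smallest) squared residuals kept.
trimmed_mse <- function(y, yhat, trim_ratio = 0.9) {
  stopifnot(length(y) == length(yhat), trim_ratio > 0, trim_ratio <= 1)
  r2 <- sort((y - yhat)^2)
  h <- max(1, floor(trim_ratio * length(r2)))   # number of residuals kept
  mean(r2[seq_len(h)])
}

y    <- c(1, 2, 3, 4, 100)   # last value is a gross outlier
yhat <- c(1, 2, 3, 4, 5)
mean((y - yhat)^2)           # 1805, dominated by the outlier
trimmed_mse(y, yhat, 0.8)    # 0, the outlier is trimmed away
```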
Details
This function is the driving force behind cv.irsvm. It performs K-fold cross-validation to determine the optimal SVM tuning parameters: cost, and also gamma if the kernel is nonlinear. It can also choose the value of s used in cfun.
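The search described above can be sketched in base R as an exhaustive grid over (s, gamma, cost) crossed with the folds, averaging the held-out error over folds and keeping the smallest mean. This is an illustration under assumptions, not the package's implementation: `cv_grid` and the callback `fit_and_score` are hypothetical names, and the toy scorer stands in for "fit irsvm on the training folds and measure error on the held-out fold". The per-fold table is transposed so its rows are s, gamma, cost, error and k, mirroring the residmat layout documented for nonlinear kernels.

```r
# Hypothetical K-fold grid search over (s, gamma, cost).
cv_grid <- function(n, s, gamma, cost, nfolds, fit_and_score, foldid = NULL) {
  if (is.null(foldid)) foldid <- sample(rep_len(seq_len(nfolds), n))
  # every (s, gamma, cost) combination crossed with every fold k
  grid <- expand.grid(s = s, gamma = gamma, cost = cost)
  res <- merge(grid, data.frame(k = seq_len(nfolds)))  # Cartesian product
  res$error <- NA_real_
  for (i in seq_len(nrow(res))) {
    test  <- which(foldid == res$k[i])
    train <- setdiff(seq_len(n), test)
    res$error[i] <- fit_and_score(train, test, res$s[i], res$gamma[i], res$cost[i])
  }
  # average the held-out error over folds, then take the smallest mean
  avg  <- aggregate(error ~ s + gamma + cost, data = res, FUN = mean)
  best <- avg[which.min(avg$error), ]
  list(residmat = t(as.matrix(res[, c("s", "gamma", "cost", "error", "k")])),
       s = best$s, gamma = best$gamma, cost = best$cost)
}

# Toy scorer (hypothetical): ignores the data and is minimised at
# cost = 2, gamma = 0.25 and the smallest s.
toy_score <- function(train, test, s, gamma, cost) {
  (log2(cost) - 1)^2 + (log2(gamma) + 2)^2 + s / 10
}

set.seed(1)
out <- cv_grid(n = 20, s = c(1, 5), gamma = 2^(-4:0), cost = 2^(-2:2),
               nfolds = 5, fit_and_score = toy_score)
c(s = out$s, gamma = out$gamma, cost = out$cost)  # s = 1, gamma = 0.25, cost = 2
dim(out$residmat)                                 # 5 rows, 250 columns
```

For a linear kernel the gamma dimension would simply be dropped from the grid.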
Returns
An object of class "cv.irsvm" is returned, which is a list with the ingredients of the cross-validation fit:
residmat: a matrix whose rows, for kernel="linear", are s, cost, error and k, where k is the cross-validation fold number. For nonlinear kernels, the rows are s, gamma, cost, error and k.
cost: the value of cost that gives the minimum cross-validated error in irsvm.
gamma: the value of gamma that gives the minimum cross-validated error in irsvm.
s: the value of s for cfun that gives the minimum cross-validated error in irsvm.
References
Zhu Wang (2024) Unified Robust Estimation, Australian & New Zealand Journal of Statistics. 66(1):77-102.