cv.irsvm_fit function

Internal function of cross-validation for irsvm

Internal function to conduct k-fold cross-validation for irsvm

cv.irsvm_fit(x, y, weights, cfun="ccave", s=c(1, 5), type=NULL, kernel="radial", gamma=2^(-4:10), cost=2^(-4:4), epsilon=0.1, balance=TRUE, nfolds=10, foldid, trim_ratio=0.9, n.cores=2, ...)

Arguments

  • x: a data matrix, a vector, or a sparse 'design matrix' (object of class Matrix provided by the Matrix package, of class matrix.csr provided by the SparseM package, or of class simple_triplet_matrix provided by the slam package).

  • y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

  • weights: the weight of each subject. It should have the same length as y.

  • cfun: character, type of convex cap (concave) function.

    Valid options are:

    • "hcave"
    • "acave"
    • "bcave"
    • "ccave"
    • "dcave"
    • "ecave"
    • "gcave"
    • "tcave"
  • s: candidate values of the tuning parameter of cfun (default: c(1, 5)). Values must satisfy s > 0, except that s = 0 is allowed for cfun="tcave". If s is too close to 0 for cfun="acave", "bcave" or "ccave", the computed weights can become 0 for all observations, which crashes the program.

  • type: irsvm can be used as a classification machine or as a regression machine. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively, but it may be overwritten by setting an explicit value.

    Valid options are:

    • C-classification
    • nu-classification
    • eps-regression
    • nu-regression
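  • kernel, gamma: kernel is the kernel used in training and predicting (default: "radial"); gamma is the vector of candidate values of the kernel parameter gamma (default: 2^(-4:10)), which is tuned by cross-validation when the kernel is nonlinear. Depending on the kernel type, the kernel functions and their parameters are: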

    • linear: u'*v
    • polynomial: (gamma*u'*v + coef0)^degree
    • radial basis: exp(-gamma*|u-v|^2)
    • sigmoid: tanh(gamma*u'*v + coef0)
  • cost: candidate values of the cost of constraint violation (default: 2^(-4:4)); it is the C constant of the regularization term in the Lagrange formulation. It is proportional to the inverse of lambda in irglmreg.

  • epsilon: epsilon in the epsilon-insensitive loss function (default: 0.1).

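  • balance: logical (default: TRUE); if TRUE, the cross-validation folds are balanced with respect to the class labels. Used for type="C-classification" and "nu-classification" only.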

  • nfolds: number of folds (at least 3); default is 10.

  • foldid: an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing and is ignored.

  • trim_ratio: a number between 0 and 1 for trimmed least squares, useful if type="eps-regression" or "nu-regression".

  • n.cores: the number of CPU cores to use (default: 2). The cross-validation loop attempts to send different CV folds to different cores.

  • ...: Other arguments that can be passed to irsvm.
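The kernel formulas listed under kernel, together with the epsilon-insensitive loss controlled by epsilon and a trimmed squared-error criterion in the spirit of trim_ratio, can be written out directly in base R. This is an illustrative sketch only: all function names are hypothetical, nothing here calls irsvm or any package, coef0 and degree are not arguments of cv.irsvm_fit (their defaults below are chosen for illustration), and the trimmed criterion (mean of the smallest floor(trim_ratio * n) squared residuals) is an assumed definition, not necessarily the one irsvm uses.

```r
# Kernel functions for two numeric vectors u and v of equal length.
linear_kernel <- function(u, v) sum(u * v)

polynomial_kernel <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

radial_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

sigmoid_kernel <- function(u, v, gamma, coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)
}

# Epsilon-insensitive loss: residuals with |y - yhat| <= epsilon cost nothing;
# larger residuals are penalized linearly beyond epsilon.
eps_insensitive_loss <- function(y, yhat, epsilon = 0.1) {
  pmax(0, abs(y - yhat) - epsilon)
}

# ASSUMED trimmed criterion (illustration only): mean of the smallest
# floor(trim_ratio * n) squared residuals (at least one), so the largest
# residuals are left out of the average.
trimmed_sq_error <- function(y, yhat, trim_ratio = 0.9) {
  stopifnot(trim_ratio > 0, trim_ratio <= 1)
  r2 <- sort((y - yhat)^2)
  m <- max(1, floor(trim_ratio * length(r2)))
  mean(r2[seq_len(m)])
}
```

For example, linear_kernel(c(1, 2), c(3, 4)) is 11, and radial_kernel(c(1, 2), c(3, 4), gamma = 0.5) is exp(-4), since the squared distance between the two vectors is 8.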

Details

This function is the driving force behind cv.irsvm. It performs K-fold cross-validation to determine the optimal SVM tuning parameters: cost, and also gamma if the kernel is nonlinear. It can also choose the parameter s used in cfun.
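The cross-validation setup described above can be sketched in base R as two steps: assigning observations to folds, and enumerating the candidate tuning values. The helper make_foldid is hypothetical and shows one simple way to obtain class-balanced folds (sort by class, then deal observations round-robin into folds); it is not necessarily the scheme mpath uses internally. The grid uses the default s, gamma and cost values from the usage line.

```r
# Hypothetical helper: assign each observation to one of nfolds folds.
make_foldid <- function(y, nfolds = 10, balance = TRUE) {
  n <- length(y)
  stopifnot(nfolds >= 3, nfolds <= n)
  if (balance && is.factor(y)) {
    # Classes laid out one after another, random order within each class.
    ord <- order(y, runif(n))
  } else {
    # Plain random permutation of the observations.
    ord <- sample.int(n)
  }
  foldid <- integer(n)
  # Deal observations round-robin into folds 1, 2, ..., nfolds, 1, 2, ...
  # so fold sizes differ by at most one and each class is spread evenly.
  foldid[ord] <- rep_len(seq_len(nfolds), n)
  foldid
}

# Candidate tuning values, using the defaults from the usage line.
s <- c(1, 5)
gamma <- 2^(-4:10)
cost <- 2^(-4:4)

# Nonlinear kernel: every (s, gamma, cost) combination is a candidate.
grid_nonlinear <- expand.grid(s = s, gamma = gamma, cost = cost)
# Linear kernel: gamma is not tuned.
grid_linear <- expand.grid(s = s, cost = cost)

nfolds <- 10
n_fits <- nrow(grid_nonlinear) * nfolds  # fits needed for a full grid search
```

With these defaults a nonlinear kernel gives 2 * 15 * 9 = 270 candidate combinations, so a full grid search with 10 folds would need 2700 model fits; for kernel="linear" the grid has only 2 * 9 = 18 rows. This count is why spreading the folds over several cores (n.cores) can pay off.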

Returns

An object of class "cv.irsvm" is returned, which is a list with the ingredients of the cross-validation fit:

  • residmat: a matrix whose rows, for kernel="linear", are s, cost, error and k, where k is the cross-validation fold number. For nonlinear kernels, the rows are s, gamma, cost, error and k.

  • cost: the value of cost that gives the minimum cross-validated error in irsvm.

  • gamma: the value of gamma that gives the minimum cross-validated error in irsvm.

  • s: the value of s for cfun that gives the minimum cross-validated error in irsvm.
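To illustrate how the selected s, gamma and cost relate to residmat, the sketch below builds a small toy matrix in the documented row layout for a nonlinear kernel (one column per tuning combination and fold), averages the error over folds for each (s, gamma, cost) combination, and picks the smallest. The error values are made up, and averaging over folds is an assumption about how the cross-validated value is formed.

```r
# Toy residmat in the documented layout (nonlinear kernel): rows are
# s, gamma, cost, error, k; the error values are invented for illustration.
residmat <- rbind(
  s     = c(1, 1, 1, 1, 5, 5, 5, 5),
  gamma = c(1, 1, 2, 2, 1, 1, 2, 2),
  cost  = c(1, 1, 1, 1, 1, 1, 1, 1),
  error = c(0.30, 0.34, 0.20, 0.24, 0.25, 0.27, 0.40, 0.42),
  k     = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# One row per (combination, fold), with columns s, gamma, cost, error, k.
tab <- as.data.frame(t(residmat))

# Average the CV error over folds for each tuning combination (assumption).
cv <- aggregate(error ~ s + gamma + cost, data = tab, FUN = mean)

# Combination with the smallest averaged error.
best <- cv[which.min(cv$error), ]
print(best)
```

Here the combination s = 1, gamma = 2, cost = 1 has the smallest averaged error (0.22). For kernel="linear", drop gamma from both the toy matrix and the formula (error ~ s + cost).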

References

Zhu Wang (2024) Unified Robust Estimation, Australian & New Zealand Journal of Statistics. 66(1):77-102.

Author(s)

Zhu Wang zwang145@uthsc.edu

See Also

cv.irsvm and irsvm

  • Maintainer: Zhu Wang
  • License: GPL-2
  • Last published: 2024-06-27