kmmd function

Kernel Maximum Mean Discrepancy.


kmmd computes the kernel maximum mean discrepancy (MMD) between two samples and uses it to perform a non-parametric two-sample test of whether they come from the same distribution.

## S4 method for signature 'matrix'
kmmd(x, y, kernel = "rbfdot", kpar = "automatic", alpha = 0.05,
     asymptotic = FALSE, replace = TRUE, ntimes = 150, frac = 1, ...)

## S4 method for signature 'kernelMatrix'
kmmd(x, y, Kxy, alpha = 0.05, asymptotic = FALSE, replace = TRUE,
     ntimes = 100, frac = 1, ...)

## S4 method for signature 'list'
kmmd(x, y, kernel = "stringdot", kpar = list(type = "spectrum", length = 4),
     alpha = 0.05, asymptotic = FALSE, replace = TRUE, ntimes = 150, frac = 1, ...)
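A minimal sketch of calling each of the three interfaces listed above, assuming the kernlab package is installed. The data, the sigma value and the short strings are made up for illustration. With the kernelMatrix interface, x and y are the kernel matrices of each sample and Kxy is the cross kernel matrix between them.

```r
library(kernlab)

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)              # 100 observations, one per row
y <- matrix(rnorm(200, mean = 0.5), ncol = 2)  # second sample, shifted mean

# 1. 'matrix' interface: kmmd builds the kernel matrices itself
res_mat <- kmmd(x, y, kernel = "rbfdot", kpar = list(sigma = 1))

# 2. 'kernelMatrix' interface: precompute Kx, Ky and the cross matrix Kxy
rbf <- rbfdot(sigma = 1)
Kx  <- kernelMatrix(rbf, x)
Ky  <- kernelMatrix(rbf, y)
Kxy <- kernelMatrix(rbf, x, y)
res_km <- kmmd(Kx, Ky, Kxy = Kxy)

# 3. 'list' interface: each sample is a list of strings,
#    compared with a spectrum string kernel on substrings of length 4
sx <- list("abcdabcd", "abcabcab", "abcdabca")
sy <- list("zzyyzzyy", "zyzyzyzy", "yyzzyyzz")
res_str <- kmmd(sx, sy, kernel = "stringdot",
                kpar = list(type = "spectrum", length = 4))
```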

Arguments

  • x: data values for the first sample, given as a matrix, a list, or a kernelMatrix

  • y: data values for the second sample, given as a matrix, a list, or a kernelMatrix

  • Kxy: kernelMatrix between the x and y values (only for the kernelMatrix interface)

  • kernel: the kernel function used to compute the MMD. This parameter can be set to any function of class kernel that computes a dot product between two vector arguments. kernlab provides the most popular kernel functions, which can be selected by setting the kernel parameter to one of the following strings:

    • rbfdot Radial Basis kernel function "Gaussian"
    • polydot Polynomial kernel function
    • vanilladot Linear kernel function
    • tanhdot Hyperbolic tangent kernel function
    • laplacedot Laplacian kernel function
    • besseldot Bessel kernel function
    • anovadot ANOVA RBF kernel function
    • splinedot Spline kernel
    • stringdot String kernel

    The kernel parameter can also be set to a user defined function of class kernel by passing the function name as an argument.

  • kpar: the list of hyper-parameters (kernel parameters) to be used with the kernel function. Valid parameters for the existing kernels are:

    • sigma inverse kernel width for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot".
    • degree, scale, offset for the Polynomial kernel "polydot"
    • scale, offset for the Hyperbolic tangent kernel function "tanhdot"
    • sigma, order, degree for the Bessel kernel "besseldot".
    • sigma, degree for the ANOVA kernel "anovadot".
    • type, length, lambda, normalized for the "stringdot" kernel, where type is the string kernel type (e.g. "spectrum"), length is the length of the substrings considered, lambda is the decay factor, and normalized is a logical parameter determining whether the kernel evaluations should be normalized.

    Hyper-parameters for user-defined kernels can be passed through the kpar parameter as well. For the Gaussian RBF and Laplacian kernels, kpar can also be set to the string "automatic", which uses the heuristics in sigest to estimate a good sigma value from the data (default: "automatic").

  • alpha: the significance level of the test (default: 0.05)

  • asymptotic: also calculate the asymptotic bound and the corresponding test (suitable for smaller datasets) (default: FALSE)

  • replace: sample with replacement when computing the asymptotic bound (default: TRUE)

  • ntimes: number of times the sampling procedure is repeated (default: 150; 100 for the kernelMatrix interface)

  • frac: fraction of points to sample (default: 1)

  • ...: additional parameters.
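The sketch below, assuming kernlab is installed and using made-up data, shows how the kernel, kpar, alpha and resampling arguments combine in calls to the matrix interface. The particular values (sigma = 2, the polynomial parameters, ntimes = 200, frac = 0.5) are illustrative rather than recommended settings.

```r
library(kernlab)

set.seed(42)
x <- matrix(runif(300), 100)        # 100 observations of 3 variables
y <- matrix(runif(300) + 0.2, 100)  # same shape, shifted by 0.2

# Defaults: Gaussian RBF kernel, sigma estimated from the data via sigest
r_auto <- kmmd(x, y)

# Laplacian kernel with an explicitly chosen inverse width
r_lap <- kmmd(x, y, kernel = "laplacedot", kpar = list(sigma = 2))

# Polynomial kernel: degree, scale and offset are passed through kpar
r_poly <- kmmd(x, y, kernel = "polydot",
               kpar = list(degree = 2, scale = 1, offset = 1))

# Stricter level, plus the resampling-based asymptotic bound:
# 200 repetitions, each sampling half of the points with replacement
r_asym <- kmmd(x, y, alpha = 0.01, asymptotic = TRUE,
               replace = TRUE, ntimes = 200, frac = 0.5)
```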

Details

kmmd computes the kernel maximum mean discrepancy between samples from two distributions and tests, at significance level alpha, whether the samples come from different distributions.
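The statistic behind the test can be sketched in base R. This is a self-contained illustration of the biased MMD estimate described by Gretton et al. (2006), using the Gaussian kernel k(a, b) = exp(-sigma ||a - b||^2) (the same parameterization as rbfdot), together with the distribution-free acceptance threshold from that paper for equal sample sizes m and a kernel bounded by K. The helper names rbf_kernel, mmd_biased and mmd_threshold are hypothetical, and kernlab's internal computation may differ in detail.

```r
# Gaussian RBF kernel matrix, k(a, b) = exp(-sigma * ||a - b||^2),
# between the rows of A and the rows of B
rbf_kernel <- function(A, B, sigma) {
  sq_dist <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-sigma * pmax(sq_dist, 0))  # clamp tiny negative round-off to zero
}

# Biased MMD estimate (hypothetical helper):
# sqrt( mean k(x, x') + mean k(y, y') - 2 * mean k(x, y) )
mmd_biased <- function(X, Y, sigma = 1) {
  val <- mean(rbf_kernel(X, X, sigma)) +
    mean(rbf_kernel(Y, Y, sigma)) -
    2 * mean(rbf_kernel(X, Y, sigma))
  sqrt(max(val, 0))
}

# Acceptance threshold for m = n samples and 0 <= k <= K (hypothetical helper):
# reject H0 when MMD_b > sqrt(2K/m) * (1 + sqrt(2 * log(1/alpha)))
mmd_threshold <- function(m, alpha = 0.05, K = 1) {
  sqrt(2 * K / m) * (1 + sqrt(2 * log(1 / alpha)))
}

set.seed(1)
x <- matrix(runif(300), 100)
y <- matrix(runif(300) + 1, 100)

stat  <- mmd_biased(x, y, sigma = 1)
bound <- mmd_threshold(nrow(x), alpha = 0.05)  # K = 1 for the RBF kernel
stat > bound  # TRUE means H0 (same distribution) is rejected
```

For m = 100 and alpha = 0.05 the threshold is about 0.488; it shrinks like 1/sqrt(m), so larger samples can detect smaller discrepancies.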

Returns

An S4 object of class kmmd recording whether the null hypothesis H0 is rejected, where H0 states that the samples x and y come from the same distribution. The object contains the following slots:

  • H0: is H0 rejected (logical)

  • AsympH0: is H0 rejected according to the asymptotic bound (logical)

  • kernelf: the kernel function used.

  • mmdstats: the test statistics (vector of two)

  • Radbound: the Rademacher bound

  • Asymbound: the asymptotic bound

See kmmd-class for more details.
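The slots can be read with the accessor functions of the same names described in kmmd-class. A minimal sketch, assuming kernlab is installed and reusing the shifted uniform data from the Examples section; asymptotic = TRUE is set so that the asymptotic test is also computed.

```r
library(kernlab)

set.seed(7)
x <- matrix(runif(300), 100)
y <- matrix(runif(300) + 1, 100)
res <- kmmd(x, y, asymptotic = TRUE)

H0(res)         # is H0 rejected (Rademacher bound)?
AsympH0(res)    # is H0 rejected (asymptotic bound)?
mmdstats(res)   # the two test statistics
Radbound(res)   # the Rademacher bound
Asymbound(res)  # the asymptotic bound
kernelf(res)    # the kernel function that was used
```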

References

Gretton, A., K. Borgwardt, M. Rasch, B. Schoelkopf and A. Smola. A Kernel Method for the Two-Sample-Problem. Neural Information Processing Systems 2006, Vancouver.

https://papers.neurips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf

Author(s)

Alexandros Karatzoglou

alexandros.karatzoglou@ci.tuwien.ac.at

See Also

ksvm

Examples

library(kernlab)

# create data
x <- matrix(runif(300), 100)
y <- matrix(runif(300) + 1, 100)

mmdo <- kmmd(x, y)
mmdo

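As a contrast to the example above, a minimal sketch (assuming kernlab is installed) in which both samples are drawn from the same distribution, so the test would normally not reject H0. Because any level-alpha test can reject by chance, this outcome is not guaranteed for every random seed.

```r
library(kernlab)

set.seed(3)
x0 <- matrix(runif(300), 100)  # both samples: uniform on the unit cube
y0 <- matrix(runif(300), 100)

mmd_same <- kmmd(x0, y0)
H0(mmd_same)  # expected FALSE: no evidence that the distributions differ
```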
  • Maintainer: Alexandros Karatzoglou
  • License: GPL-2
  • Last published: 2024-08-13
