Gaussianize function

Gaussianize matrix-like objects

Gaussianize is probably the most useful function in this package. It is used the same way as scale, but instead of only centering and scaling the data, it Gaussianizes them; this works well for unimodal data. See Goerg (2011, 2016) and Examples.

Important: For multivariate input X it performs a column-wise Gaussianization (by simply calling apply(X, 2, Gaussianize)), which is only a marginal Gaussianization. The transformed data are in general not jointly Gaussian.
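The difference between marginal and joint Gaussianity can be seen in a small base-R sketch. It does not call Gaussianize; the variables x1, x2, and sign_flip are made up for illustration. Both columns are standard normal on their own, yet the pair is not jointly Gaussian, which is the kind of structure a column-wise transformation cannot repair.

```r
# Base R only: two marginally Gaussian columns that are not jointly Gaussian.
set.seed(42)
n <- 10000
x1 <- rnorm(n)                                     # N(0, 1)
sign_flip <- sample(c(-1, 1), n, replace = TRUE)   # independent random sign
x2 <- sign_flip * x1                               # also N(0, 1) by symmetry

# Each column looks Gaussian on its own ...
marginal_sd <- c(sd(x1), sd(x2))

# ... but x1 + x2 is exactly 0 whenever the sign is -1. A jointly Gaussian
# pair cannot have a sum with a point mass at 0 unless the sum is always 0.
share_exact_zero <- mean(x1 + x2 == 0)

print(marginal_sd)
print(share_exact_zero)
```

Roughly half of the sums are exactly zero, so any linear combination test for joint normality would fail here even though each marginal passes.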

By default Gaussianize returns the $X \sim N(\mu_x, \sigma_x^2)$ input, not the zero-mean, unit-variance $U \sim N(0, 1)$ input. Use return.u = TRUE to obtain $U$.
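As a minimal sketch of the two return modes, assuming the LambertW package is installed and that a Student-t sample is a reasonable heavy-tailed test case (the data and variable names are illustrative only):

```r
library(LambertW)

set.seed(1)
y <- rt(n = 500, df = 4)   # heavy-tailed sample, chosen for illustration

x <- Gaussianize(y, type = "h")                   # default: input X
u <- Gaussianize(y, type = "h", return.u = TRUE)  # standardized input U

# X keeps the estimated location and scale of the input distribution,
# while U is expected to be roughly centered at 0 with spread near 1.
print(c(mean_x = mean(x), sd_x = sd(as.numeric(x))))
print(c(mean_u = mean(u), sd_u = sd(as.numeric(u))))
```

Both calls estimate the parameters from the same data, so x and u should differ only by a shift and a rescaling.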

Gaussianize(
  data = NULL,
  type = c("h", "hh", "s"),
  method = c("IGMM", "MLE"),
  return.tau.mat = FALSE,
  inverse = FALSE,
  tau.mat = NULL,
  verbose = FALSE,
  return.u = FALSE,
  input.u = NULL
)

Arguments

  • data: a numeric matrix-like object; either the data that should be Gaussianized, or the data that should be "DeGaussianized" (inverse = TRUE), i.e., converted back to the original space.

  • type: what type of non-normality: symmetric heavy-tails "h" (default), skewed heavy-tails "hh", or just skewed "s".

  • method: what estimator should be used: "MLE" or "IGMM". "IGMM" gives exactly Gaussian characteristics (kurtosis $\equiv 3$ for "h" or skewness $\equiv 0$ for "s"), "MLE" comes close to this. Default: "IGMM" since it is much faster than "MLE".

  • return.tau.mat: logical; if TRUE it also returns the estimated $\tau$ parameters as a matrix (same number of columns as data). This matrix can then be used to Gaussianize new data with pre-estimated $\tau$. It can also be used to "DeGaussianize" data by passing it as an argument (tau.mat) to Gaussianize() and setting inverse = TRUE.

  • inverse: logical; if TRUE it performs the inverse transformation using tau.mat to "DeGaussianize" the data back to the original space again.

  • tau.mat: instead of estimating $\tau$ from the data you can pass it as a matrix (usually obtained via Gaussianize(..., return.tau.mat = TRUE)). If inverse = TRUE it uses this tau matrix to "DeGaussianize" the data again. This is useful to back-transform new data in the Gaussianized space, e.g., predictions or fits, back to the original space.

  • verbose: logical; if TRUE, it prints out progress information in the console. Default: FALSE.

  • return.u: logical; if TRUE it returns the zero-mean, unit-variance Gaussian input $U$. If FALSE (default) it returns the input $X$.

  • input.u: optional; if you used return.u = TRUE in a previous step, and now you want to convert the data back to original space, then you have to pass it as input.u. If you pass numeric data as data, Gaussianize assumes that data is the input corresponding to $X$, not $U$.
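The interplay of return.tau.mat, tau.mat, and inverse can be sketched as follows. This is an illustration under assumptions rather than a reference workflow: it assumes the LambertW package is installed, the simulated matrices Y_train and Y_new are made up, and it relies on the behavior described for tau.mat above (reusing a stored matrix instead of re-estimating).

```r
library(LambertW)

set.seed(2)
# Illustrative two-column heavy-tailed data (hypothetical training / new split).
Y_train <- cbind(y1 = rt(500, df = 4), y2 = rt(500, df = 6))
Y_new   <- cbind(y1 = rt(50,  df = 4), y2 = rt(50,  df = 6))

# Estimate tau column by column on the training data and keep it.
fit <- Gaussianize(Y_train, type = "h", return.tau.mat = TRUE)

# Reuse the stored tau for new data instead of estimating it again.
X_new <- Gaussianize(Y_new, type = "h", tau.mat = fit$tau.mat)

# Map values in the Gaussianized space back to the original space.
Y_back <- Gaussianize(data = X_new, tau.mat = fit$tau.mat, inverse = TRUE)

# The round trip is expected to reproduce Y_new up to numerical error.
round_trip_error <- max(abs(as.numeric(Y_back) - as.numeric(Y_new)))
print(round_trip_error)
```

Keeping tau.mat from the training step means predictions made in the Gaussianized space can be returned to the original scale with the same parameters that produced it.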

Returns

numeric matrix-like object with same dimension/size as input data. If inverse = FALSE it is the Gaussianized matrix / vector; if TRUE it is the "DeGaussianized" matrix / vector.

The numeric parameters of mean, scale, and skewness/heavy-tail parameters that were used in the Gaussianizing transformation are returned as attributes of the output matrix: 'Gaussianized:mu', 'Gaussianized:sigma', and for

  • type = "h": 'Gaussianized:delta' and 'Gaussianized:alpha',

  • type = "hh": 'Gaussianized:delta_l', 'Gaussianized:delta_r', 'Gaussianized:alpha_l', and 'Gaussianized:alpha_r',

  • type = "s": 'Gaussianized:gamma'.

They can also be returned as a separate matrix using return.tau.mat = TRUE. In this case Gaussianize returns a list with elements:

  • input: Gaussianized input data $\boldsymbol x$ (or $\boldsymbol u$ if return.u = TRUE),

  • tau.mat: matrix with $\tau$ estimates that we used to get x; has same number of columns as x, and 3, 5, or 6 rows (depending on type = "s", "h", or "hh").
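A short sketch of how the two output forms might be inspected, assuming the LambertW package is installed and using the attribute names listed above for type = "h" (the simulated data are illustrative only):

```r
library(LambertW)

set.seed(3)
y <- rt(n = 500, df = 4)   # illustrative heavy-tailed sample

# Form 1: parameters attached to the output as attributes.
x <- Gaussianize(y, type = "h")
mu_hat    <- attr(x, "Gaussianized:mu")
sigma_hat <- attr(x, "Gaussianized:sigma")
delta_hat <- attr(x, "Gaussianized:delta")
print(c(mu = mu_hat, sigma = sigma_hat, delta = delta_hat))

# Form 2: a list with the transformed data and the parameter matrix.
out <- Gaussianize(y, type = "h", return.tau.mat = TRUE)
print(names(out))
print(out$tau.mat)
```

The attribute form is convenient for quick inspection, while the tau.mat form is the one to keep when the same parameters must be passed back to Gaussianize later.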

Examples

# Univariate example
set.seed(20)
y1 <- rcauchy(n = 100)
out <- Gaussianize(y1, return.tau.mat = TRUE)
x1 <- get_input(y1, c(out$tau.mat[, 1]))  # same as out$input
test_normality(out$input)  # Gaussianized a Cauchy!

kStartFrom <- 20
y.cum.avg <- (cumsum(y1) / seq_along(y1))[-seq_len(kStartFrom)]
x.cum.avg <- (cumsum(x1) / seq_along(x1))[-seq_len(kStartFrom)]

plot(c((kStartFrom + 1):length(y1)), y.cum.avg, type = "l", lwd = 2,
     main = "CLT in practice", xlab = "n",
     ylab = "Cumulative sample average",
     ylim = range(y.cum.avg, x.cum.avg))
lines(c((kStartFrom + 1):length(y1)), x.cum.avg, col = 2, lwd = 2)
abline(h = 0)
grid()
legend("bottomright", c("Cauchy", "Gaussianize"), col = c(1, 2),
       box.lty = 0, lwd = 2, lty = 1)

plot(x1, y1, xlab = "Gaussian-like input", ylab = "Cauchy - output")
grid()

## Not run:
# multivariate example
y2 <- 0.5 * y1 + rnorm(length(y1))
YY <- cbind(y1, y2)
plot(YY)

XX <- Gaussianize(YY, type = "hh")
plot(XX)

out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE,
                   verbose = TRUE, method = "IGMM")
plot(out$input)
out$tau.mat

YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat,
                      inverse = TRUE)
plot(YY.hat[, 1], YY[, 1])
## End(Not run)

  • Maintainer: Georg M. Goerg
  • License: GPL (>= 2)
  • Last published: 2023-11-30