RobustNormalization function

RobustNormalization as described in [Milligan/Cooper, 1988].

RobustNormalization(Data, Centered = FALSE, Capped = FALSE, na.rm = TRUE, WithBackTransformation = FALSE, pmin = 0.01, pmax = 0.99)

Arguments

  • Data: [1:n,1:d] data matrix of n cases and d features
  • Centered: If TRUE, the data are centered around zero by the median
  • Capped: If TRUE, outliers above 1 or below -1 are capped, i.e. set to 1 or -1
  • na.rm: If TRUE, infinite values are disregarded
  • WithBackTransformation: Set to TRUE if a back-transformation is required, e.g. when forecasting with neural networks
  • pmin: defines outliers on the lower end of the scale
  • pmax: defines outliers on the higher end of the scale

Details

Normalizes features either to the range -1 to 1 (Centered=TRUE) or to the range 0 to 1 (Centered=FALSE) without changing the distribution of the feature itself. For a more precise description, see [Thrun, 2018, p.17].
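The normalization described above can be sketched in base R for a single feature. This is a minimal illustration, not the package's implementation: the helper name robust_norm_sketch and its arguments p_low and p_high are hypothetical, and it assumes that pmin and pmax are interpreted as quantile probabilities and that centering subtracts the median of the scaled values.

```r
# Hypothetical helper (not part of the package): robust scaling of one numeric vector.
# Assumption: p_low and p_high play the role of pmin and pmax as quantile probabilities.
robust_norm_sketch <- function(x, centered = FALSE, capped = FALSE,
                               p_low = 0.01, p_high = 0.99) {
  x[!is.finite(x)] <- NA                        # treat infinite values as missing
  q <- quantile(x, probs = c(p_low, p_high), na.rm = TRUE, names = FALSE)
  denom <- q[2] - q[1]
  if (denom == 0) denom <- 1                    # guard against constant features
  y <- (x - q[1]) / denom                       # bulk of the data lands in [0, 1]
  if (centered) y <- y - median(y, na.rm = TRUE)  # shift the median to zero
  if (capped) y <- pmin(pmax(y, -1), 1)         # clip remaining outliers to [-1, 1]
  y
}

set.seed(1)
x <- c(rnorm(100), 50, -50)                     # two artificial outliers
range(robust_norm_sketch(x))                    # outliers fall outside [0, 1]
range(robust_norm_sketch(x, capped = TRUE))     # outliers are clipped
```

Because the scaling is a linear map, the shape of the distribution is preserved; only the capping step, if enabled, alters the extreme values.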

"[The] scaling of the inputs determines the effective scaling of the weights in the last layer of a MLP with BP neural network, it can have a large effect on the quality of the final solution. At the outset it is best to standardize all inputs to have mean zero and standard deviation 1 [(or at least the range under 1)]. This ensures all inputs are treated equally in the regularization process, and allows one to choose a meaningful range for the random starting weights." [Friedman et al., 2012]

Returns

If WithBackTransformation=FALSE: TransformedData[1:n,1:d], i.e., the normalized data matrix of n cases and d features.

If WithBackTransformation=TRUE: List with

  • TransformedData: [1:n,1:d] normalized data matrix of n cases and d features
  • MinX: [1:d] numerical vector used for manual back-transformation of each feature
  • MaxX: [1:d] numerical vector used for manual back-transformation of each feature
  • Denom: [1:d] numerical vector used for manual back-transformation of each feature
  • Center: [1:d] numerical vector used for manual back-transformation of each feature
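How such parameters could be used for a manual back-transformation is sketched below for a single feature. This is an illustration under stated assumptions, not the package's code: it assumes the forward step is (x - MinX) / Denom followed by subtracting Center, and that no capping was applied (capping is not invertible). The names back_trafo_sketch, min_x, denom and center are hypothetical stand-ins for MinX, Denom and Center.

```r
# Assumed forward step for one feature, so the example is self-contained.
x <- c(1, 2, 3, 10)
q <- quantile(x, probs = c(0.01, 0.99), names = FALSE)
min_x  <- q[1]                 # stand-in for MinX
denom  <- q[2] - q[1]          # stand-in for Denom
scaled <- (x - min_x) / denom
center <- median(scaled)       # stand-in for Center
y <- scaled - center

# Hypothetical inverse: undo the centering first, then undo the scaling.
back_trafo_sketch <- function(y, min_x, denom, center) {
  (y + center) * denom + min_x
}

x_back <- back_trafo_sketch(y, min_x, denom, center)
all.equal(x, x_back)           # compares up to floating-point tolerance
```

In practice RobustNorm_BackTrafo should be preferred, since it matches the forward transformation actually used by the package.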

References

[Milligan/Cooper, 1988] Milligan, G. W., & Cooper, M. C.: A study of standardization of variables in cluster analysis, Journal of Classification, Vol. 5(2), pp. 181-204. 1988.

[Friedman et al., 2012] Friedman, J., Hastie, T., & Tibshirani, R.: The Elements of Statistical Learning, (Second ed. Vol. 1), Springer Series in Statistics, New York, NY, USA, 2012.

[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.

Author(s)

Michael Thrun

Examples

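# Scale a noisy vector and cap outliers at the interval borders
Scaled = RobustNormalization(rnorm(1000, 2, 100), Capped = TRUE)
hist(Scaled)

# Normalize a small matrix and keep the parameters for the back-transformation
m = cbind(c(1, 2, 3), c(2, 6, 4))
List = RobustNormalization(m, FALSE, FALSE, FALSE, TRUE)
TransformedData = List$TransformedData

# Transform back and compare with the original matrix
mback = RobustNorm_BackTrafo(TransformedData, List$MinX, List$Denom, List$Center)
sum(m - mback)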

See Also

RobustNorm_BackTrafo

  • Maintainer: Michael Thrun
  • License: GPL-3
  • Last published: 2024-06-20