svymean_winsorized function

Weighted Winsorized Mean and Total

Weighted winsorized mean and total

svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...)
svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...)
svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)

Arguments

  • x: a one-sided [formula], e.g., ~myVariable.
  • design: an object of class survey.design; see svydesign.
  • LB: [double] lower bound of winsorization such that $0 \leq LB < UB \leq 1$.
  • UB: [double] upper bound of winsorization such that $0 \leq LB < UB \leq 1$.
  • na.rm: [logical] indicating whether NA values should be removed before the computation proceeds (default: FALSE).
  • trim_var: [logical] indicating whether the variance should be approximated by the variance estimator of the trimmed mean/total (default: FALSE).
  • k: [integer] number of observations to be winsorized at the top of the distribution.
  • ...: additional arguments (currently not used).

Details

Package survey must be attached to the search path in order to use the functions (see library or require).

  • Characteristic.: Population mean or total. Let $\mu$ denote the estimated winsorized population mean; then, the estimated winsorized total is given by $\hat{N} \mu$ with $\hat{N} = \sum w_i$, where summation is over all observations in the sample.
    
  • Modes of winsorization.: The amount of winsorization can be specified in relative or absolute terms:

      * **Relative:** By specifying `LB` and `UB`, the method winsorizes the `LB`$~\cdot 100\%$ of the smallest observations and the (1 - `UB`)$~\cdot 100\%$ of the largest observations from the data.
      * **Absolute:** By specifying argument `k` in the functions with the "infix" `_k_` in their name (e.g., `svymean_k_winsorized`), the largest $k$ observations are winsorized, $0 < k < n$, where $n$ denotes the sample size. E.g., `k = 2` implies that the largest and the second largest observation are winsorized.
    
  • Variance estimation.: Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994). Two estimators are available:

     - **`trim_var = FALSE`**: Variance estimator of the winsorized mean/total. The estimator depends on the estimated probability density function evaluated at the winsorization thresholds, which can be numerically unstable, depending on the context. As a remedy, a simplified variance estimator is available by setting `trim_var = TRUE`.
     - **`trim_var = TRUE`**: Variance is approximated using the variance estimator of the trimmed mean/total.
    
  • Utility functions.: summary, coef, SE, vcov, residuals, fitted and robweights.

  • Bare-bone functions.: See:

      * `weighted_mean_winsorized`,
      * `weighted_mean_k_winsorized`,
      * `weighted_total_winsorized`,
      * `weighted_total_k_winsorized`.
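
The two modes of winsorization and the relation between the winsorized mean and total described above can be sketched in base R. This is only an illustration under stated assumptions: the helpers `wquantile_sketch`, `wmean_winsorized_sketch` and `wmean_k_winsorized_sketch` are hypothetical names, not part of the package, and the weighted quantile is taken as the smallest value whose cumulative weight share reaches the requested probability, which may differ from the quantile definition the package uses.

```r
# Hypothetical helper (not part of robsurvey): weighted quantile defined as the
# smallest x whose cumulative weight share reaches the probability p.
wquantile_sketch <- function(x, w, p) {
    ord <- order(x)
    cum_share <- cumsum(w[ord]) / sum(w)
    x[ord][which(cum_share >= p)[1]]
}

# Relative mode: values below the LB-quantile are raised to it, values above
# the UB-quantile are lowered to it; then the weighted mean is taken.
wmean_winsorized_sketch <- function(x, w, LB = 0.05, UB = 1 - LB) {
    stopifnot(0 <= LB, LB < UB, UB <= 1)
    lo <- wquantile_sketch(x, w, LB)
    hi <- wquantile_sketch(x, w, UB)
    sum(w * pmin(pmax(x, lo), hi)) / sum(w)
}

# Absolute mode: the k largest values are replaced by the (k + 1)-th largest.
wmean_k_winsorized_sketch <- function(x, w, k) {
    n <- length(x)
    stopifnot(k > 0, k < n)
    cutoff <- sort(x, decreasing = TRUE)[k + 1]
    sum(w * pmin(x, cutoff)) / sum(w)
}

# Toy data (made up for illustration): one outlier, equal weights
x <- c(1, 2, 3, 4, 100)
w <- c(2, 2, 2, 2, 2)

# Relative: thresholds are 2 and 4, so the data become 2, 2, 3, 4, 4
m_rel <- wmean_winsorized_sketch(x, w, LB = 0.3)

# Absolute: k = 1 replaces 100 by 4, so the data become 1, 2, 3, 4, 4
m_abs <- wmean_k_winsorized_sketch(x, w, k = 1)

# Total = Nhat * mean, with Nhat = sum(w)
t_abs <- sum(w) * m_abs
```

On this toy data the relative variant gives a mean of 3 and the absolute variant a mean of 2.8, compared with an unwinsorized mean of 22. For actual estimation the bare-bone functions listed above should be preferred over this sketch.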
    

Returns

Object of class svystat_rob

References

Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697

Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 946--967. doi:10.1214/aos/1176325505

See Also

Overview (of all implemented functions)

weighted_mean_winsorized, weighted_mean_k_winsorized, weighted_total_winsorized and weighted_total_k_winsorized

Examples

head(workplace)

library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
        # survey design with pre-calibrated weights
        svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
                  data = workplace, calibrate.formula = ~-1 + strat)
    } else {
        # legacy mode
        svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
                  data = workplace)
    }

# Estimated winsorized population mean (5% symmetric winsorization)
svymean_winsorized(~employment, dn, LB = 0.05)

# Estimated one-sided k winsorized population total (2 observations are
# winsorized at the top of the distribution)
svytotal_k_winsorized(~employment, dn, k = 2)
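
The variance option and the utility functions mentioned under Details might be used as in the following sketch. It assumes that packages survey and robsurvey are installed and that the `workplace` data are available once robsurvey is attached, as in the example above; for brevity it uses the plain `svydesign` call without `calibrate.formula`.

```r
library(survey)
library(robsurvey)

# Stratified design as in the example above (call without calibrate.formula)
dn <- svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
                data = workplace)

# Winsorized mean with the variance approximated by the variance estimator
# of the trimmed mean (trim_var = TRUE)
m <- svymean_winsorized(~employment, dn, LB = 0.05, trim_var = TRUE)

# Utility functions for objects of class svystat_rob
summary(m)
coef(m)   # point estimate
SE(m)     # estimated standard error
```

Setting `trim_var = TRUE` changes only the variance estimator, so the point estimate returned by `coef` should match the one obtained with the default `trim_var = FALSE`.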
  • Maintainer: Tobias Schoch
  • License: GPL (>= 2)
  • Last published: 2024-08-22