reweightData function

Function to Reweight Data

reweightData( data, argvals, vars, longvars = NULL, weights, index, idvars = NULL, compress = FALSE )

Arguments

  • data: a named list or data.frame.
  • argvals: character (vector); name(s) of the entries in data that give the index of the observed grid points. Must be supplied if vars is not supplied.
  • vars: character (vector); name(s) of the entries in data that are subsetted according to weights or index. Must be supplied if argvals is not supplied.
  • longvars: variables in long format, e.g., a response that is observed on curve-specific grids.
  • weights: vector of weights for observations. Must be supplied if index is not supplied.
  • index: vector of indices for observations. Must be supplied if weights is not supplied.
  • idvars: character (vector); name(s) of the index variable(s) needed to expand vars. The expanded vars then conform either with the hmatrix structure used by bhistx base-learners or with the variables in long format specified in longvars.
  • compress: logical; whether hmatrix objects are saved in compressed form. Default is FALSE. Should be set to FALSE when using reweightData for nested resampling.

Returns

A list with the reweighted or subsetted data.

Details

reweightData indexes the rows of matrices and/or the positions of vectors, using either the index or the weights argument. Some list entries serve as the time index for the observed grid points of each functional trajectory. To keep the function from indexing these entries, supply argvals, a character vector with their names. If argvals is not supplied, vars must be supplied, and argvals is assumed to be names(data)[!names(data) %in% vars].
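The indexing rule above can be illustrated with a small base-R sketch. The helper subset_functional_list is hypothetical and is not FDboost's implementation. It assumes data is a plain named list (not a data.frame). Matrix entries in vars are indexed by row, one row per curve. Vector entries in vars are indexed by position. The argvals entries are returned untouched.

```r
# Hypothetical helper, NOT FDboost::reweightData: a base-R sketch of the
# indexing rule, assuming `data` is a plain named list.
subset_functional_list <- function(data, index, argvals = NULL, vars = NULL) {
  if (is.null(argvals) && is.null(vars)) stop("supply argvals or vars")
  # If only vars is given, every other entry is treated as a grid entry
  if (is.null(argvals)) argvals <- names(data)[!names(data) %in% vars]
  # If only argvals is given, every other entry is subsetted
  if (is.null(vars)) vars <- names(data)[!names(data) %in% argvals]
  for (v in vars) {
    x <- data[[v]]
    if (is.matrix(x)) {
      data[[v]] <- x[index, , drop = FALSE]  # rows = observations (curves)
    } else {
      data[[v]] <- x[index]                  # positions = observations
    }
  }
  data  # entries named in argvals are returned unchanged
}

# Toy data: 3 curves observed on a common grid of 4 time points
toy <- list(
  y    = matrix(1:12, nrow = 3),  # 3 x 4 response matrix
  x    = c(10, 20, 30),           # one scalar covariate value per curve
  time = c(0, 0.25, 0.5, 1)       # grid shared by all curves
)
res <- subset_functional_list(toy, index = c(3, 3, 1), argvals = "time")
res$y     # rows 3, 3 and 1 of toy$y
res$x     # 30 30 10
res$time  # unchanged grid
```

The grid entry time keeps its length 4 while y and x follow the index. This is why argvals has to be identified before subsetting.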

When using weights, a weight vector of length N must be supplied, where N is the number of observations. When using index, the vector must contain the index of each row as many times as that row is to be included in the new data set.
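The correspondence between the two arguments can be sketched in base R. This sketch assumes that weights act as integer frequency weights, i.e. that observation i is repeated weights[i] times. That is consistent with the description of index above, but it is not taken from the FDboost source. The helper weights_to_index is hypothetical.

```r
# Hypothetical helper (assumption: weights are integer frequency weights).
# Expands weights into an index by repeating observation i weights[i] times.
weights_to_index <- function(weights) {
  stopifnot(all(weights >= 0), all(weights == round(weights)))
  rep(seq_along(weights), times = weights)
}

# Weights as in the examples: observations 2 and 3 each get weight 32
w <- c(0, 32, 32, rep(0, 61))  # length N = 64
idx <- weights_to_index(w)
length(idx)  # 64 rows in the new data set

# The reverse direction: an index vector implies frequency weights
N <- 64
index <- rep(1:32, each = 2)                # each of the first 32 rows twice
w_from_index <- tabulate(index, nbins = N)  # 2 for rows 1-32, 0 for 33-64
```

Under this assumption, weights = c(0, 32, 32, rep(0, 61)) builds a data set of 64 rows that consists only of copies of observations 2 and 3.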

Examples

## load data
data("viscosity", package = "FDboost")
interval <- "101"
end <- which(viscosity$timeAll == as.numeric(interval))
viscosity$vis <- log(viscosity$visAll[ , 1:end])
viscosity$time <- viscosity$timeAll[1:end]

## what does data look like
str(viscosity)

## do some reweighting
# correct weights
str(reweightData(viscosity, vars = c("vis", "T_C", "T_A", "rspeed", "mflow"),
                 argvals = "time", weights = c(0, 32, 32, rep(0, 61))))

str(visNew <- reweightData(viscosity, vars = c("vis", "T_C", "T_A", "rspeed", "mflow"),
                           argvals = "time", weights = c(0, 32, 32, rep(0, 61))))
# check the result
# visNew$vis[1:5, 1:5]
## image(visNew$vis)

# incorrect weights
str(reweightData(viscosity, vars = c("vis", "T_C", "T_A", "rspeed", "mflow"),
                 argvals = "time", weights = sample(1:64, replace = TRUE)), 1)

# supply meaningful index
str(visNew <- reweightData(viscosity, vars = c("vis", "T_C", "T_A", "rspeed", "mflow"),
                           argvals = "time", index = rep(1:32, each = 2)))
# check the result
# visNew$vis[1:5, 1:5]

# errors
if(FALSE){
  reweightData(viscosity, argvals = "")
  reweightData(viscosity, argvals = "covThatDoesntExist", index = rep(1, 64))
}

Author(s)

David Ruegamer, Sarah Brockhaus

  • Maintainer: David Ruegamer
  • License: GPL-2
  • Last published: 2023-08-12