auto_data_cleaning function

Perform automatic data cleaning of time series data


Returns a matrix or a list of matrices in which missing values and outliers have been imputed. The function automates the use of the functions model_missing_data, detect_outliers and impute_modelled_data, and is designed for numerical data only.

auto_data_cleaning(
  data,
  S,
  tau = NULL,
  no.of.last.indices.to.fix = S[1],
  indices.to.fix = NULL,
  model.missing.pars = list(),
  detect.outliers.pars = list()
)

Arguments

  • data: an input vector, matrix or data frame of dimension nobs x nvars containing missing values; each column is a variable.
  • S: a number or vector describing the seasonalities (S_1, ..., S_K) in the data, e.g. c(24, 168) if the data consists of 24 observations per day and there is a weekly seasonality in the data.
  • tau: the quantile(s) of the missing values to be estimated in the quantile regression; any values in (0, 1) are accepted. If NULL, weighted lasso regression is performed instead.
  • no.of.last.indices.to.fix: the number of observations at the end of the data to be fixed; by default S[1], i.e. the first seasonality.
  • indices.to.fix: indices of the data to be fixed. If NULL, then it is calculated based on the no.of.last.indices.to.fix parameter. Otherwise, the no.of.last.indices.to.fix parameter is ignored.
  • model.missing.pars: named list containing additional arguments for the model_missing_data function.
  • detect.outliers.pars: named list containing additional arguments for the detect_outliers function.
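The page does not state exactly how indices.to.fix is derived from no.of.last.indices.to.fix when it is NULL. The base-R sketch below assumes it simply selects the last no.of.last.indices.to.fix rows of the data; the helper default_indices_to_fix is hypothetical and not part of tsrobprep.

```r
# Hypothetical helper (not part of tsrobprep). Assumption: when
# indices.to.fix = NULL, the rows to fix are the last
# `no.of.last.indices.to.fix` rows, which defaults to S[1].
default_indices_to_fix <- function(nobs, S, no.of.last.indices.to.fix = S[1]) {
  n <- min(no.of.last.indices.to.fix, nobs)  # cannot fix more rows than exist
  if (n <= 0) return(integer(0))             # nothing to fix
  seq.int(nobs - n + 1, nobs)                # tail indices, in increasing order
}

# Hourly data with daily and weekly seasonality, S = c(24, 168):
# by default only the last day (S[1] = 24 observations) would be fixed.
idx <- default_indices_to_fix(nobs = 1000, S = c(24, 168))
range(idx)  # 977 1000

# Passing the number of rows fixes the whole series, as in the example below.
all_idx <- default_indices_to_fix(nobs = 1000, S = c(24, 168),
                                  no.of.last.indices.to.fix = 1000)
```

Under this assumption, setting no.of.last.indices.to.fix to the number of rows (as the Examples section does with dim(GBload)[1]) makes the whole series eligible for fixing.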

Returns

A list containing a matrix or a list of matrices in which missing values and outliers have been imputed, the indices of the data that were modelled, and the supplied quantile values (tau).

Details

The function calls model_missing_data to impute missing values and detect_outliers to detect outliers, removes the detected outliers, and finally applies model_missing_data again. For details, see the respective help pages of these functions.
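To illustrate the three-step flow (impute, detect and remove outliers, impute again), here is a simplified base-R sketch for a single numeric series. The helpers impute_seasonal_mean, flag_outliers and clean_series are hypothetical: they swap the package's regression-based models (weighted lasso or quantile regression) for a seasonal-mean imputation and a median/MAD robust z-score rule, so they show the control flow only, not tsrobprep's actual method.

```r
# Hypothetical stand-in for model_missing_data: fill NAs with the mean of
# the observed values sharing the same position within the period S.
impute_seasonal_mean <- function(x, S) {
  pos <- (seq_along(x) - 1) %% S + 1  # position 1..S within the period
  profile <- vapply(seq_len(S), function(k) mean(x[pos == k], na.rm = TRUE),
                    numeric(1))
  miss <- is.na(x)
  x[miss] <- profile[pos[miss]]
  x
}

# Hypothetical stand-in for detect_outliers: flag points whose deviation
# from the seasonal median profile has a robust z-score above `threshold`.
flag_outliers <- function(x, S, threshold = 4) {
  pos <- (seq_along(x) - 1) %% S + 1
  profile <- vapply(seq_len(S), function(k) median(x[pos == k]), numeric(1))
  r <- x - profile[pos]
  z <- (r - median(r)) / mad(r)
  which(abs(z) > threshold)
}

# The pipeline: impute, detect outliers, remove them, impute again
# (re-imputing the originally missing values as well).
clean_series <- function(x, S, threshold = 4) {
  missing_idx <- which(is.na(x))
  filled <- impute_seasonal_mean(x, S)         # 1. impute missing values
  out <- flag_outliers(filled, S, threshold)   # 2. detect outliers
  x[out] <- NA                                 # 3. remove outliers ...
  list(data = impute_seasonal_mean(x, S),      # 4. ... and impute again
       outliers = out, missing = missing_idx)
}

# Usage on simulated hourly data with a daily seasonality (S = 24).
set.seed(1)
n <- 24 * 7
x <- 10 + 3 * sin(2 * pi * seq_len(n) / 24) + rnorm(n, sd = 0.2)
x[c(5, 50)] <- NA      # two missing values
x[100] <- x[100] + 30  # one gross outlier
res <- clean_series(x, S = 24)
```

Re-imputing the originally missing values in the last step mirrors the idea of running the missing-data model a second time on data from which outliers have been removed, so the first-pass imputations are not distorted by the outliers.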

Examples

## Not run:
autoclean <- auto_data_cleaning(
  data = GBload[,-1],
  S = c(48, 7*48),                            # daily (48) and weekly (7*48) seasonalities
  no.of.last.indices.to.fix = dim(GBload)[1], # fix the whole series, not only the tail
  model.missing.pars = list(consider.as.missing = 0, min.val = 0)
)
autoclean$replaced.indices
## End(Not run)


See Also

model_missing_data, detect_outliers, impute_modelled_data

  • Maintainer: Michał Narajewski
  • License: MIT + file LICENSE
  • Last published: 2022-02-22
