IpSdEwma function

Incremental Processing Shift-Detection based on EWMA (SD-EWMA).

IpSdEwma calculates anomalies using SD-EWMA in an incremental processing mode. See also OipSdEwma, the optimized and faster version of this function. SD-EWMA is a method for covariate shift-detection tests based on a two-stage structure for univariate time series. It works online and uses a control chart based on an exponentially weighted moving average (EWMA) model to detect the covariate shift point in non-stationary time series.

IpSdEwma(data, n.train, threshold = 0.01, l = 3, last.res = NULL)

Arguments

  • data: Numerical vector containing the training and test data.
  • n.train: Number of points of the dataset that correspond to the training set.
  • threshold: Error smoothing constant.
  • l: Control limit multiplier.
  • last.res: Last result returned by the algorithm.

Returns

A list of the following items.

  • result: Data frame with the following columns.
      • is.anomaly: 1 if the value is anomalous, 0 otherwise.
      • ucl: Upper control limit.
      • lcl: Lower control limit.
  • last.res: Last result returned by the algorithm. It is a data frame containing the parameters calculated in the last iteration, which are needed for the next one.

Details

data must be a numerical vector without NA values. threshold must be a numeric value between 0 and 1; low values such as 0.01 or 0.05 are recommended, and the default is 0.01. l is the multiplier that determines the width of the control limits; the default is 3. Finally, last.res is the last result returned by a previous execution of this algorithm. The first time the algorithm is executed, its value is NULL. To process a new batch of data without appending it to the old dataset and restarting the process, only the last.res value returned by the previous run needs to be passed.
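The roles of threshold and l can be illustrated with a simplified EWMA control chart in base R. This is a minimal sketch, not the package's implementation: sd_ewma_sketch is a hypothetical helper, the EWMA smoothing constant lambda is fixed by the caller rather than estimated from the training points, and the update formulas follow a common reading of the SD-EWMA description in Raza et al. (2015).

```r
# Hypothetical helper, NOT part of the package: a simplified EWMA control chart.
# Assumptions: lambda is supplied by the caller, and the first n.train points
# are treated as shift-free and never flagged.
sd_ewma_sketch <- function(x, n.train, lambda = 0.2, threshold = 0.01, l = 3) {
  stopifnot(is.numeric(x), !anyNA(x), n.train >= 2, n.train < length(x))
  n <- length(x)
  z <- x[1]           # EWMA statistic, initialised with the first observation
  err2 <- numeric(0)  # squared one-step-ahead errors collected during training

  # Training phase: update the EWMA and collect prediction errors.
  for (i in 2:n.train) {
    err2 <- c(err2, (x[i] - z)^2)
    z <- lambda * x[i] + (1 - lambda) * z
  }
  sigma2 <- mean(err2)  # initial estimate of the error variance

  out <- data.frame(is.anomaly = integer(n), ucl = NA_real_, lcl = NA_real_)

  # Testing phase: compare each point with limits built from the previous state.
  for (i in (n.train + 1):n) {
    sd.err <- sqrt(sigma2)
    out$ucl[i] <- z + l * sd.err   # l scales the width of the control band
    out$lcl[i] <- z - l * sd.err
    out$is.anomaly[i] <- as.integer(x[i] > out$ucl[i] || x[i] < out$lcl[i])

    # threshold smooths the squared prediction error into the variance estimate.
    err <- x[i] - z
    sigma2 <- threshold * err^2 + (1 - threshold) * sigma2
    z <- lambda * x[i] + (1 - lambda) * z
  }
  out
}

# Usage: a series alternating between 10 and 11 with a single spike.
x <- c(rep(c(10, 11), 10), 50, 10, 11)
out <- sd_ewma_sketch(x, n.train = 10)
which(out$is.anomaly == 1)
```

In this sketch a larger l widens the band and flags fewer points, while a larger threshold makes the band react faster to recent prediction errors.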

This algorithm can be used for both classical and incremental processing. Note that for a finite dataset the CpSdEwma or OcpSdEwma algorithms are faster. Incremental processing can be used in two ways: 1) processing all available data and saving last.res for future runs in which new data arrives; 2) using the stream library when the data is too large to fit in memory. Example 2 illustrates the second use case.
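The first way, keeping last.res between separate runs, could be sketched as follows. This is an illustration under stated assumptions: run_batch and the state file name are hypothetical, and base R's saveRDS and readRDS are just one possible way to persist the state. The detector is passed in as an argument so that the helper works with any function that accepts data and last.res and returns a list with result and last.res, as IpSdEwma does.

```r
# Hypothetical helper, NOT part of the package: process one batch of values,
# restoring the detector state from disk and saving the new state afterwards.
run_batch <- function(values, state.file, detector, ...) {
  # On the first run there is no saved state, so last.res is NULL.
  last.res <- if (file.exists(state.file)) readRDS(state.file) else NULL
  out <- detector(data = values, ..., last.res = last.res)
  saveRDS(out$last.res, state.file)  # keep the state for the next run
  out$result
}

# Usage with the function documented here (assumes the package that provides
# IpSdEwma is attached; first.values and new.values are placeholder vectors):
# res1 <- run_batch(first.values, "ipsdewma_state.rds", IpSdEwma,
#                   n.train = 5, threshold = 0.01, l = 3)
# res2 <- run_batch(new.values, "ipsdewma_state.rds", IpSdEwma,
#                   n.train = 5, threshold = 0.01, l = 3)
```

Only the state file has to survive between sessions; the previously processed values do not need to be kept.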

Examples

## EXAMPLE 1: ----------------------
## It can be used in the same way as with CpSdEwma passing the whole dataset as
## an argument.

## Generate data
set.seed(100)
n <- 200
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[150] <- 170
df <- data.frame(timestamp = 1:n, value = x)

## Calculate anomalies
result <- IpSdEwma(
  data = df$value,
  n.train = 5,
  threshold = 0.01,
  l = 3
)
res <- cbind(df, result$result)

## Plot results
PlotDetections(res, title = "SD-EWMA ANOMALY DETECTOR")

## EXAMPLE 2: ----------------------
## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.

# install.packages("stream")
library("stream")

## Generate data
set.seed(100)
n <- 350
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp = 1:n, value = x)
dsd_df <- DSD_Memory(df)

## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n %/% nread

## Calculate anomalies
for (i in 1:numIter) {
  # read new data
  newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
  # calculate if it's an anomaly
  last.res <- IpSdEwma(
    data = newRow$value,
    n.train = 5,
    threshold = 0.01,
    l = 3,
    last.res = last.res$last.res
  )
  # prepare the result
  if (!is.null(last.res$result)) {
    res <- rbind(res, cbind(newRow, last.res$result))
  }
}

## Plot results
PlotDetections(res, title = "SD-EWMA ANOMALY DETECTOR")

References

Raza, H., Prasad, G., & Li, Y. (2015, March). EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.

  • Maintainer: Alaiñe Iturria
  • License: AGPL (>= 3)
  • Last published: 2019-09-06