x: the sequence of observations. Missing values are permitted and will be replaced.
span: the maximum number of observations on each side of each range of missing values to use in constructing the time-series model. See Details.
Dates: an optional vector of dates/times associated with each value in x. Useful if there are gaps in dates/times.
max.fill: the maximum gap to fill.
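As a rough illustration of a gap-length limit like max.fill, the sketch below measures each run of missing values with base R's rle and fills only the runs that are short enough. It assumes that gaps longer than the limit are left as NA, which the description above does not state explicitly. The variable max_fill and the use of approx are illustrative and are not taken from fillMissing itself.

```r
# Hypothetical data: one gap of length 1 and one gap of length 3
x <- c(1, NA, 3, NA, NA, NA, 7, 8)
max_fill <- 2  # illustrative stand-in for the max.fill argument

# Run-length encode the missing-value indicator to measure each gap
runs <- rle(is.na(x))
too_long <- runs$values & runs$lengths > max_fill
skip <- rep(too_long, runs$lengths)  # TRUE for positions inside long gaps

# Linearly interpolate everything, then restore NA in the long gaps
idx <- seq_along(x)
ok <- !is.na(x)
filled <- approx(idx[ok], x[ok], xout = idx)$y
filled[skip] <- NA
print(filled)
```

Here the single missing value at position 2 is filled, while the run of three missing values stays NA because it exceeds the assumed limit.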
Returns
The observations in x with missing values replaced by interpolation.
Details
Missing values at the beginning and end of x will not be replaced.
The argument span helps set the range of values used to construct the StructTS model. If span is small, then the variance of epsilon dominates and the estimates are not smooth. If span is large, then the variance of level dominates and the estimates approach linear interpolation. The variances of level and epsilon are components of the state-space model used to interpolate values; see StructTS for details. See Note for more information about the method.
If span is set larger than 99, then the entire time series is used to estimate all missing values. This approach may be useful if there are many periods of missing values. If span is set to any number less than 4, then simple linear interpolation will be used to replace missing values.
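The simple linear interpolation used for small span can be sketched with base R's approx. This is a minimal illustration of the idea and not the package's own code; the data are made up. With rule = 1, approx returns NA outside the range of observed values, which matches the behavior described above for missing values at the beginning and end of x.

```r
# Hypothetical series with leading, interior, and trailing missing values
x <- c(NA, 2, NA, NA, 8, 10, NA)

idx <- seq_along(x)
ok <- !is.na(x)

# Interpolate at every index; rule = 1 leaves points outside the
# observed range as NA, so the leading and trailing NAs are kept
filled <- approx(idx[ok], x[ok], xout = idx, rule = 1)$y
print(filled)
```

Only the two interior missing values are replaced; the first and last elements remain NA.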
Added from smwrBase.
Note
The method used to interpolate missing values is based on tsSmooth applied to a StructTS model fitted to x with type set to "trend". The smoother uses the slope information from the two values preceding a run of missing values and the two values following it to interpolate smoothly, accounting for any change in slope. Beauchamp (1989) used time-series methods for synthesizing missing streamflow records. The group used to define the statistics that control the interpolation is defined simply by span rather than by the more in-depth measures described in Elshorbagy and others (2000).
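The general technique named above can be sketched with the stats functions StructTS and tsSmooth directly. This is an illustration under stated assumptions and not the fillMissing implementation: it fits one model to the whole series instead of to windows defined by span, and the synthetic data are invented for the example.

```r
# Synthetic series (illustrative only): trend plus slow oscillation and noise
set.seed(1)
n <- 60
x <- 10 + 0.5 * seq_len(n) + 5 * sin(seq_len(n) / 6) + rnorm(n, sd = 0.3)
x[c(20:22, 40:43)] <- NA  # two interior runs of missing values

# Fit a local linear trend model; StructTS accepts missing values
fit <- StructTS(ts(x), type = "trend")

# Smoothed states; the first column is the level component
smoothed <- tsSmooth(fit)
level <- as.numeric(smoothed[, 1])

# Replace only the missing observations with the smoothed level
filled <- x
filled[is.na(x)] <- level[is.na(x)]
print(round(filled[18:24], 2))
```

Observed values are left unchanged; only the positions that were NA take the smoothed level.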
If the data have gaps in the sequence of dates rather than explicit missing values, the result depends on Dates. If Dates is given, then fillMissing will return a vector longer than x, and the returned data cannot be inserted into the original data frame. If Dates is not given, then the gap will not be recognized and will not be filled. The function insertMissing can be used to create a data frame with the complete sequence of dates.
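The difference between a gap and a missing value can be shown with base R alone. The sketch below does not call insertMissing; it uses seq and merge to build a complete daily sequence so that each gap becomes an explicit NA row. The data frame obs and its values are hypothetical, and daily spacing is an assumption.

```r
# Hypothetical record with a gap: January 3 and 4 are absent, not NA
obs <- data.frame(
  DATES = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  FLOW = c(10, 12, 18)
)

# Complete daily sequence spanning the record
full <- data.frame(DATES = seq(min(obs$DATES), max(obs$DATES), by = "day"))

# Left join: dates with no observation get FLOW = NA
complete <- merge(full, obs, by = "DATES", all.x = TRUE)
print(complete)
```

After this step the filled vector has the same length as the data frame, so it can be stored as a new column.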
Examples
## Not run:
# The smwrData package provides the Q05078470 dataset
library(smwrData)
data(Q05078470)
# Create missing values in flow; the first sequence is a peak and the second is a recession
Q05078470$FlowMiss <- Q05078470$FLOW
Q05078470$FlowMiss[c(109:111, 198:201)] <- NA
# Interpolate the missing values
Q05078470$FlowFill <- fillMissing(Q05078470$FlowMiss)
# How did we do (line is actual, points are filled values)?
par(mfrow=c(2,1), mar=c(5.1,4.1,1.1,1.1))
with(Q05078470[100:120,], plot(DATES, FLOW, type="l"))
with(Q05078470[109:111,], points(DATES, FlowFill))
with(Q05078470[190:210,], plot(DATES, FLOW, type="l"))
with(Q05078470[198:201,], points(DATES, FlowFill))
## End(Not run)
References
Beauchamp, J.J., 1989, Comparison of regression and time-series methods for synthesizing missing streamflow records: Water Resources Bulletin, v. 25, no. 5, p. 961–975.
Elshorbagy, A.A., Panu, U.S., and Simonovic, S.P., 2000, Group-based estimation of missing hydrological data, I. Approach and general methodology: Hydrological Sciences Journal, v. 45, no. 6, p. 849–866.