Fast (Quasi-, Log-) Differences for Time Series and Panel Data
fdiff is an S3 generic to compute (sequences of) suitably lagged / leaded and iterated differences, quasi-differences or (quasi-)log-differences. The difference and log-difference operators D and Dlog also exist as parsimonious wrappers around fdiff, providing more flexibility than fdiff when applied to data frames.
fdiff(x, n = 1, diff = 1, ...)
D(x, n = 1, diff = 1, ...)
Dlog(x, n = 1, diff = 1, ...)

## Default S3 method:
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = TRUE, ...)
## Default S3 method:
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = .op[["stub"]], ...)
## Default S3 method:
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
     stubs = .op[["stub"]], ...)

## S3 method for class 'matrix'
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
## S3 method for class 'matrix'
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = .op[["stub"]], ...)
## S3 method for class 'matrix'
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
     stubs = .op[["stub"]], ...)

## S3 method for class 'data.frame'
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
## S3 method for class 'data.frame'
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric, fill = NA, rho = 1,
  stubs = .op[["stub"]], keep.ids = TRUE, ...)
## S3 method for class 'data.frame'
Dlog(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric, fill = NA, rho = 1,
     stubs = .op[["stub"]], keep.ids = TRUE, ...)

# Methods for indexed data / compatibility with plm:

## S3 method for class 'pseries'
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, shift = "time", ...)
## S3 method for class 'pseries'
D(x, n = 1, diff = 1, fill = NA, rho = 1,
  stubs = .op[["stub"]], shift = "time", ...)
## S3 method for class 'pseries'
Dlog(x, n = 1, diff = 1, fill = NA, rho = 1,
     stubs = .op[["stub"]], shift = "time", ...)

## S3 method for class 'pdata.frame'
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, shift = "time", ...)
## S3 method for class 'pdata.frame'
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1,
  stubs = .op[["stub"]], shift = "time", keep.ids = TRUE, ...)
## S3 method for class 'pdata.frame'
Dlog(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1,
     stubs = .op[["stub"]], shift = "time", keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

## S3 method for class 'grouped_df'
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
D(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1,
  stubs = .op[["stub"]], keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
Dlog(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1,
     stubs = .op[["stub"]], keep.ids = TRUE, ...)
Arguments
x: a numeric vector / time series, (time series) matrix, data frame, 'indexed_series' ('pseries'), 'indexed_frame' ('pdata.frame') or grouped data frame ('grouped_df').
n: integer. A vector indicating the number of lags or leads.
diff: integer. A vector of integers >= 1 indicating the order of differencing / log-differencing.
g: a factor, GRP object, or atomic vector / list of vectors (internally grouped with group) used to group x. Note that without t, all values in a group need to be consecutive and in the right order. See Details of flag.
by: data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.
t: a time vector or list of vectors. See flag.
cols: data.frame method: Select columns to difference using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
fill: value to insert when vectors are shifted. Default is NA.
log: logical. TRUE computes log-differences. See Details.
rho: double. Autocorrelation parameter. Set to a value between 0 and 1 for quasi-differencing. Any numeric value can be supplied.
stubs: logical. TRUE (default) will rename all differenced columns by adding prefixes "LnDdiff." / "FnDdiff." for differences and "LnDlogdiff." / "FnDlogdiff." for log-differences, replacing "D" / "Dlog" with "QD" / "QDlog" for quasi-differences.
shift: pseries / pdata.frame methods: character. "time" or "row". See flag for details.
keep.ids: data.frame / pdata.frame / grouped_df methods: Logical. FALSE drops all identifiers from the output (which includes all variables passed to by or t using formulas). Note: For 'grouped_df' / 'pdata.frame' the identifiers are dropped, but the "groups" / "index" attributes are kept.
...: arguments to be passed to or from other methods.
Details
By default, fdiff/D/Dlog return x with all columns differenced / log-differenced. Differences are computed as repeat(diff) x[i] - rho*x[i-n]. Log-differences are computed as log(x[i]) - rho*log(x[i-n]) for diff = 1, and repeat(diff-1) x[i] - rho*x[i-n] is then applied to that result to obtain higher-order log-differences (usually diff = 1 for log-differencing). If rho < 1, this becomes quasi- (or partial) differencing, a technique suggested by Cochrane and Orcutt (1949) to deal with serial correlation in regression models, where rho is typically estimated by regressing the model residuals on the lagged residuals.
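The formulas above can be sketched in base R. This is only an illustration of the arithmetic, not the package's implementation: quasi_diff is a hypothetical helper, it ignores grouping, time ordering and the fill argument, and the simulated data and the true rho of 0.7 are assumptions made for the demonstration. The second half runs one Cochrane-Orcutt step: estimate rho from the residuals, quasi-difference both sides, and refit.

```r
# Hypothetical helper (not part of collapse): applies x[i] - rho * x[i - n]
# `diff` times, taking logs once beforehand if log = TRUE.
quasi_diff <- function(x, n = 1L, diff = 1L, rho = 1, log = FALSE) {
  if (log) x <- base::log(x)
  for (k in seq_len(diff)) {
    idx <- seq_along(x) - n                  # position of x[i - n]
    idx[idx < 1L | idx > length(x)] <- NA    # no partner inside the series
    x <- x - rho * x[idx]                    # NA where no partner exists
  }
  x
}

x  <- c(10, 12, 15, 19, 24)
d1 <- quasi_diff(x)                 # NA 2 3 4 5
d2 <- quasi_diff(x, diff = 2)       # NA NA 1 1 1
qd <- quasi_diff(x, rho = 0.5)      # NA 7 9 11.5 14.5
dl <- quasi_diff(x, log = TRUE)     # log(x[i]) - log(x[i - 1])

# One Cochrane-Orcutt step on simulated data (assumed AR(1) errors, rho = 0.7)
set.seed(1)
n_obs <- 300
z <- rnorm(n_obs)
e <- as.numeric(arima.sim(list(ar = 0.7), n = n_obs))
y <- 1 + 2 * z + e

r <- residuals(lm(y ~ z))                              # OLS residuals
rho_hat <- unname(coef(lm(r[-1] ~ 0 + r[-n_obs])))     # residuals on lagged residuals

# Refit on quasi-differenced data; lm() drops the leading NA row.
# The intercept now estimates (1 - rho) times the original intercept.
co_fit <- lm(quasi_diff(y, rho = rho_hat) ~ quasi_diff(z, rho = rho_hat))
```

With rho = 1 the helper reduces to ordinary differencing, so d1 without its leading NA matches base R's diff(x).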
It is also possible to compute forward differences by passing negative n values. n also supports arbitrary vectors of integers (lags), and diff supports positive sequences of integers (differences).
If more than one value is passed to n and/or diff, the data is expanded wide as follows: If x is an atomic vector or time series, a (time series) matrix is returned with columns ordered first by lag, then by difference. If x is a matrix or data frame, each column is expanded in like manner such that the output has ncol(x)*length(n)*length(diff) columns ordered first by column name, then by lag, then by difference.
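The column ordering described above can be mimicked with a small base R sketch. Both lag_diff and expand_wide are hypothetical helpers, and the column labels are illustrative rather than the stubs the package generates. The sketch takes the formula x[i] - x[i-n] literally (so a negative n looks ahead) and does not treat n = 0 specially.

```r
# Hypothetical helper: iterate x[i] - x[i - n] `diff` times (rho fixed at 1)
lag_diff <- function(x, n, diff) {
  for (k in seq_len(diff)) {
    idx <- seq_along(x) - n
    idx[idx < 1L | idx > length(x)] <- NA   # no partner inside the series
    x <- x - x[idx]
  }
  x
}

# Hypothetical helper: one column per (lag, difference) pair,
# ordered first by lag, then by difference
expand_wide <- function(x, n = 1L, diff = 1L) {
  grid <- expand.grid(diff = diff, n = n)   # diff varies fastest
  out <- mapply(function(nn, dd) lag_diff(x, nn, dd), grid$n, grid$diff)
  colnames(out) <- paste0("n", grid$n, ".d", grid$diff)  # illustrative labels only
  out
}

x <- (1:8)^2
w <- expand_wide(x, n = 1:2, diff = 1:2)
colnames(w)   # "n1.d1" "n1.d2" "n2.d1" "n2.d2"

# Forward difference under the literal formula: x[i] - x[i + 1]
fwd <- lag_diff(x, n = -1, diff = 1)
```

For a matrix or data frame the same expansion would be repeated column by column, which is where the ncol(x)*length(n)*length(diff) count comes from.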
For further computational details and efficiency considerations see the help page of flag.
Returns
x differenced diff times using lags n of itself. Quasi- and log-differences are toggled by the rho and log arguments or the Dlog operator. Computations can be grouped by g/by and/or ordered by t. See Details and Examples.
References
Cochrane, D. & Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms. Journal of the American Statistical Association. 44 (245): 32-61.
Prais, S. J. & Winsten, C. B. (1954). Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383. Chicago.
See Also
flag/L/F, fgrowth/G, Time Series and Panel Series, Collapse Overview
Examples
## Simple Time Series: AirPassengers
D(AirPassengers)                      # 1st difference, same as fdiff(AirPassengers)
D(AirPassengers, -1)                  # Forward difference
Dlog(AirPassengers)                   # Log-difference
D(AirPassengers, 1, 2)                # Second difference
Dlog(AirPassengers, 1, 2)             # Second log-difference
D(AirPassengers, 12)                  # Seasonal difference (data is monthly)
D(AirPassengers,                      # Quasi-difference, see a better example below
  rho = pwcor(AirPassengers, L(AirPassengers)))

head(D(AirPassengers, -2:2, 1:3))     # Sequence of leaded/lagged and iterated differences

# let's do some visual analysis
plot(AirPassengers)                   # Plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic"))  # Seasonal decomposition
plot(D(AirPassengers, c(1, 12), 1:2)) # Plotting ordinary and seasonal first and second differences
plot(stl(window(D(AirPassengers, 12), # Taking seasonal differences removes most seasonal variation
                1950), "periodic"))

## Time Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(D(EuStockMarkets, c(0, 260)))                      # Plot series and annual differences
mod <- lm(DAX ~., L(EuStockMarkets, c(0, 260)))         # Regressing the DAX on its annual lag
summary(mod)                                            # and the levels and annual lags of the others
r <- residuals(mod)                                     # Obtain residuals
pwcor(r, L(r))                                          # Residual autocorrelation
fFtest(r, L(r))                                         # F-test of residual autocorrelation
                                                        # (better use lmtest::bgtest)
modCO <- lm(QD1.DAX ~., D(L(EuStockMarkets, c(0, 260)), # Cochrane-Orcutt (1949) estimation
                          rho = pwcor(r, L(r))))
summary(modCO)
rCO <- residuals(modCO)
fFtest(rCO, L(rCO))                                     # No more autocorrelation

## World Development Panel Data
head(fdiff(num_vars(wlddev), 1, 1,                      # Computes differences of numeric variables
           wlddev$country, wlddev$year))                # fdiff requires external inputs..
head(D(wlddev, 1, 1, ~country, ~year))                  # Differences of numeric variables
head(D(wlddev, 1, 1, ~country))                         # Without t: Works because data is ordered
head(D(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year))  # Difference of GDP & Life Expectancy
head(D(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10))  # Same, also retaining original series
head(D(wlddev, 0:1, 1, ~ country, ~year, 9:10,          # Dropping id columns
       keep.ids = FALSE))

## Indexed computations:
wldi <- findex_by(wlddev, iso3c, year)

# Dynamic Panel Data Models:
summary(lm(D(PCGDP) ~ L(PCGDP) + D(LIFEEX), data = wldi))             # Simple case
summary(lm(Dlog(PCGDP) ~ L(log(PCGDP)) + Dlog(LIFEEX), data = wldi))  # In log-differences

# Adding a lagged difference...
summary(lm(D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX), 0:1), data = wldi))
summary(lm(Dlog(PCGDP) ~ L(Dlog(PCGDP, 0:1)) + L(Dlog(LIFEEX), 0:1), data = wldi))

# Same thing:
summary(lm(D1.PCGDP ~., data = L(D(wldi, 0:1, 1, 9:10), 0:1, keep.ids = FALSE)[,-1]))

## Grouped data
library(magrittr)
wlddev |> fgroup_by(country) |>
  fselect(PCGDP, LIFEEX) |> fdiff(0:1, 1:2)            # Adding a first and second difference
wlddev |> fgroup_by(country) |>
  fselect(year, PCGDP, LIFEEX) |> D(0:1, 1:2, year)    # Also using t (safer)
wlddev |> fgroup_by(country) |>                        # Dropping id's
  fselect(year, PCGDP, LIFEEX) |> D(0:1, 1:2, year, keep.ids = FALSE)