This function makes the data balanced, i.e., each individual has the same time periods, by filling in or dropping observations.
make.pbalanced(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

## S3 method for class 'pdata.frame'
make.pbalanced(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

## S3 method for class 'pseries'
make.pbalanced(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  ...
)

## S3 method for class 'data.frame'
make.pbalanced(
  x,
  balance.type = c("fill", "shared.times", "shared.individuals"),
  index = NULL,
  ...
)
Arguments
x: an object of class pdata.frame, data.frame, or pseries.
balance.type: character, one of "fill", "shared.times", or "shared.individuals"; see Details.
...: further arguments.
index: only relevant for the data.frame interface; if NULL, the first two columns of the data.frame are assumed to be the index variables; if not NULL, both dimensions ('individual', 'time') need to be specified by index as a character vector of length 2; for further details see pdata.frame().
Returns
An object of the same class as the input x, i.e., a pdata.frame, data.frame or a pseries which is made balanced based on the index variables. The returned data are sorted as a stacked time series.
Details
(p)data.frame and pseries objects are made balanced, meaning each individual has the same time periods. Depending on the value of balance.type, the balancing is done in different ways:
balance.type = "fill" (default): The union of available time periods over all individuals is taken (without NA values). Missing time periods for an individual are identified, and corresponding rows (elements for a pseries) are inserted and filled with NA for the non-index variables (elements for a pseries). Thus, only time periods present for at least one individual are inserted where missing.
balance.type = "shared.times": The intersection of available time periods over all individuals is taken (without NA values). Thus, time periods not available for all individuals are discarded, i.e., only time periods shared by all individuals are left in the result.
balance.type = "shared.individuals": All available time periods are kept, and individuals for which not all time periods are available are dropped, i.e., only individuals shared by all time periods are left in the result (symmetric to "shared.times").
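To make the three strategies concrete, here is a minimal base-R sketch on a toy data.frame. The helper names (balance_fill, balance_shared_times, balance_shared_individuals, drop_na_index, sort_stacked) are hypothetical, and this is not how plm implements make.pbalanced. It assumes the first two columns are the individual and time index, drops rows with NA in the index as described in the Note, and returns rows in stacked order (by individual, then time).

```r
# Hypothetical base-R sketch of the three balance.type strategies; NOT
# plm's implementation. Assumes columns 1 and 2 of `df` are the
# individual and time index.
drop_na_index <- function(df) {
  # rows with NA in an index column are dropped before balancing
  df[!is.na(df[[1]]) & !is.na(df[[2]]), , drop = FALSE]
}

sort_stacked <- function(df) {
  # stacked time series order: by individual, then by time
  out <- df[order(df[[1]], df[[2]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# "fill": union of periods; missing (individual, time) pairs become NA rows
balance_fill <- function(df) {
  df <- drop_na_index(df)
  grid <- expand.grid(unique(df[[1]]), unique(df[[2]]),
                      stringsAsFactors = FALSE)
  names(grid) <- names(df)[1:2]
  sort_stacked(merge(grid, df, by = names(df)[1:2], all.x = TRUE))
}

# "shared.times": keep only periods observed for every individual
balance_shared_times <- function(df) {
  df <- drop_na_index(df)
  times_by_id <- split(df[[2]], df[[1]], drop = TRUE)
  shared <- Reduce(intersect, times_by_id)
  sort_stacked(df[df[[2]] %in% shared, , drop = FALSE])
}

# "shared.individuals": keep only individuals observed in every period
balance_shared_individuals <- function(df) {
  df <- drop_na_index(df)
  all_times <- unique(df[[2]])
  times_by_id <- split(df[[2]], df[[1]], drop = TRUE)
  complete <- vapply(times_by_id,
                     function(t) all(all_times %in% t), logical(1))
  keep <- names(times_by_id)[complete]
  sort_stacked(df[as.character(df[[1]]) %in% keep, , drop = FALSE])
}

toy <- data.frame(firm = c(1, 1, 2, 2, 2),
                  year = c(2001, 2003, 2001, 2002, 2003),
                  inv  = c(10, 12, 20, 21, 22))
balance_fill(toy)                # 6 rows; firm 1 gets an NA row for 2002
balance_shared_times(toy)        # 4 rows; year 2002 dropped
balance_shared_individuals(toy)  # 3 rows; firm 1 dropped
```

The join-onto-a-grid approach for "fill" makes the union semantics explicit: only years that occur for at least one firm appear in the grid, so no new years are invented.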
The data are not necessarily made consecutive (a regular time series with distance 1), because balancedness does not imply consecutiveness. To make the data consecutive, use make.pconsecutive() (optionally with argument balanced = TRUE to make the data both consecutive and balanced); see also Examples for a comparison of the two functions.
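The difference between balanced and consecutive can be seen on a tiny panel where every firm lacks the same year. The following base-R sketch uses hypothetical helpers is_balanced_df and is_consecutive_df (plm's own checks are is.pbalanced() and is.pconsecutive()). It assumes a numeric time index with a step of 1.

```r
# Hypothetical helpers, not plm functions. Assumes column 1 is the
# individual index and column 2 a numeric time index with step 1.
is_balanced_df <- function(df) {
  times_by_id <- split(df[[2]], df[[1]], drop = TRUE)
  all_times <- sort(unique(df[[2]]))
  # balanced: every individual has exactly the full set of periods, once each
  all(vapply(times_by_id,
             function(t) identical(sort(t), all_times),
             logical(1)))
}

is_consecutive_df <- function(df) {
  times_by_id <- split(df[[2]], df[[1]], drop = TRUE)
  # consecutive: sorted periods of each individual differ by exactly 1
  vapply(times_by_id, function(t) all(diff(sort(t)) == 1), logical(1))
}

# Both firms lack 2002: balanced, but not consecutive
gap <- data.frame(firm = c(1, 1, 2, 2),
                  year = c(2001, 2003, 2001, 2003))
is_balanced_df(gap)     # TRUE
is_consecutive_df(gap)  # FALSE for both firms
```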
Note: Rows of (p)data.frames (elements for pseries) with NA values in the individual or time index are not examined but silently dropped before the data are made balanced, because it cannot be inferred which individual or time period the missing value(s) refer to (see also Examples). In particular, NA values in the first or last position of an individual's original time periods are dropped, although these usually mark the beginning and end of that individual's time series. Thus, before applying make.pbalanced, one might want to check whether there are any NA values in the index variables, especially in the first and last position for each individual in the original data, and, if so, set those to a meaningful begin/end value for the time series.
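A pre-check along the lines suggested above could look like the following base-R sketch. na_at_ends is a hypothetical helper, not part of plm; it assumes column 1 is the individual index, column 2 the time index, and that each individual's rows are in their original time order, so the first and last row of an individual mark the begin and end of its series.

```r
# Hypothetical pre-check before make.pbalanced (not a plm function).
# Assumes column 1 = individual index, column 2 = time index, and rows of
# each individual in their original time order.
na_at_ends <- function(df) {
  times_by_id <- split(df[[2]], df[[1]], drop = TRUE)
  # TRUE if the time index is NA in the first or last row of an individual
  ends <- vapply(times_by_id,
                 function(t) is.na(t[1]) || is.na(t[length(t)]),
                 logical(1))
  names(ends)[ends]
}

toy_na <- data.frame(firm = c(1, 1, 1, 2, 2, 2),
                     year = c(NA, 2002, 2003, 2001, 2002, NA),
                     inv  = c(10, 11, 12, 20, 21, 22))
sum(!complete.cases(toy_na[, 1:2]))  # 2 rows would be dropped silently
na_at_ends(toy_na)                   # "1" "2": NA at begin (firm 1), end (firm 2)
```

Individuals flagged this way are candidates for setting a meaningful begin/end value before balancing.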
Examples
library(plm)

# take data and make it unbalanced
# by deletion of 2nd row (2nd time period for first individual)
data("Grunfeld", package = "plm")
nrow(Grunfeld)                            # 200 rows
Grunfeld_missing_period <- Grunfeld[-2, ]
pdim(Grunfeld_missing_period)$balanced    # check if balanced: FALSE
make.pbalanced(Grunfeld_missing_period)   # make it balanced (by filling)
make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times") # (shared periods)
nrow(make.pbalanced(Grunfeld_missing_period))
nrow(make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times"))

# more complex data:
# First, make data unbalanced (and non-consecutive)
# by deletion of 2nd time period (year 1936) for all individuals
# and more time periods for first individual only
Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ]
Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1, 4), ]
pdim(Grunfeld_unbalanced)$balanced        # FALSE
all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE

g_bal <- make.pbalanced(Grunfeld_unbalanced)
pdim(g_bal)$balanced  # TRUE
unique(g_bal$year)    # all years but 1936
nrow(g_bal)           # 190 rows
head(g_bal)           # 1st individual: years 1935, 1939 are NA

# NA in 1st, 3rd time period (years 1935, 1937) for first individual
Grunfeld_NA <- Grunfeld
Grunfeld_NA[c(1, 3), "year"] <- NA
g_bal_NA <- make.pbalanced(Grunfeld_NA)
head(g_bal_NA)        # years 1935, 1937: NA for non-index vars
nrow(g_bal_NA)        # 200

# pdata.frame interface
pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period)
make.pbalanced(pGrunfeld_missing_period)

# pseries interface
make.pbalanced(pGrunfeld_missing_period$inv)

# comparison to make.pconsecutive
g_consec <- make.pconsecutive(Grunfeld_unbalanced)
all(is.pconsecutive(g_consec)) # TRUE
pdim(g_consec)$balanced        # FALSE
head(g_consec, 22)             # 1st individual: no years 1935/6; 1939 is NA;
                               # other individuals: years 1935-1954, 1936 is NA
nrow(g_consec)                 # 198 rows

g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE)
all(is.pconsecutive(g_consec_bal)) # TRUE
pdim(g_consec_bal)$balanced        # TRUE
head(g_consec_bal)                 # year 1936 is NA for all individuals
nrow(g_consec_bal)                 # 200 rows

head(g_bal)           # no year 1936 at all
nrow(g_bal)           # 190 rows
See Also
is.pbalanced() to check if data are balanced; is.pconsecutive() to check if data are consecutive; make.pconsecutive() to make data consecutive (and, optionally, also balanced).
punbalancedness() for two measures of unbalancedness, pdim() to check the dimensions of a 'pdata.frame' (and other objects), pvar() to check for individual and time variation of a 'pdata.frame' (and other objects), lag() for lagging (and leading) values of a 'pseries' object.