R6 Class for Storing / Accessing & Sampling Longitudinal Data
A longdata object allows for efficient storage and recall of longitudinal datasets for use in bootstrap sampling. The object works by deconstructing the data into lists based upon subject id, thus enabling efficient lookup.
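The base R sketch below illustrates the per-subject decomposition. It splits a toy dataset into lists keyed by subject id. The column names id, visit and outcome are assumptions made for this illustration. In the class itself the column names come from the vars object, and the lists may be built differently.

```r
# Toy longitudinal dataset (hypothetical column names: id, visit, outcome)
dat <- data.frame(
  id = c("pt1", "pt1", "pt1", "pt2", "pt2", "pt2"),
  visit = factor(rep(c("V1", "V2", "V3"), 2), levels = c("V1", "V2", "V3")),
  outcome = c(1.2, NA, 3.1, 0.5, 0.7, NA),
  stringsAsFactors = FALSE
)

# Sort by id and visit, mirroring how self$data is described
dat <- dat[order(dat$id, dat$visit), ]

# Per-subject lists keyed by id: a single lookup by name replaces
# filtering the whole data.frame every time a subject is needed
values <- split(dat$outcome, dat$id)
is_missing <- lapply(values, is.na)

values[["pt1"]]      # 1.2  NA 3.1
is_missing[["pt2"]]  # FALSE FALSE TRUE
```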
The object also handles several other operations specific to rbmi. These include defining whether an outcome value is MAR / missing or not, and tracking which imputation strategy is assigned to each subject.
It is recognised that this object's functionality is fairly overloaded, and it is hoped that this can be split out into more area-specific objects / functions in the future. Further additions of functionality to this object should be avoided if possible.
data
: The original dataset passed to the constructor (sorted by id and visit)
vars
: The vars object (list of key variables) passed to the constructor
visits
: A character vector containing the distinct visit levels
ids
: A character vector containing the unique ids of each subject in self$data
formula
: A formula expressing how the design matrix for the data should be constructed
strata
: A numeric vector indicating which stratum each corresponding value of self$ids belongs to. If no stratification variable is defined, this will default to 1 for all subjects (i.e. the same group). This field is only used as part of the self$sample_ids() function to enable stratified bootstrap sampling.
ice_visit_index
: A list indexed by subject storing the index number of the first visit affected by the ICE. If there is no ICE then it is set equal to the number of visits plus 1.
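The rule for ice_visit_index can be sketched in base R as follows. The helper get_ice_visit_index() and the way the ICE visit is passed in are hypothetical. The sketch also derives a post-ICE flag. It assumes that the first affected visit and every later visit count as post-ICE.

```r
visits <- c("V1", "V2", "V3", "V4")

# Hypothetical helper: position of the first visit affected by the ICE,
# or length(visits) + 1 when the subject has no ICE
get_ice_visit_index <- function(ice_visit, visits) {
  if (is.na(ice_visit)) {
    return(length(visits) + 1)
  }
  match(ice_visit, visits)
}

ice_index_pt1 <- get_ice_visit_index("V3", visits)  # 3
ice_index_pt2 <- get_ice_visit_index(NA, visits)    # 5 (no ICE)

# Visits at or after the first affected visit are flagged as post-ICE
is_post_ice_pt1 <- seq_along(visits) >= ice_index_pt1  # FALSE FALSE TRUE TRUE
is_post_ice_pt2 <- seq_along(visits) >= ice_index_pt2  # all FALSE
```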
values
: A list indexed by subject storing a numeric vector of the original (unimputed) outcome values
group
: A list indexed by subject storing a single character value indicating which imputation group the subject belongs to, as defined by self$data[id, self$ivars$group].
It is used to determine what reference group should be used when imputing the subject's data.
is_mar
: A list indexed by subject storing logical values indicating whether the subject's outcome values are MAR or not. This list defaults to TRUE for all subjects & outcomes and is then modified by calls to self$set_strategies(). Note that this does not indicate which values are missing. This variable is TRUE for outcome values that either occurred before the ICE visit, or are post the ICE visit and have an imputation strategy of MAR.
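The MAR flag described above reduces to a simple vectorised rule. This is a hypothetical sketch and not the class's own code. derive_is_mar() takes one subject's post-ICE flags and strategy name. "JR" is used only as an example of a non-MAR strategy.

```r
# Hypothetical post-ICE flags for one subject with 4 visits
is_post_ice <- c(FALSE, FALSE, TRUE, TRUE)

# A value is MAR if it is pre-ICE, or if it is post-ICE and the
# subject's strategy is "MAR"
derive_is_mar <- function(is_post_ice, strategy) {
  (!is_post_ice) | (strategy == "MAR")
}

derive_is_mar(is_post_ice, "MAR")  # TRUE TRUE TRUE TRUE
derive_is_mar(is_post_ice, "JR")   # TRUE TRUE FALSE FALSE
```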
strategies
: A list indexed by subject storing a single character value indicating the imputation strategy assigned to that subject. This list defaults to "MAR" for all subjects and is then modified by calls to either self$set_strategies() or self$update_strategies().
strategy_lock
: A list indexed by subject storing a single logical value indicating whether a patient's imputation strategy is locked or not. If a strategy is locked, it cannot change from MAR to non-MAR. Strategies can be changed from non-MAR to MAR, though this will trigger a warning. Strategies are locked if the patient is assigned a MAR strategy and has non-missing outcome values after their ICE date. This list is populated by a call to self$set_strategies().
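The locking rules above can be sketched with two small base R helpers. Both is_locked() and change_strategy() are hypothetical. Signalling the forbidden MAR to non-MAR change with an error is an assumption of this sketch. This sketch also warns on every non-MAR to MAR change, which is one reading of the text above.

```r
# Hypothetical helper: locked when the strategy is "MAR" and at least one
# post-ICE outcome is observed (non-missing)
is_locked <- function(strategy, is_post_ice, is_missing) {
  strategy == "MAR" && any(is_post_ice & (!is_missing))
}

# Hypothetical helper: apply a requested strategy change under the lock rules
change_strategy <- function(current, new, locked) {
  if (current == "MAR" && new != "MAR" && locked) {
    stop("Cannot change a locked MAR strategy to non-MAR")
  }
  if (current != "MAR" && new == "MAR") {
    warning("Changing strategy from non-MAR to MAR")
  }
  new
}

locked <- is_locked(
  strategy = "MAR",
  is_post_ice = c(FALSE, TRUE, TRUE),
  is_missing = c(FALSE, FALSE, TRUE)
)
locked  # TRUE: visit 2 is post-ICE and observed
```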
indexes
: A list indexed by subject storing a numeric vector of indexes which specify which rows in the original dataset belong to this subject. For example, to recover the full data for subject "pt3" you can use self$data[self$indexes[["pt3"]], ]. This may seem redundant compared with filtering the data directly. However, it enables efficient bootstrap sampling of the data, e.g.:
```
# Row indexes for a bootstrap sample that contains subject "pt3" twice
indexes <- unlist(self$indexes[c("pt3", "pt3")])
self$data[indexes, ]
```
This list is populated during the object initialisation.
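Outside the class, the same indexing idea can be reproduced in a self-contained way with split() on row numbers. The toy data.frame and its column names are assumptions made for illustration.

```r
dat <- data.frame(
  id = c("pt1", "pt1", "pt2", "pt3", "pt3"),
  visit = c("V1", "V2", "V1", "V1", "V2"),
  outcome = c(1.0, 2.0, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# Row numbers belonging to each subject, computed once up front
indexes <- split(seq_len(nrow(dat)), dat$id)

# A bootstrap sample that draws pt3 twice is just a concatenation of row numbers
rows <- unlist(indexes[c("pt3", "pt3")], use.names = FALSE)
boot <- dat[rows, ]
boot$outcome  # 4 5 4 5
```

Drawing a subject several times costs only a vector concatenation. No repeated scan of the data is needed.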
is_missing
: A list indexed by subject storing a logical vector indicating whether the corresponding outcome of a subject is missing. This list is populated during the object initialisation.
is_post_ice
: A list indexed by subject storing a logical vector indicating whether the corresponding outcome of a subject is post the date of their ICE. If no ICE data has been provided, this defaults to FALSE for all observations. This list is populated by a call to self$set_strategies().
get_data()
Returns a data.frame based upon the requested subject IDs. Replaces missing values with new ones if provided.
longDataConstructor$get_data(
obj = NULL,
nmar.rm = FALSE,
na.rm = FALSE,
idmap = FALSE
)
obj
: Either NULL, a character vector of subject IDs, or an imputation list object. See details.
nmar.rm
: Logical value. If TRUE, will remove observations that are not regarded as MAR (as determined from self$is_mar).
na.rm
: Logical value. If TRUE, will remove outcome values that are missing (as determined from self$is_missing).
idmap
: Logical value. If TRUE, will add an attribute idmap which contains a mapping from the new subject ids to the old subject ids. See details.
If obj is NULL, then the full original dataset is returned.
If obj is a character vector, then a new dataset consisting of just those subjects is returned. If the character vector contains duplicate entries, then that subject will be returned multiple times.
If obj is an imputation_df object (as created by imputation_df()), then the subject ids specified in the object will be returned, and missing values will be filled in by those specified in the imputation list object, e.g.:
```
obj <- imputation_df(
    imputation_single(id = "pt1", values = c(1, 2, 3)),
    imputation_single(id = "pt1", values = c(4, 5, 6)),
    imputation_single(id = "pt3", values = c(7, 8))
)
longdata$get_data(obj)
```
This will return a data.frame consisting of all observations for pt1 twice and all of the observations for pt3 once. The first set of observations for pt1 will have missing values filled in with c(1,2,3), and the second set will be filled in with c(4,5,6). The length of the values must be equal to sum(self$is_missing[[id]]).
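The length rule above can be illustrated with a hypothetical base R helper that fills one subject's missing outcomes. Here missingness is taken from is.na() for simplicity, whereas the class stores it in self$is_missing. fill_missing() is not part of the documented API.

```r
# Hypothetical helper: fill one subject's missing outcomes with imputed values
fill_missing <- function(values, imputed) {
  is_missing <- is.na(values)
  if (length(imputed) != sum(is_missing)) {
    stop("length(imputed) must equal the number of missing values")
  }
  values[is_missing] <- imputed
  values
}

pt1_values <- c(NA, 10, NA, NA)       # 3 missing outcomes
fill_missing(pt1_values, c(1, 2, 3))  # 1 10 2 3
fill_missing(pt1_values, c(4, 5, 6))  # 4 10 5 6
```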
If obj is not NULL, then all subject IDs will be scrambled to ensure that they are unique. For example, if pt2 is requested twice, this process guarantees that each set of observations has a unique subject ID. The idmap attribute (if requested) can be used to map from the new ids back to the old ids.
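The re-labelling and its id map can be sketched as follows. The format of the new ids used by the class is not documented here. The "new_pt_" prefix and the helper make_unique_ids() are hypothetical.

```r
# Hypothetical helper: give every requested copy of a subject a new unique id
# and keep a named vector mapping new ids back to the original ids
make_unique_ids <- function(ids) {
  new_ids <- sprintf("new_pt_%d", seq_along(ids))
  idmap <- setNames(ids, new_ids)
  list(new_ids = new_ids, idmap = idmap)
}

res <- make_unique_ids(c("pt2", "pt2", "pt5"))
res$new_ids              # "new_pt_1" "new_pt_2" "new_pt_3"
res$idmap[["new_pt_2"]]  # "pt2"
```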
Returns a data.frame.
add_subject()
This function decomposes a patient's data from self$data and populates all the corresponding lists, i.e. self$is_missing, self$values, self$group, etc. This function is only called upon the object's initialisation.
longDataConstructor$add_subject(id)
id
: Character subject id that exists within self$data.
validate_ids()
Throws an error if any element of ids is not within the source data self$data.
longDataConstructor$validate_ids(ids)
ids
: A character vector of ids.
Returns TRUE.
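A stand-alone sketch of this check is shown below. In this hypothetical version, known_ids plays the role of self$ids. Listing all unknown ids in the error message is a design choice of the sketch.

```r
# Hypothetical stand-alone version of the id check
validate_ids <- function(ids, known_ids) {
  unknown <- setdiff(ids, known_ids)
  if (length(unknown) > 0) {
    stop("Unknown subject ids: ", paste(unknown, collapse = ", "))
  }
  TRUE
}

validate_ids(c("pt1", "pt1"), known_ids = c("pt1", "pt2"))  # TRUE
```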
sample_ids()
Performs random stratified sampling of patient ids (with replacement). Each patient has an equal chance of being picked within their stratum (i.e. it does not depend on how many non-missing visits they had).
longDataConstructor$sample_ids()
Returns a character vector of ids.
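Stratified sampling with replacement can be sketched in base R as follows. The sketch assumes that each stratum is resampled to its original size. The helper name is hypothetical, and ids and strata stand in for self$ids and self$strata.

```r
# Hypothetical stand-alone version of stratified bootstrap sampling
sample_ids_stratified <- function(ids, strata) {
  res <- lapply(split(ids, strata), function(x) {
    # Uniform draw with replacement within the stratum, keeping its size fixed
    x[sample.int(length(x), size = length(x), replace = TRUE)]
  })
  unlist(res, use.names = FALSE)
}

set.seed(1)
ids <- c("pt1", "pt2", "pt3", "pt4", "pt5")
strata <- c(1, 1, 2, 2, 2)
sample_ids_stratified(ids, strata)
```

Indexing through sample.int() rather than calling sample(x) directly avoids a known R pitfall. When x is a single number, sample(x) draws from 1:x instead of returning x.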
extract_by_id()
Returns a list of key information for a given subject. This is a convenience wrapper that saves having to grab each element manually.
longDataConstructor$extract_by_id(id)
id
: Character subject id that exists within self$data.
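The wrapper pattern amounts to one name lookup per stored list. The sketch below uses a hypothetical store of per-subject lists in place of self$values, self$group and self$strategies. The exact set of elements returned by the real method is not assumed here.

```r
# Hypothetical per-subject lists standing in for the object's fields
store <- list(
  values = list(pt1 = c(1.2, NA), pt2 = c(0.4, 0.9)),
  group = list(pt1 = "Placebo", pt2 = "Drug"),
  strategies = list(pt1 = "MAR", pt2 = "JR")
)

# Gather every element stored for one subject into a single named list
extract_by_id <- function(store, id) {
  lapply(store, function(field) field[[id]])
}

extract_by_id(store, "pt2")
# list(values = c(0.4, 0.9), group = "Drug", strategies = "JR")
```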
update_strategies()
Convenience function to run self$set_strategies(dat_ice, update = TRUE); kept for legacy reasons.
longDataConstructor$update_strategies(dat_ice)
dat_ice
: A data.frame containing ICE information; see impute() for the format of this data.frame.
set_strategies()
Updates the self$strategies, self$is_mar and self$is_post_ice variables based upon the provided ICE information.
longDataConstructor$set_strategies(dat_ice = NULL, update = FALSE)
dat_ice
: A data.frame containing ICE information. See details.
update
: Logical, indicates that the ICE data should be used as an update. See details.
See draws() for the specification of dat_ice if update=FALSE. See impute() for the format of dat_ice if update=TRUE. If update=TRUE, this function ensures that MAR strategies cannot be changed to non-MAR in the presence of post-ICE observations.
check_has_data_at_each_visit()
Ensures that all visits have at least one observed "MAR" observation. Throws an error if this criterion is not met. This is to ensure that the initial MMRM can be fitted.
longDataConstructor$check_has_data_at_each_visit()
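The check can be sketched on long-format flags with tapply(). Here one row corresponds to one subject-visit, and an observation counts as usable when it is both observed and MAR. The flat vectors and this stand-alone signature are hypothetical. The class keeps the same information in per-subject lists.

```r
# Hypothetical long-format flags, one entry per subject-visit
visit <- c("V1", "V2", "V3", "V1", "V2", "V3")
is_missing <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
is_mar <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

check_has_data_at_each_visit <- function(visit, is_missing, is_mar) {
  # TRUE for a visit if at least one observation there is observed and MAR
  usable <- tapply((!is_missing) & is_mar, visit, any)
  if (!all(usable)) {
    stop("No observed MAR data at visit(s): ",
         paste(names(usable)[!usable], collapse = ", "))
  }
  invisible(TRUE)
}

# With these flags V3 has no usable observation, so an error is raised
try(check_has_data_at_each_visit(visit, is_missing, is_mar))
```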
set_strata()
Populates the self$strata variable. If the user has specified stratification variables, the first visit is used to determine the value of those variables. If no stratification variables have been specified, then everyone is defined as being in stratum 1.
longDataConstructor$set_strata()
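The strata derivation can be sketched in base R. The sketch takes each subject's first row and encodes every observed combination of the stratification variables as an integer. The column names sex and region and the helper get_strata() are hypothetical. Using interaction() is an implementation choice of this sketch.

```r
dat <- data.frame(
  id = c("pt1", "pt1", "pt2", "pt2", "pt3", "pt3"),
  visit = c("V1", "V2", "V1", "V2", "V1", "V2"),
  sex = c("F", "F", "M", "M", "F", "F"),
  region = c("EU", "EU", "US", "US", "EU", "EU"),
  stringsAsFactors = FALSE
)

get_strata <- function(dat, strata_vars) {
  # First row per subject; assumes rows are sorted by id and visit
  first_visit <- dat[!duplicated(dat$id), ]
  if (length(strata_vars) == 0) {
    return(rep(1, nrow(first_visit)))
  }
  # One level per observed combination of the stratification variables
  combos <- interaction(first_visit[strata_vars], drop = TRUE)
  as.numeric(combos)
}

get_strata(dat, c("sex", "region"))  # pt1 and pt3 share a stratum, pt2 differs
get_strata(dat, character(0))        # 1 1 1
```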
new()
Constructor function.
longDataConstructor$new(data, vars)
data
: A longitudinal dataset.
vars
: An ivars object created by set_vars().
clone()
The objects of this class are cloneable with this method.
longDataConstructor$clone(deep = FALSE)
deep
: Whether to make a deep clone.