createRemDataset function

Create REM data set with dynamic risk sets

Create REM data set with dynamic risk sets

The function creates counting process data sets with dynamic risk sets for relational event models. For each event in the event sequence, null-events are generated and represent possible events that could have happened at that time but did not. A data set with true and null-events is returned with an event dummy for whether the event occurred or was simply possible (variable eventdummy). The returned data set also includes a variable eventTime which represents the true time of the reported event.

createRemDataset(data, sender, target, eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE)

Arguments

  • data: A data frame containing all the events.
  • sender: A string (or factor or numeric) variable that represents the sender of the event.
  • target: A string (or factor or numeric) variable that represents the target of the event.
  • eventSequence: Numeric variable that represents the event sequence. The variable has to be sorted in ascending order.
  • eventAttribute: An optional variable that represents an attribute to an event. Repeated events affect the construction of the counting process data set. Use the eventAttribute-variable to specify the uniqueness of an event. If eventAttribute = NULL, events are defines as sender-target nodes only.
  • time: An optional date variable that represents the date an event took place. The variable is used if startDate or endDate are specified. timeformat should be used to specify which format the date variable is in, in case it was not yet converted to a Date-variable.
  • start: An optional numeric variable that indicates at which point in the event sequence a specific event was at risk. The variable has to be numerical and correspond to the variable eventSequence. If this option is used, each event in the event data set will be considered at risk from the specified value onwards. If it is not specified, start is defined as the first value in the event sequence. In case of repeated events, the start-value for each duplicated event is one event-unit after the last such event.
  • startDate: An optional date variable that represents the date an event started being at risk. timeformat should be used to specify which format the date variable is in, incase it was not yet converted to a Date-variable.
  • end: An optional numeric variable that indicates at which point in the event sequence a specific event stopped being at risk. The variable has to be numerical and correspond to the variable eventSequence. If this option is used, each event in the event data set will be considered at risk until the specified value.
  • endDate: An optional date variable that represents the date an event stoped being at risk. timeformat should be used to specify which format the date variable is in, incase it was not yet converted to a Date-variable.
  • timeformat: A character string indicating the format of the datevar. see as.Date
  • atEventTimesOnly: TRUE/FALSE. Boolean option for continuous event sequences. If atEventTimesOnly = TRUE, null-events are only created at times, when an event occurred. If atEventTimesOnly = FALSE, null-events are created on each event-unit from min(eventSequence):max(eventSequence). For instance: Given an event sequence with three events at c(1, 4, 6): If atEventTimesOnly = TRUE null events are created for events 1, 4 and 6. If atEventTimesOnly = FALSE null-events are also created for days 2, 3 and 5.
  • untilEventOccurrs: TRUE/FALSE. Boolean option to define whether null events should be an option even after an event takes place. If untilEventOccurrs = TRUE a conditional logisitc logic is applied in that events are only at risk as long as they have not taken place yet. If untilEventOccurrs = FALSE events continue to be at risk after they have occurred. Note that untilEventOccurrs = TRUE overwrites the end-Variable, if specified.
  • includeAllPossibleEvents: TRUE/FALSE. Boolean option to allow a more dynamic and specified creation of the risk set. If includeAllPossibleEvents = TRUE, a data set has to be provided to possibleEvents.
  • possibleEvents: An optional data set with the form: column 1 = sender, column 2 = target, 3 = start, 4 = end, 5 = event attribute, 6... . The data set provides all possible events for the entire event sequence and gives each possible event a start and end value to determine when each event could have been possible. This is useful if the risk set follows a complex pattern that cannot be resolved with the above options. E.g., providing a startDate-variable and setting atEventTimesOnly == FALSE will result in an error since in a continuous time setting the start variable will be matched to the closest date, rather than to the exact value of said date in the event sequence. Manually coding the possible events is neccessary.
  • returnInputData: TRUE/FALSE. Boolean option to check the original data set (handed over in data) against the created start and stop variables. If returnInputData = TRUE, a list of two data sets is returned. The first data set is the counting process data set with null-events, the second the modified data.

Details

To follow.

Author(s)

Laurence Brandenberger laurence.brandenberger@eawag.ch

See Also

rem-package

Examples

## Example 1: standard conditional logistic set-up dt <- data.frame( sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'), target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'), eventSequence = c(1, 2, 2, 3, 3, 4, 6) ) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE) ## Example 2: add 2 attributes to the event-classification dt <- data.frame( sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'), target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'), pro.con = c('pro', 'pro', 'con', 'pro', 'con', 'pro', 'pro'), attack = c('yes', 'no', 'no', 'yes', 'yes', 'no', 'yes'), eventSequence = c(1, 2, 2, 3, 3, 4, 6) ) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = paste0(dt$pro.con, dt$attack), time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE) ## Example 3: adding start and end variables # Note: the start and end variables will be overwritten # if there are duplicate events. If you want to # keep the strict start and stop values that you set, use # includeAllPossibleEvents = TRUE and specify a # possibleEvents-data set. # Note 2: if untilEventOccurrs = TRUE and an end # variable is provided, this end variable is # overwritten. Set untilEventOccurrs 0 FALSE and # provide the end variable if you want the events # possibilities to stop at these exact event times. dt <- data.frame( sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'), target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'), eventSequence = c(1, 2, 2, 3, 3, 4, 6), start = c(0, 0, 1, 1, 1, 3, 3), end = rep(6, 7) ) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = NULL, start = dt$start, startDate = NULL, end = dt$end, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE) ## Example 4: using start (and stop) dates dt <- data.frame( sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'), target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'), eventSequence = c(1, 2, 2, 3, 3, 4, 6), date = c('01.02.1971', rep('02.02.1971', 2), rep('03.02.1971', 2), '04.02.1971', '06.02.1971'), dateAtRisk = c(rep('21.01.1971', 2), rep('01.02.1971', 5)), dateRiskEnds = rep('01.03.1971', 7) ) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = dt$date, start = NULL, startDate = dt$dateAtRisk, end = NULL, endDate = NULL, timeformat = '%d.%m.%Y', atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE) # if you want to include null-events at times when no event happened, # either see Example 5 or create a start-variable by yourself # by using the eventSequence()-command with the option # 'returnDateSequenceData = TRUE' in this package. With the # generated sequence, dates from startDate can be matched # to the event sequence values (using the match()-command). ## Example 5: using start and stop dates and including # possible events whenever no event occurred. possible.events <- data.frame( sender = c('a', 'c', 'd', 'f'), target = c('b', 'd', 'd', 'a'), start = c(0, 0, 1, 1), end = c(rep(8, 4))) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = TRUE, possibleEvents = possible.events, returnInputData = FALSE) # now you can set 'atEventTimesOnly = FALSE' to include # null-events where none occurred until the events happened count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = FALSE, untilEventOccurrs = TRUE, includeAllPossibleEvents = TRUE, possibleEvents = possible.events, returnInputData = FALSE) # plus you can set to get the full range of the events # (bounded by max(possible.events$end)) count.data <- createRemDataset( data = dt, sender = dt$sender, target = dt$target, eventSequence = dt$eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = FALSE, untilEventOccurrs = FALSE, includeAllPossibleEvents = TRUE, possibleEvents = possible.events, returnInputData = FALSE)
  • Maintainer: Laurence Brandenberger
  • License: GPL (>= 2)
  • Last published: 2018-10-25

Useful links