Helper functions for exportRecordsTyped Validation and Casting
This set of functions assists in validating that the content of fields coming from REDCap matches the MetaData, allowing a validation report to be provided. The cast helpers transform the REDCap data into R data types and let the user customize the end product.
Arguments
...: Consumes any other arguments passed to the function, e.g., field_name and coding.
rx: character. The regular expression pattern to check.
field_name: character(1). Name of the field(s)
coding: named character vector. The defined coding from the meta data.
FUN: function. A function that takes a character vector.
checked: character. Values to recognize as checked in a checkbox field.
dec_symbol: character(1). The symbol in the field used to denote a decimal.
n_dec: integerish(1). The number of decimal places permitted by the field validation.
Returns
Validation and casting functions return the objects indicated in the following table:
Function Name              Object Type Returned
isNAorBlank                logical
valRx                      logical
valChoice                  logical
valPhone                   logical
valSkip                    logical
castLabel                  factor
castLabelCharacter         character
castCode                   factor
castCodeCharacter          character
castRaw                    character
castChecked                factor
castCheckedCharacter       character
castCheckLabel             factor
castCheckLabelCharacter    character
castCheckCode              factor
castCheckCodeCharacter     character
castCheckForImport         numeric
castDpNumeric              numeric
castDpCharacter            character
castTimeHHMM               character
castTimeMMSS               character
castLogical                logical
Details
Functions passed to the na, validation, and cast parameters of exportRecordsTyped() all take the form of function(x, coding, field_name). na and validation functions are expected to return a logical vector of the same length as the column processed. Helper routines are provided here to construct these functions for common cases.
Missing Data Detection
na_values is a helper function to create a list of functions to test for NA based on field type. It is useful for bulk override of NA detection for a project. The output can be passed directly to the na parameter of exportRecordsTyped().
Missing data detection is performed ahead of validation. Data that are found to be missing are excluded from validation reports.
REDCap users may define project-level missing value codes. If such codes are defined, they can be seen in Project Setup > Additional Customizations > Missing Data Codes. They will also be displayed in the project's Codebook. Project-level missing data codes cannot be accessed via the API, meaning redcapAPI is unable to assist in determining if a project has any. The most likely symptom of project-level codes is a high frequency of values failing validation (See vignette("redcapAPI-missing-data-detection")).
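As a minimal sketch of handling such codes, a custom NA detector can treat the project's missing data codes as missing. The codes "UNK" and "-99", the function name naOrMissingCode, and the connection object rcon in the commented usage are all hypothetical; substitute the codes listed in the project's Codebook.

```r
# Hypothetical project-level missing data codes (an assumption for illustration).
missing_codes <- c("UNK", "-99")

# NA detector: TRUE where the value is NA, blank, or a missing data code.
# `...` absorbs field_name and coding, which this detector does not need.
naOrMissingCode <- function(x, ...) {
  is.na(x) | x %in% c("", missing_codes)
}

naOrMissingCode(c("12", "", NA, "UNK", "-99"))

# Possible usage, assuming `rcon` is an existing REDCap connection:
# rec <- exportRecordsTyped(rcon, na = na_values(naOrMissingCode))
```

Because missing data detection runs ahead of validation, values flagged this way would be excluded from the validation report rather than counted as failures.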
Validation Functions
isNAorBlank returns TRUE where a value is NA or blank and FALSE otherwise. It is a helper function for constructing na overrides in exportRecordsTyped().
valRx constructs a validation function from a regular expression pattern. The resulting function returns TRUE if the value matches the pattern and FALSE otherwise.
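The idea of building a validator from a pattern can be sketched in base R. This illustrates the function-factory approach only and is not the package's implementation; makeRxValidator and the five-digit pattern are hypothetical.

```r
# Hypothetical factory: returns a validator closed over the pattern `rx`.
makeRxValidator <- function(rx) {
  # `...` absorbs field_name and coding, which are not needed here.
  function(x, ...) grepl(rx, x)
}

# Example pattern (an assumption): exactly five digits.
isFiveDigits <- makeRxValidator("^[0-9]{5}$")
isFiveDigits(c("37203", "3720", "abcde"))
```

The returned function yields a logical vector of the same length as its input, which is the shape exportRecordsTyped() expects from a validation function.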
valChoice constructs a validation function from a set of choices defined in the MetaData. The resulting function returns TRUE if the value matches one of the choices and FALSE otherwise.
valPhone constructs a validation function for (North American) phone numbers. It removes punctuation and spaces prior to validating with the regular expression.
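The strip-then-match approach might look like the following in base R. The pattern is a simplified assumption (ten digits, with the area code and exchange each starting with 2 through 9) and may differ from the one valPhone actually uses; looksLikePhone is a hypothetical name.

```r
# Hypothetical validator: remove punctuation and whitespace, then match digits.
looksLikePhone <- function(x, ...) {
  digits <- gsub("[[:punct:][:space:]]", "", x)
  # Simplified North American pattern (an assumption, not the package's regex).
  grepl("^[2-9][0-9]{2}[2-9][0-9]{6}$", digits)
}

looksLikePhone(c("(615) 555-1234", "615.555.1234", "555-1234", "123-456-7890"))
```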
valSkip is a function that supports skipping the validation for a field type. It returns a TRUE value for each record, regardless of its value. Validation skipping has occasional utility when importing certain field types (such as bioportal or sql) where not all of the eventual choices are available in the project yet.
skip_validation is a list of functions that each return TRUE for all data passed in.
Casting Functions
castLabel constructs a casting function for multiple choice variables. The field will be cast to return the choice label (generally more human readable). castLabelCharacter is an equivalent casting function that returns a character vector instead of a factor.
castCode constructs a casting function for multiple choice variables. It is similar to castLabel, but the choice value is returned instead. The values are typically more compact, and their meaning may not be obvious. castCodeCharacter is an equivalent casting function that returns a character vector instead of a factor.
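The difference between casting to labels and casting to codes can be illustrated with base R factors. This is a sketch under an assumed coding layout (names are labels, values are codes); the coding vector and data below are hypothetical, and the package's own handling may differ.

```r
# Hypothetical coding: names are labels, values are codes (assumed layout).
coding <- c(Male = "1", Female = "2")
x <- c("2", "1", "2", NA)

# Label-style cast: levels follow the coding order, displayed by label.
by_label <- factor(x, levels = unname(coding), labels = names(coding))

# Code-style cast: same levels, displayed by code.
by_code <- factor(x, levels = unname(coding))

as.character(by_label)
as.character(by_code)
```

Calling as.character() on either result gives the character-vector counterpart, in the spirit of the castLabelCharacter and castCodeCharacter variants.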
castRaw constructs a casting function that returns the content from REDCap as it was received. It is functionally equivalent to identity. For multiple choice variables, the result will be coerced to numeric, if possible; otherwise, the result is a character vector.
castChecked constructs a casting function for checkbox fields. It returns values in the form of Unchecked/Checked. castCheckedCharacter is an equivalent casting function that returns a character vector instead of a factor.
castCheckLabel and castCheckCode also construct casting functions for checkbox fields. For both, unchecked variables are cast to an empty string (""). Checked variables are cast to the option label and option code, respectively. castCheckLabelCharacter and castCheckCodeCharacter are equivalent casting functions that return a character vector instead of a factor.
castCheckForImport is a special case function to allow the user to specify exactly which values are to be considered "Checked". Values that match are returned as 1 and all other values are returned as 0. This is motivated by the special case where the coding on a checkbox includes "0, Option". In the resulting field checkbox___0, a coded value of 0 actually implies the choice was selected. In order to perform an import on such data, it is necessary to cast it using castCheckForImport(c("0")).
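The matching rule described above can be sketched in base R as a small factory. The name makeCheckForImport and its default values are hypothetical, and this is an illustration of the rule rather than the package's code.

```r
# Hypothetical factory: values in `checked` become 1, everything else 0.
makeCheckForImport <- function(checked = c("Checked", "1")) {
  function(x, ...) as.numeric(x %in% checked)
}

# For a checkbox coded "0, Option", the field checkbox___0 holds "0" when selected.
castZeroAsChecked <- makeCheckForImport(c("0"))
castZeroAsChecked(c("0", "", "1"))
```

Here the value "0" is returned as 1 because, for that field, it means the option was selected.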
castDpNumeric is a casting function for fields that use the number_ndp_comma field type (where n is the number of decimal places). This function will convert the values to numeric values for use in analysis. It is a function that returns the appropriate casting function, so the appropriate usage with the defaults is cast = list(number_1dp_comma = castDpNumeric()) (note the parentheses).
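To show why the parentheses matter, here is a base R sketch of a factory that returns a casting function for comma-decimal text. The names makeDpNumeric and toNumeric are hypothetical, and the conversion shown is an assumption about the general approach, not the package's implementation.

```r
# Hypothetical factory: calling it returns the function that does the casting.
makeDpNumeric <- function(dec_symbol = ",") {
  function(x, ...) {
    # Swap the decimal symbol for "." and convert to numeric.
    as.numeric(sub(dec_symbol, ".", x, fixed = TRUE))
  }
}

toNumeric <- makeDpNumeric()   # the call with parentheses yields the caster
toNumeric(c("1,5", "20,25", NA))
```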
castDpCharacter is a casting function to return fields that use number_ndp_comma field types to character strings for import. This is a function that returns the appropriate casting function, thus the appropriate usage when casting for one decimal place is cast = list(number_1dp_comma = castDpCharacter(1)).
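The reverse direction, formatting numbers as comma-decimal text with a fixed number of places, might be sketched in base R as follows. The names makeDpCharacter and toOneDp are hypothetical, and the formatting approach is an assumption rather than the package's code.

```r
# Hypothetical factory: n_dec fixes the decimal places, dec_symbol the separator.
makeDpCharacter <- function(n_dec, dec_symbol = ",") {
  function(x, ...) {
    # Fixed-point formatting, then swap "." for the decimal symbol.
    out <- formatC(as.numeric(x), format = "f", digits = n_dec)
    sub(".", dec_symbol, out, fixed = TRUE)
  }
}

toOneDp <- makeDpCharacter(1)
toOneDp(c(1.5, 2))
```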
castTimeHHMM and castTimeMMSS are casting functions to facilitate importing data. They convert time data into a character format that will pass the API requirements.
castLogical is a casting function that returns a logical vector for common, binary-type responses. It is well suited to changing true/false, yes/no, and checkbox fields into logical vectors, as it returns TRUE if the value is one of c("1", "true", "yes") and returns FALSE otherwise.
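The rule stated above can be written as a short base R sketch. The name toLogical is hypothetical; the case folding and whitespace trimming are assumptions, and this sketch maps NA to FALSE, which may differ from how castLogical treats missing values.

```r
# Hypothetical caster: TRUE for "1", "true", or "yes"; FALSE for anything else.
toLogical <- function(x, ...) {
  # Lowercasing and trimming are assumptions made for this illustration.
  tolower(trimws(x)) %in% c("1", "true", "yes")
}

toLogical(c("1", "0", "Yes", "no", "TRUE", ""))
```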
Casting Lists
raw_cast overrides all casting if passed as the cast parameter. It is important that the validation specified matches the chosen cast. For a fully raw export, the validation should be skip_validation.
default_cast_no_factor is a list of casting functions that matches all of the default casts but with the exception that any fields that would have been cast to factors will instead be cast to characters. It is provided for the user that prefers to work absent factors. The list default_cast_character is equivalent and is provided for those that prefer to describe their casting in terms of what the result is (and not what it is not).
Examples
## Not run:
# Make a custom function to give special treatment to a field.
# In this function, the field "field_name_to_skip" will
# be cast using `castRaw`. All other fields will be cast
# using `castCode`
customCastCode <- function(x, field_name, coding){
  if (field_name == "field_name_to_skip"){
    castRaw(x, field_name, coding)
  } else {
    castCode(x, field_name, coding)
  }
}
## End(Not run)