Clean and Standardize Epidemiological Data
Identify and return duplicated rows in a data frame or linelist.
Get sum of numbers from a string
Get the names of the columns from which duplicates will be found
Check order of a sequence of date-events
Convert Redcap data dictionary into {matchmaker} dictionary format
Detects whether a string contains only numbers or not.
Print the detected misspelled values
Detect the special character that is the separator in the date values
Get format from a simple Date value
Detect date format from a date column
Get part1 of date value
Get part2 of date value
Get part3 of date value
Try and guess dates from characters
Detect misspelled options in columns to be cleaned
Detect the numeric columns that appear as characters due to the prese...
Make data dictionary for 1 field
Convert numeric to date
Add an element to the data dictionary
Add an element to the report object
Detect the appropriate abbreviation for day or month value
Check whether the order of the sequence of date-events is valid
Check whether the subject IDs comply with the expected format. When in...
Checks the uniqueness in values of the sample IDs column
Clean and standardize data
Perform dictionary-based cleaning
cleanepi: Clean and Standardize Epidemiological Data
Build the report for the detected misspelled values during dictionary-...
Detect a date format with only 1 separator
Convert columns into numeric
Correct the wrong subject IDs based on the user-provided values.
Check if date column exists in the given dataset
Check date time frame
Choose the first non-missing date from a data frame of dates
Convert characters to dates
Convert and update the date values
Detect complex date format
Guess if a character vector contains Date values, and convert them to ...
Extract date from a character string
Guess date format of a character string
Build the auto-detected format
Check whether the number of provided formats matches the number of tar...
Process date variable
Find the dates that lubridate couldn't
Trim dates outside of the defined boundaries
Set clean_data() default parameters
Generate report from data cleaning operations
Remove empty rows and columns and constant columns
Remove duplicates
Replace missing values with NA
Get column names
Calculate the percentage of missing and other data type values in a ve...
Scan a data frame to determine the percentage of missing, numeric,...
Standardize column names of a data frame or linelist
Standardize date variables
Calculate time span between dates
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines the data cleaning tasks typically needed when working with epidemiological datasets. It returns the processed data in the same format as the input, so it fits into existing workflows, and it generates a report detailing the outcome of each cleaning task.
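To make the "same format in, cleaned data plus report out" idea concrete, here is a minimal base-R sketch of a few of the tasks listed above (replacing missing-value placeholders with NA, removing empty and constant columns, removing duplicates). It is an illustration only and does not use the cleanepi API: the function `sketch_clean()`, its `na_strings` argument, the `"report"` attribute name, and the toy `linelist` data are all hypothetical.

```r
# Illustrative base-R sketch; NOT the cleanepi API.
# `sketch_clean()`, `na_strings`, and the "report" attribute are hypothetical.
sketch_clean <- function(data, na_strings = c("", "-99", "NA")) {
  report <- list()

  # Replace missing-value placeholders with NA in character columns
  for (col in names(data)) {
    if (is.character(data[[col]])) {
      data[[col]][data[[col]] %in% na_strings] <- NA
    }
  }

  # Remove rows and columns that are entirely missing
  empty_rows <- rowSums(!is.na(data)) == 0
  empty_cols <- colSums(!is.na(data)) == 0
  report$empty_columns <- names(data)[empty_cols]
  data <- data[!empty_rows, !empty_cols, drop = FALSE]

  # Remove constant columns (exactly one distinct non-missing value)
  is_constant <- vapply(
    data,
    function(x) length(unique(x[!is.na(x)])) == 1,
    logical(1)
  )
  report$constant_columns <- names(data)[is_constant]
  data <- data[, !is_constant, drop = FALSE]

  # Remove duplicated rows and record how many were dropped
  dup <- duplicated(data)
  report$duplicates_removed <- sum(dup)
  data <- data[!dup, , drop = FALSE]

  # Return a data frame (same kind of object as the input) with the report attached
  attr(data, "report") <- report
  data
}

# Toy linelist (made-up values, for illustration only)
linelist <- data.frame(
  id      = c("A1", "A2", "A2", "A3"),
  age     = c("34", "-99", "-99", "51"),
  country = c("GM", "GM", "GM", "GM"),
  notes   = c("", "", "", ""),
  stringsAsFactors = FALSE
)

cleaned <- sketch_clean(linelist)
print(cleaned)
print(attr(cleaned, "report"))
```

Attaching the report as an attribute keeps the return value an ordinary data frame, so downstream code that expects a data frame keeps working. Whether the package stores its report this way is not stated here, so treat that choice as an assumption of the sketch.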
Useful links