Clean and Standardize Epidemiological Data
Add an element to the data dictionary
Add an element to the report object
Checks whether the order in a sequence of date events is chronological...
Check the uniqueness of values in the sample IDs column
Check whether the subject IDs comply with the expected format. When in...
Clean and standardize data
Perform dictionary-based cleaning
cleanepi: Clean and Standardize Epidemiological Data
Build the report for the detected misspelled values during dictionary-...
Convert numeric to date
Convert columns into numeric
Correct misspelled values by using approximate string matching techniq...
Correct the wrong subject IDs based on the user-provided values.
Convert and update date values
Check date time frame
Choose the first non-missing date from a data frame of dates
Convert characters to dates
Detect complex date format
Detect the appropriate abbreviation for day or month value
Detect a date format with only 1 separator
Detect the special character that is the separator in the date values
Get format from a simple Date value
Infer date format from a vector of characters
Split a string based on a pattern and return the first element of the ...
Get part2 of date value
Get part3 of date value
Guess if a character vector contains Date values, and convert them to ...
Try to guess dates from characters
Extract date from a character vector
Build the auto-detected format
Check whether the number of provided formats matches the number of tar...
Process date variable
Find the dates that lubridate couldn't parse
Trim dates outside of the defined timeframe
Detect misspelled options in columns to be cleaned
Detect the numeric columns that appear as characters due to the prese...
Make a data dictionary for one field
Identify and return duplicated rows in a data frame or linelist.
Transform scanning result format into user-chosen format
Set and return clean_data default parameters
Get the names of the columns from which duplicates will be found
Check order of a sequence of date-events
Make column names unique when duplicated column names are found after ...
Update clean_data default argument values with the user-provided v...
Detect whether a string contains only numbers
Remove constant data.
Pipe operator
Print the detected misspelled values
Generate report from data cleaning operations
Remove constant data, including empty rows, empty columns, and columns...
Remove duplicates
Replace missing values with NA
Detect and replace values with NA from a vector
Get column names
Scan through a data frame and return the proportion of missing, num...
Scan through a character column
Standardize column names of a data frame or line list
Standardize date variables
Calculate time span between dates
Flag which messages will be translated using the potools package
Unnest an element of the data cleaning report
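The index entries on checking whether a sequence of date events is chronological describe a simple rule that fits in a few lines. The Python below is an illustrative sketch under stated assumptions, not cleanepi's implementation (cleanepi is an R package): each row is treated as a dict, the date columns are read in the order the caller lists them, missing dates are skipped, and equal consecutive dates count as chronological. The names `is_chronological` and `flag_out_of_order` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence


def is_chronological(events: Sequence[Optional[date]]) -> bool:
    """Return True if the non-missing dates in `events` never decrease.

    Assumption for illustration: missing dates (None) are skipped and equal
    consecutive dates are accepted; cleanepi may handle these cases differently.
    """
    present = [d for d in events if d is not None]
    # Compare each date with the one that follows it.
    return all(earlier <= later for earlier, later in zip(present, present[1:]))


def flag_out_of_order(rows: List[Dict[str, Optional[date]]],
                      columns: Sequence[str]) -> List[int]:
    """Return the indices of rows whose dates, read in `columns` order, are not chronological.

    A column absent from a row is treated as a missing date.
    """
    return [i for i, row in enumerate(rows)
            if not is_chronological([row.get(c) for c in columns])]
```

With hypothetical columns such as `date_onset` and `date_admission`, a row whose admission precedes onset would be flagged, while a row with only one known date passes.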
cleanepi is a package for cleaning and standardizing tabular data, tailored specifically to curating epidemiological data. It streamlines data cleaning tasks that are typically needed when working with epidemiological datasets, returns the processed data in the same format as the input, and generates a comprehensive report detailing the outcome of each cleaning task.
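The workflow described above, where every cleaning task returns data in the same shape plus an entry for a report, can be sketched as a small pipeline. This is a language-agnostic illustration in Python under stated assumptions, not cleanepi's R API: the data is a list of dict rows, the step names loosely mirror index entries but their behaviour here is hypothetical, and the default missing-value strings are an assumption.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Each step takes the rows and returns (cleaned rows in the same format, report entry).
Step = Callable[[List[Row]], Tuple[List[Row], object]]


def standardize_column_names(rows: List[Row]) -> Tuple[List[Row], object]:
    """Lower-case names, trim whitespace, and replace spaces with underscores."""
    def fix(name: str) -> str:
        return name.strip().lower().replace(" ", "_")
    renamed = [{fix(k): v for k, v in row.items()} for row in rows]
    changed = {k: fix(k) for row in rows for k in row if fix(k) != k}
    return renamed, changed


def replace_missing_values(rows: List[Row],
                           na_strings=("-99", "", "NA")) -> Tuple[List[Row], object]:
    """Replace string values listed in `na_strings` (an assumed default) with None."""
    count = 0
    out = []
    for row in rows:
        new_row = {}
        for k, v in row.items():
            if isinstance(v, str) and v.strip() in na_strings:
                new_row[k] = None
                count += 1
            else:
                new_row[k] = v
        out.append(new_row)
    return out, {"replaced": count}


def remove_duplicates(rows: List[Row]) -> Tuple[List[Row], object]:
    """Keep the first occurrence of each exact duplicate row (values must be hashable)."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))  # keys are unique, so values are never compared
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept, {"removed": len(rows) - len(kept)}


def clean(rows: List[Row], steps: Dict[str, Step]) -> Tuple[List[Row], Dict[str, object]]:
    """Run the steps in order, collecting one report entry per step."""
    report: Dict[str, object] = {}
    for name, step in steps.items():
        rows, report[name] = step(rows)
    return rows, report
```

Keeping each step's signature identical makes steps easy to reorder or drop; a step that needs extra arguments, such as custom missing-value strings, can be wrapped with `functools.partial` before being passed to `clean`.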