cleanepi 1.1.1 package

Clean and Standardize Epidemiological Data

add_to_dictionary

Add an element to the data dictionary

add_to_report

Add an element to the report object

check_date_sequence

Checks whether the order in a sequence of date events is chronological...

check_subject_ids_oness

Checks the uniqueness in values of the sample IDs column

check_subject_ids

Check whether the subject IDs comply with the expected format. When in...

clean_data

Clean and standardize data

clean_using_dictionary

Perform dictionary-based cleaning

cleanepi-package

cleanepi: Clean and Standardize Epidemiological Data

construct_misspelled_report

Build the report for the detected misspelled values during dictionary-...

convert_numeric_to_date

Convert numeric to date

convert_to_numeric

Convert columns into numeric

correct_misspelled_values

Correct misspelled values by using approximate string matching techniq...

correct_subject_ids

Correct the wrong subject IDs based on the user-provided values.

date_check_outsiders

Convert and update date values

date_check_timeframe

Check date time frame

date_choose_first_good

Choose the first non-missing date from a data frame of dates

date_convert

Convert characters to dates

date_detect_complex_format

Detect complex date format

date_detect_day_or_month

Detect the appropriate abbreviation for day or month value

date_detect_format

Detect a date format with only 1 separator

date_detect_separator

Detect the special character that is the separator in the date values

date_detect_simple_format

Get format from a simple Date value

date_get_format

Infer date format from a vector of characters

date_get_part1

Split a string based on a pattern and return the first element of the ...

date_get_part2

Get part2 of date value

date_get_part3

Get part3 of date value

date_guess_convert

Guess if a character vector contains Date values, and convert them to ...

date_guess

Try and guess dates from characters

date_i_guess_and_convert

Extract date from a character vector

date_make_format

Build the auto-detected format

date_match_format_and_column

Check whether the number of provided formats matches the number of tar...

date_process

Process date variable

date_rescue_lubridate_failures

Find the dates that lubridate couldn't

date_trim_outliers

Trim dates outside of the defined timeframe

detect_misspelled_options

Detect misspelled options in columns to be cleaned

detect_to_numeric_columns

Detect the numeric columns that appear as characters due to the prese...

dictionary_make_metadata

Make data dictionary for 1 field

find_duplicates

Identify and return duplicated rows in a data frame or linelist.

get_appropriate_format

Transform scanning result format into user-chosen format

get_default_params

Set and return clean_data default parameters

get_target_column_names

Get the names of the columns from which duplicates will be found

is_date_sequence_ordered

Check order of a sequence of date-events

make_unique_column_names

Make column names unique when duplicated column names are found after ...

modify_default_params

Update clean_data default argument's values with the user-provided v...

numbers_only

Detects whether a string contains only numbers or not.

perform_remove_constants

Remove constant data.

pipe

Pipe operator

print_misspelled_values

Print the detected misspelled values

print_report

Generate report from data cleaning operations

remove_constants

Remove constant data, including empty rows, empty columns, and columns...

remove_duplicates

Remove duplicates

replace_missing_values

Replace missing values with NA

replace_with_na

Detect and replace values with NA from a vector

retrieve_column_names

Get column names

scan_data

Scan through a data frame and return the proportion of missing, `num...

scan_in_character

Scan through a character column

standardize_column_names

Standardize column names of a data frame or line list

standardize_dates

Standardize date variables

timespan

Calculate time span between dates

tr_

Flag out what message will be translated using the potools package

unnest_report

Unnest an element of the data cleaning report

A package for cleaning and standardizing tabular data, tailored specifically to curating epidemiological data. It streamlines the data cleaning tasks typically required when working with epidemiological datasets. It returns the processed data in the same format as the input and generates a comprehensive report detailing the outcome of each cleaning task.

  • Maintainer: Karim Mané
  • License: MIT + file LICENSE
  • Last published: 2025-07-16