cleanepi 1.0.2 package

Clean and Standardize Epidemiological Data

find_duplicates

Identify and return duplicated rows in a data frame or linelist.

get_sum

Get sum of numbers from a string

get_target_column_names

Get the names of the columns from which duplicates will be found

is_date_sequence_ordered

Check order of a sequence of date-events

make_readcap_dictionary

Convert REDCap data dictionary into {matchmaker} dictionary format

numbers_only

Detects whether a string contains only numbers or not.

print_misspelled_values

Print the detected misspelled values

date_detect_separator

Detect the special character that is the separator in the date values

date_detect_simple_format

Get format from a simple Date value

date_get_format

Detect date format from a date column

date_get_part1

Get part1 of date value

date_get_part2

Get part2 of date value

date_get_part3

Get part3 of date value

date_guess

Try to guess dates from characters

detect_misspelled_options

Detect misspelled options in columns to be cleaned

detect_to_numeric_columns

Detect the numeric columns that appear as characters due to the prese...

dictionary_make_metadata

Make data dictionary for 1 field

convert_numeric_to_date

Convert numeric to date

add_to_dictionary

Add an element to the data dictionary

add_to_report

Add an element to the report object

date_detect_day_or_month

Detect the appropriate abbreviation for day or month value

check_date_sequence

Check whether the order of the sequence of date-events is valid

check_subject_ids

Check whether the subject IDs comply with the expected format. When in...

check_subject_ids_oness

Checks the uniqueness in values of the sample IDs column

clean_data

Clean and standardize data

clean_using_dictionary

Perform dictionary-based cleaning

cleanepi-package

cleanepi: Clean and Standardize Epidemiological Data

construct_misspelled_report

Build the report for the detected misspelled values during dictionary-...

date_detect_format

Detect a date format with only 1 separator

convert_to_numeric

Convert columns into numeric

correct_subject_ids

Correct the wrong subject IDs based on the user-provided values.

date_check_column_existence

Check if date column exists in the given dataset

date_check_timeframe

Check date time frame

date_choose_first_good

Choose the first non-missing date from a data frame of dates

date_convert

Convert characters to dates

date_convert_and_update

Convert and update the date values

date_detect_complex_format

Detect complex date format

date_guess_convert

Guess if a character vector contains Date values, and convert them to ...

date_i_extract_string

Extract date from a character string

date_i_find_format

Guess date format of a character string

date_make_format

Build the auto-detected format

date_match_format_and_column

Check whether the number of provided formats matches the number of tar...

date_process

Process date variable

date_rescue_lubridate_failures

Find the dates that lubridate couldn't

date_trim_outliers

Trim dates outside of the defined boundaries

default_cleanepi_settings

Set clean_data() default parameters

print_report

Generate report from data cleaning operations

remove_constants

Remove empty rows and columns and constant columns

remove_duplicates

Remove duplicates

replace_missing_values

Replace missing values with NA

retrieve_column_names

Get column names

scan_columns

Calculate the percentage of missing and other data type values in a ve...

scan_data

Scan a data frame to determine the percentage of missing, numeric,...

standardize_column_names

Standardize column names of a data frame or linelist

standardize_dates

Standardize date variables

timespan

Calculate time span between dates

cleanepi is a package for cleaning and standardizing tabular data, tailored to curating epidemiological data. It streamlines the data cleaning tasks typically needed when working with epidemiological datasets. It returns the processed data in the same format as the input, so it fits into existing workflows, and it generates a report detailing the outcome of each cleaning task.
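
A minimal usage sketch may help show how a few of the functions listed above fit together. It assumes cleanepi 1.0.2 is installed and that `standardize_column_names()`, `remove_constants()` and `remove_duplicates()` can each be called with only the data and their default arguments; the toy `linelist` data frame and its column names are hypothetical and are not taken from the package.

```r
# Sketch only: assumes the cleanepi package is installed and that the
# functions below accept a data frame and work with default arguments.
library(cleanepi)

# Hypothetical toy linelist with messy column names, one duplicated row
# and one constant column.
linelist <- data.frame(
  "Case ID"    = c("A1", "A2", "A2", "A3"),
  "Date Onset" = c("2024-01-03", "2024-01-05", "2024-01-05", "2024-01-09"),
  "Country"    = rep("Gambia", 4),
  check.names  = FALSE
)

# Apply three of the cleaning steps one after another.
cleaned <- standardize_column_names(linelist)
cleaned <- remove_constants(cleaned)
cleaned <- remove_duplicates(cleaned)

# The result is expected to come back in the same format as the input.
class(cleaned)
```

Chaining individual steps like this keeps each operation visible. According to the index above, `clean_data()` performs the cleaning and standardization as a whole and `print_report()` generates the report of the cleaning operations; their arguments are not shown here because they are not described on this page.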

  • Maintainer: Karim Mané
  • License: MIT + file LICENSE
  • Last published: 2024-06-17