Quality Control and Semantic Enrichment of Datasets
Apply quality control measures to a dataset
Assess completeness of a dataset
Assess quality of a dataset
Assume variable classes in data
Kable logical data highlighting
Compare Completeness between Datasets
Information Content Comparison Plot
Information Content Comparison Table
Completeness Heatmap
Compare unique values before and after data modification
Calculate mutual information of a matrix of discrete values
Find highly distant value for data frame
Convert edge table to tidygraph graph
'eHDPrep': Quality Control and Semantic Enrichment of Datasets
Convert data frame to numeric matrix
Encode a categorical vector with binary categories
Encode categorical variables as binary factors
Encode categorical variables using one-hot encoding
Encode a genotype/SNP vector
Encode genotype/SNP variables in data frame
Encode ordinal variables
Calculate Entropy of a Vector
Exact kernel density estimation
Export data to delimited file
Extract information from free text
Geometric mean
Identify inconsistencies in a dataset
Import data into 'R'
Import corrected variable classes
Calculate Information Content (Continuous Variable)
Calculate Information Content (Discrete Variable)
Join Mapping Table to Ontology Network Graph
Find maximum of vector safely
Find mean of vector safely
Merge columns in data frame
Aggregate Data by Metavariable
Compute Metavariable Information
Extract metavariables' descendant variables
Calculate Mutual Information Content
Find minimum of vector safely
Data modification tracking
Calculate Node Information Content (Zhou et al. 2008 method)
Min-max normalization
Replace numeric values in numeric columns with NA
One-hot encode a vector
Extract labels and levels of ordinal variables in a dataset
Plot Completeness of a Dataset
Find product of vector safely
Track changes to dataset variables
Review Quality Control
Calculate Row Completeness in a Data Frame
Semantic enrichment
Append Skipgram Presence Variables to Dataset
Report Skipgram Frequency
Identify Neighbouring Words (Skipgrams) in a free-text vector
Replace values in non-numeric columns with NA
Sum vector safely for semantic enrichment
Validate internal consistency table
Validate mapping table for semantic enrichment
Validate ontology network for semantic enrichment
Calculate Variable Completeness in a Data Frame
Calculate Entropy of Each Variable in Data Frame
Variable bandwidth Kernel Density Estimation
Missing dots warning
Identify variables with zero entropy
A tool for the preparation and enrichment of health datasets for analysis (Toner et al., 2023, <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset, in which metavariables are discovered from the relationships between input variables, as determined from user-provided ontologies.
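To make the data quality measures named in the index above more concrete, here is a minimal sketch of variable completeness, per-variable Shannon entropy, and zero-entropy detection. It is written in Python purely for illustration and is not the 'eHDPrep' R API: the function names `variable_completeness` and `entropy_bits`, the toy records, and the choice to treat `None` as missing are all assumptions made for this example.

```python
import math
from collections import Counter


def variable_completeness(rows, variables):
    """Percentage of non-missing (non-None) values for each variable."""
    n = len(rows)
    return {
        v: 100.0 * sum(row.get(v) is not None for row in rows) / n
        for v in variables
    }


def entropy_bits(values):
    """Shannon entropy (base 2) of the observed, non-missing values."""
    observed = [x for x in values if x is not None]
    if not observed:
        return 0.0
    counts = Counter(observed)
    total = len(observed)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy dataset: one dict per record, None marks a missing value.
rows = [
    {"sex": "F", "stage": "II", "site": "A"},
    {"sex": "M", "stage": None, "site": "A"},
    {"sex": "F", "stage": "III", "site": "A"},
    {"sex": "M", "stage": "II", "site": "A"},
]
variables = ["sex", "stage", "site"]

completeness = variable_completeness(rows, variables)
entropy = {v: entropy_bits([r[v] for r in rows]) for v in variables}

# A variable that takes a single value carries no information.
zero_entropy = [v for v in variables if entropy[v] == 0.0]

print(completeness)
print(entropy)
print(zero_entropy)
```

In this toy data, `stage` is the only incomplete variable and `site` is constant, so `site` is the one variable flagged as having zero entropy. How the package itself handles missing values and logarithm bases may differ from this sketch.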