Inferring Age-Dependent Disease Topic from Diagnosis Data
Imputing missing ages if you can't find some of them. The function does ...
AgeTopicModels: Inferring Age-Dependent Disease Topic from Diagnosis D...
Disease matrix reformatting for ATM
Mapping disease codes from ICD-10 to phecode
Mapping individuals to fixed topic loadings.
Title
Pipe operator
Plot the topic loadings across age.
Plot topic loadings for LFA.
Compute prediction odds ratio for a testing data set using pre-t...
Simulate genetic-disease-topic structure (step 2)
Simulate genetic-disease-topic structure (step 1)
Run ATM on diagnosis data.
Run LFA on diagnosis data.
We propose an age-dependent topic model (ATM) that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large electronic health record data sets. The model assigns each individual a set of topic weights over several disease topics; each disease topic reflects a set of diseases that tend to co-occur as a function of age, quantified by age-dependent topic loadings for each disease. For each disease diagnosis, the model assumes that a topic is sampled from the individual’s topic weights (which sum to 1 across topics for a given individual) and that a disease is then sampled from that topic's age-dependent loadings at the individual’s age (the loadings sum to 1 across diseases for a given topic at a given age). ATM generalises the Latent Dirichlet Allocation (LDA) model by allowing the topic loadings of each topic to vary with age. References: Jiang (2023) <doi:10.1038/s41588-023-01522-8>.
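The two-step sampling described above can be illustrated with a small, self-contained sketch. This is not the package's implementation or API: the function names, the toy coefficients, and especially the softmax of a linear-in-age score used for the loadings are assumptions made purely for illustration. The only properties taken from the description are that topic weights sum to 1 per individual and that loadings sum to 1 across diseases for a given topic and age.

```python
import math
import random

# Hypothetical toy setup: 2 topics, 3 diseases.
# Each (intercept, slope) pair gives one disease's score as a function of age.
# The linear-in-age form is an illustrative assumption, not the ATM parameterisation.
TOPIC_COEFS = [
    [(0.0, 0.05), (0.0, -0.05), (0.0, 0.0)],   # topic 0: disease 0 rises with age
    [(0.0, -0.02), (0.0, 0.0), (0.0, 0.04)],   # topic 1: disease 2 rises with age
]


def topic_loadings(coefs, age):
    """Return loadings over diseases for one topic at a given age.

    A softmax guarantees the loadings are positive and sum to 1 across diseases.
    """
    scores = [intercept + slope * age for intercept, slope in coefs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_diagnosis(rng, topic_weights, topic_coefs, age):
    """Draw one (topic, disease) pair for a diagnosis made at `age`.

    Step 1: sample a topic from the individual's topic weights.
    Step 2: sample a disease from that topic's loadings at this age.
    """
    topic = rng.choices(range(len(topic_weights)), weights=topic_weights, k=1)[0]
    loadings = topic_loadings(topic_coefs[topic], age)
    disease = rng.choices(range(len(loadings)), weights=loadings, k=1)[0]
    return topic, disease


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.7, 0.3]  # one individual's topic weights, summing to 1
    for age in (35, 50, 65):
        print(age, sample_diagnosis(rng, weights, TOPIC_COEFS, age))
```

Fixing the loadings to be constant in age (all slopes set to zero in this sketch) recovers the standard LDA sampling step, which is the sense in which ATM generalises LDA.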