estimate_transmission_flows_and_ci function

estimate_transmission_flows_and_ci Estimates transmission flows and corresponding confidence intervals

This function estimates transmission flows, or the relative probability of transmission within and between population groups, accounting for variable sampling among population groups.

Corresponding confidence intervals are provided with the following methods: Goodman, Goodman with a continuity correction, Sison-Glaz and Quesenberry-Hurst.

    estimate_transmission_flows_and_ci(
      group_in,
      individuals_sampled_in,
      individuals_population_in,
      linkage_counts_in,
      ...
    )

    ## Default S3 method:
    estimate_transmission_flows_and_ci(
      group_in,
      individuals_sampled_in,
      individuals_population_in,
      linkage_counts_in,
      detailed_report = FALSE,
      verbose_output = FALSE,
      ...
    )

Arguments

  • group_in: A character vector indicating population groups/strata (e.g. communities, age-groups, genders or trial arms) between which transmission flows will be evaluated

  • individuals_sampled_in: A numeric vector indicating the number of individuals sampled per population group

  • individuals_population_in: A numeric vector of the estimated number of individuals per population group

  • linkage_counts_in: A data.frame of counts of linked pairs identified between samples of each population group pairing of interest.

    The data.frame should contain the following three fields:

    • H1_group (character) Name of population group 1
    • H2_group (character) Name of population group 2
    • number_linked_pairs_observed (numeric) Number of observed directed transmission pairs between samples from population groups 1 and 2
  • detailed_report: A boolean value to produce detailed output of the analysis (Default is FALSE)

  • verbose_output: A boolean value to display intermediate output (Default is FALSE)

  • ...: Further arguments.
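
As a minimal sketch of the expected inputs, the three vectors and the linkage_counts_in data.frame could be assembled as follows. The group names and all counts are made up for illustration; only the argument names and the three column names are taken from the argument descriptions, and the checks at the end are an illustrative sanity test rather than anything the package is known to enforce.

```r
# Hypothetical inputs: group names and counts are illustrative only.
groups <- c("intervention", "control")
n_sampled <- c(200, 150)      # individuals sampled per group
n_population <- c(1000, 900)  # estimated individuals per group

# One row per directed group pairing of interest, using the three required fields.
linkage_counts <- data.frame(
  H1_group = c("intervention", "intervention", "control", "control"),
  H2_group = c("intervention", "control", "intervention", "control"),
  number_linked_pairs_observed = c(20, 5, 8, 12),
  stringsAsFactors = FALSE
)

# Illustrative sanity checks (an assumption, not a documented requirement):
# the vectors are aligned, samples do not exceed the population, and every
# group named in the counts is a known group.
stopifnot(
  length(groups) == length(n_sampled),
  length(groups) == length(n_population),
  all(n_sampled <= n_population),
  all(c(linkage_counts$H1_group, linkage_counts$H2_group) %in% groups)
)
```

These objects would then be passed as group_in, individuals_sampled_in, individuals_population_in and linkage_counts_in respectively.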

Returns

Returns a data.frame containing:

  • H1_group, Name of population group 1
  • H2_group, Name of population group 2
  • number_hosts_sampled_group_1, Number of individuals sampled from population group 1
  • number_hosts_sampled_group_2, Number of individuals sampled from population group 2
  • number_hosts_population_group_1, Estimated number of individuals in population group 1
  • number_hosts_population_group_2, Estimated number of individuals in population group 2
  • max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
  • max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
  • num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
  • p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
  • est_linkedpairs_in_population, Estimated transmission pairs between population groups 1 and 2
  • theta_hat, Estimated transmission flows or relative probability of transmission within and between population groups 1 and 2 adjusted for sampling heterogeneity. More precisely, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked.
  • obs_trm_pairs_est_goodman, Point estimate for observed transmission pairs, Goodman method
  • obs_trm_pairs_lwr_ci_goodman, Lower bound of Goodman confidence interval
  • obs_trm_pairs_upr_ci_goodman, Upper bound of Goodman confidence interval
  • est_goodman, Point estimate for estimated transmission flows, Goodman method
  • lwr_ci_goodman, Lower bound of Goodman confidence interval
  • upr_ci_goodman, Upper bound of Goodman confidence interval
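
To make the Goodman columns more concrete, here is a minimal base-R sketch of Goodman (1965) simultaneous confidence intervals for multinomial proportions. It is an independent illustration of the published formula, not the package's own code; the function name goodman_ci and the example counts are hypothetical.

```r
# Goodman (1965) simultaneous confidence intervals for multinomial proportions.
# counts: vector of cell counts; alpha: overall error rate.
# goodman_ci is a hypothetical helper name used only for this sketch.
goodman_ci <- function(counts, alpha = 0.05) {
  k <- length(counts)
  n <- sum(counts)
  # Chi-square quantile with alpha split across the k cells
  a <- stats::qchisq(1 - alpha / k, df = 1)
  root_term <- sqrt(a * (a + 4 * counts * (n - counts) / n))
  data.frame(
    est = counts / n,
    lwr_ci = (a + 2 * counts - root_term) / (2 * (n + a)),
    upr_ci = (a + 2 * counts + root_term) / (2 * (n + a))
  )
}

# Hypothetical counts of linked pairs for four group pairings
goodman_result <- goodman_ci(c(20, 5, 8, 12))
goodman_result
```

Each row gives the point estimate and the lower and upper bounds for one cell, which is the same est/lwr/upr layout used by the columns listed here.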

The following additional fields are returned if detailed_report = TRUE:

  • prob_group_pairing_and_linked, Probability that a pair of pathogen sequences is from a specific population group pairing and is linked
  • c_hat, Probability that a randomly selected pathogen sequence in one population group links to at least one pathogen sequence in another population group, i.e. the probability of clustering
  • est_goodman_cc, Point estimate, Goodman method with continuity correction
  • lwr_ci_goodman_cc, Lower bound of Goodman confidence interval with continuity correction
  • upr_ci_goodman_cc, Upper bound of Goodman confidence interval with continuity correction
  • est_sisonglaz, Point estimate, Sison-Glaz method
  • lwr_ci_sisonglaz, Lower bound of Sison-Glaz confidence interval
  • upr_ci_sisonglaz, Upper bound of Sison-Glaz confidence interval
  • est_qhurst_acswr, Point estimate, Quesenberry-Hurst method via the ACSWR R package
  • lwr_ci_qhurst_acswr, Lower bound of Quesenberry-Hurst confidence interval (ACSWR)
  • upr_ci_qhurst_acswr, Upper bound of Quesenberry-Hurst confidence interval (ACSWR)
  • est_qhurst_coinmind, Point estimate, Quesenberry-Hurst method via the CoinMinD R package
  • lwr_ci_qhurst_coinmind, Lower bound of Quesenberry-Hurst confidence interval (CoinMinD)
  • upr_ci_qhurst_coinmind, Upper bound of Quesenberry-Hurst confidence interval (CoinMinD)
  • lwr_ci_qhurst_adj_coinmind, Lower bound of adjusted Quesenberry-Hurst confidence interval (CoinMinD)
  • upr_ci_qhurst_adj_coinmind, Upper bound of adjusted Quesenberry-Hurst confidence interval (CoinMinD)

Details

Counts of observed directed transmission pairs can be obtained from deep-sequence phylogenetic data (via phyloscanner) or from known epidemiological contacts. Note: Deep-sequence data is also commonly referred to as high-throughput or next-generation sequence data. See references to learn more about phyloscanner.

The estimate_transmission_flows_and_ci() function is a wrapper function that calls the following functions:

  1. The prep_p_hat() function to determine all possible combinations of the population groups/strata provided by the user. Type ?prep_p_hat() at the R prompt to learn more.
  2. The estimate_p_hat() function to compute the probability of linkage between pathogen sequences from two individuals randomly sampled from their respective population groups. Type ?estimate_p_hat() at the R prompt to learn more.
  3. The estimate_theta_hat() function, which uses the p_hat estimates to compute the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked. This conditional probability, theta_hat, represents transmission flows or the relative probability of transmission within and between population groups, adjusted for variable sampling among population groups. Type ?estimate_theta_hat() at the R prompt to learn more.
  4. The estimate_multinom_ci() function to estimate corresponding confidence intervals for the computed transmission flows.
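
The first three steps can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's implementation: the pair-count rule (n(n-1)/2 within a group, n1*n2 between groups), the helper name count_pairs and all numbers are assumptions made for the example. Confidence intervals (step 4) are left out.

```r
# Hypothetical sampling frame: all numbers are illustrative only.
groups <- c("intervention", "control")
sampled <- c(intervention = 200, control = 150)
population <- c(intervention = 1000, control = 900)

# Step 1: enumerate all directed group pairings (cf. prep_p_hat)
flows <- expand.grid(H1_group = groups, H2_group = groups,
                     stringsAsFactors = FALSE)
flows$num_linked_pairs_observed <- c(20, 8, 5, 12)  # made-up counts

# Assumed pair counts: n * (n - 1) / 2 distinct pairs within a group,
# n1 * n2 pairs between two different groups.
count_pairs <- function(n1, n2, same_group) {
  unname(ifelse(same_group, n1 * (n1 - 1) / 2, n1 * n2))
}
same <- flows$H1_group == flows$H2_group
flows$max_possible_pairs_in_sample <-
  count_pairs(sampled[flows$H1_group], sampled[flows$H2_group], same)
flows$max_possible_pairs_in_population <-
  count_pairs(population[flows$H1_group], population[flows$H2_group], same)

# Step 2: probability that two randomly sampled individuals are linked
flows$p_hat <- flows$num_linked_pairs_observed /
  flows$max_possible_pairs_in_sample

# Step 3: scale up to the population, then normalise so the flows sum to 1
flows$est_linkedpairs_in_population <-
  flows$p_hat * flows$max_possible_pairs_in_population
flows$theta_hat <- flows$est_linkedpairs_in_population /
  sum(flows$est_linkedpairs_in_population)

flows[, c("H1_group", "H2_group", "p_hat", "theta_hat")]
```

Normalising by the total estimated linked pairs is what turns the scaled counts into a conditional probability given linkage, so theta_hat sums to 1 across the listed pairings.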

In addition to estimating transmission flows and corresponding confidence intervals, the estimate_transmission_flows_and_ci() function provides estimates for:

  1. prob_group_pairing_and_linked, the joint probability that a pair of pathogen sequences is from a specific population group pairing and linked. Type ?estimate_prob_group_pairing_and_linked() at the R prompt to learn more.

  2. c_hat, the probability of clustering, i.e. the probability that a pathogen sequence from a population group of interest is linked to one or more pathogen sequences in another population group of interest. Type ?estimate_c_hat() at the R prompt to learn more.
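
One way to read these two quantities is sketched below in base R. Both formulas are assumptions chosen to match the verbal definitions (the share of possible pairs as the pairing probability, and independent pairwise links for clustering); they are not taken from the package source, and all numbers are hypothetical.

```r
# Hypothetical p_hat values and population sizes for two pairings
p_hat <- c(within = 0.001, between = 0.0002)
pairs_in_population <- c(within = 1000 * 999 / 2, between = 1000 * 900)
# Number of candidate sequences a given sequence could link to
n_candidates <- c(within = 999, between = 900)

# Joint probability: P(pairing) * P(linked | pairing).
# Assumption: P(pairing) is the share of all listed possible pairs.
prob_group_pairing <- pairs_in_population / sum(pairs_in_population)
prob_group_pairing_and_linked <- prob_group_pairing * p_hat

# Probability of clustering: at least one link among the candidates.
# Assumption: pairwise links are independent, each with probability p_hat.
c_hat <- 1 - (1 - p_hat)^n_candidates

# Dividing the joint probabilities by their sum gives the conditional
# probability of each pairing given linkage.
theta_from_joint <- prob_group_pairing_and_linked /
  sum(prob_group_pairing_and_linked)
```

Under these assumptions theta_from_joint equals p_hat times the possible pairs in the population, divided by the sum of that product over pairings.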

Methods (by class)

  • default: Estimates transmission flows and accompanying confidence intervals

Examples

    library(bumblebee)
    library(dplyr)

    # Estimate transmission flows and confidence intervals

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs and ?sampling_frequency

    # View counts of observed directed HIV transmissions within and between intervention
    # and control communities
    counts_hiv_transmission_pairs

    # View the estimated number of individuals with HIV in intervention and control
    # communities and the number of individuals sampled from each
    sampling_frequency

    # Estimate transmission flows within and between intervention and control communities
    # accounting for variable sampling among population groups.

    # Basic output
    results_estimate_transmission_flows_and_ci <- estimate_transmission_flows_and_ci(
        group_in = sampling_frequency$population_group,
        individuals_sampled_in = sampling_frequency$number_sampled,
        individuals_population_in = sampling_frequency$number_population,
        linkage_counts_in = counts_hiv_transmission_pairs)

    # View results
    results_estimate_transmission_flows_and_ci

    # Retrieve dataset of estimated transmission flows
    dframe <- results_estimate_transmission_flows_and_ci$flows_dataset

    # Detailed output
    results_estimate_transmission_flows_and_ci_detailed <- estimate_transmission_flows_and_ci(
        group_in = sampling_frequency$population_group,
        individuals_sampled_in = sampling_frequency$number_sampled,
        individuals_population_in = sampling_frequency$number_population,
        linkage_counts_in = counts_hiv_transmission_pairs,
        detailed_report = TRUE)

    # View results
    results_estimate_transmission_flows_and_ci_detailed

    # Retrieve dataset of estimated transmission flows
    dframe <- results_estimate_transmission_flows_and_ci_detailed$flows_dataset

    # Options:
    # To show intermediate output set verbose_output = TRUE

    # Basic output
    results_estimate_transmission_flows_and_ci <- estimate_transmission_flows_and_ci(
        group_in = sampling_frequency$population_group,
        individuals_sampled_in = sampling_frequency$number_sampled,
        individuals_population_in = sampling_frequency$number_population,
        linkage_counts_in = counts_hiv_transmission_pairs,
        verbose_output = TRUE)

    # View results
    results_estimate_transmission_flows_and_ci

    # Detailed output
    results_estimate_transmission_flows_and_ci_detailed <- estimate_transmission_flows_and_ci(
        group_in = sampling_frequency$population_group,
        individuals_sampled_in = sampling_frequency$number_sampled,
        individuals_population_in = sampling_frequency$number_population,
        linkage_counts_in = counts_hiv_transmission_pairs,
        detailed_report = TRUE,
        verbose_output = TRUE)

    # View results
    results_estimate_transmission_flows_and_ci_detailed

References

  1. Magosi, L.E., et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/Ya Tsie trial. To submit for publication, 2021.
  2. Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
  3. Cherry, S., A Comparison of Confidence Interval Methods for Habitat Use-Availability Studies. The Journal of Wildlife Management, 1996. 60(3): p. 653-658.
  4. Ratmann, O., et al., Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nature Communications, 2019. 10(1): p. 1411.
  5. Wymant, C., et al., PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution, 2017. 35(3): p. 719-733.
  6. Goodman, L.A. On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 1965. 7: p. 247-254.
  7. Sison, C.P. and Glaz, J. Simultaneous confidence intervals and sample size determination for multinomial proportions. Journal of the American Statistical Association, 1995. 90: p. 366-369.
  8. Glaz, J., Sison, C.P. Simultaneous confidence intervals for multinomial proportions. Journal of Statistical Planning and Inference, 1999. 82:251-262.
  9. May, W.L., Johnson, W.D. Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells. Journal of Statistical Software, 2000. 5(6). Paper and code available at https://www.jstatsoft.org/v05/i06.

See Also

See estimate_theta_hat and estimate_multinom_ci to learn more about estimation of transmission flows and confidence intervals.

  • Maintainer: Lerato E Magosi
  • License: MIT + file LICENSE
  • Last published: 2021-05-11