optimal_design function

Optimal design for growth reference centile studies

Two functions for estimating optimal sample size and sample composition when constructing growth reference centiles.

Usage

    optimal_design(z = -2, lambda = NA, N = NA, SEz = NA, age = 10)

    n_agegp(
      z = -2,
      lambda = NA,
      N = NA,
      SEz = NA,
      minage = 0,
      maxage = 20,
      n_groups = 20
    )

Arguments

  • z: z-score on which to base the design. The default -2 corresponds approximately to the 2nd centile (2.3%). If NA, the optimal z is calculated from lambda.
  • lambda: power of age that defines the sample composition. The default NA means calculate optimal lambda from z.
  • N: total sample size per sex. The default NA means calculate from z or lambda, and SEz if provided.
  • SEz: target z-score standard error. The default NA means calculate from z or lambda, and N if provided.
  • age: age at which to calculate SEz. The default 10 returns the mean SEz; if z or lambda is optimal, SEz is independent of age.
  • minage: youngest age (default 0).
  • maxage: oldest age (default 20).
  • n_groups: number of age groups (default 20).

Returns

For optimal_design, a tibble with columns:

  • z: as above.

  • lambda: as above.

  • N: as above.

  • SEz: as above.

  • age: as above.

  • p: the centile corresponding to z.

  • plo: lower limit of the 95% confidence interval for p.

  • phi: upper limit of the 95% confidence interval for p.

For n_agegp, a tibble giving the numbers of measurements to be collected per equal width age group, with columns:

  • n_varying: numbers for equal width age groups.

  • age: mean ages for equal width age groups.

  • n: number for each unequal width age group (only for longitudinal studies).

  • age_varying: target ages for unequal width age groups (only for longitudinal studies).

Details

Studies to construct growth reference centiles using GAMLSS need to be of optimal size. Cole (SMMR, 2020) has shown that the sample composition, i.e. the age distribution of the measurements, needs to be optimised as well as the sample size. Sample composition is defined in terms of the age power lambda, which determines the degree of infant oversampling.
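To make the role of lambda concrete, the base R sketch below assumes that measurement ages are spread uniformly on the age^lambda scale, so that the share of the sample in an age band is proportional to the change in age^lambda across it. That assumption, the helper name age_group_shares and the value lambda = 0.5 are illustrative only. This is not the package's implementation, and n_agegp should be used for real designs.

```r
# Hypothetical helper (not part of the package): share of the sample falling
# in each equal width age group, ASSUMING ages are uniform on the age^lambda
# scale. Intended for lambda > 0 only.
age_group_shares <- function(lambda, minage = 0, maxage = 20, n_groups = 20) {
  breaks <- seq(minage, maxage, length.out = n_groups + 1)  # group boundaries
  w <- diff(breaks^lambda)  # width of each group on the power scale
  w / sum(w)                # normalise so the shares sum to 1
}

shares_uniform <- age_group_shares(lambda = 1)    # equal shares, no oversampling
shares_infant  <- age_group_shares(lambda = 0.5)  # illustrative value only

# approximate group sizes for a sample of 10,000 per sex
round(10000 * shares_infant)
```

Under this assumption lambda = 1 gives equal numbers in every age group, while values below 1 shift the sample towards the youngest groups.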

There are two criteria that determine the optimal sample size and sample composition: the centile of interest (as z-score z) and the required level of precision for that centile (as the z-score standard error SEz).
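The link between z, SEz and the centile scale can be illustrated with base R. The sketch assumes that the centile is the standard normal distribution function of z, and that an approximate 95% interval is obtained by transforming z ± 1.96 × SEz. The package may compute plo and phi differently, and the SEz value used here is invented for illustration.

```r
z   <- -2    # z-score of interest (the default)
SEz <- 0.1   # illustrative standard error, not a package output

p   <- 100 * pnorm(z)               # centile corresponding to z, about 2.3
plo <- 100 * pnorm(z - 1.96 * SEz)  # assumed lower 95% limit on the centile scale
phi <- 100 * pnorm(z + 1.96 * SEz)  # assumed upper 95% limit on the centile scale

round(c(p = p, plo = plo, phi = phi), 2)
```

A smaller SEz narrows the interval around p, which is why the required precision drives the sample size.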

Examples

    library(sitar)
    library(magrittr)  # provides %>%

    ## estimate optimal sample composition lambda and precision SEz for 9 centiles
    ## spaced 2/3 of a z-score apart, based on a sample of 10,000 children
    optimal_design(z = -4:4*2/3, N = 10000)

    ## calculate age group sizes optimised for centiles from the 50th to the 99.6th
    ## (or equivalently from the 50th to the 0.4th)
    ## with a sample of 10,000 children from 0 to 20 years in one-year groups
    purrr::map_dfc(0:4*2/3, ~{
      n_agegp(z = .x, N = 10000) %>%
        dplyr::select(!!z2cent(.x) := n_varying)
    }) %>%
      dplyr::bind_cols(tibble::tibble(age = paste(0:19, 1:20, sep='-')), .)

See Also

gamlss to fit the centiles with the BCCG, BCT or BCPE family.

Author(s)

Tim Cole tim.cole@ucl.ac.uk