age_group: the name of a column in the data frame that defines the age group categories. Defaults to "age_group".
split_by: the name of a column in the data frame that defines the bivariate column. Defaults to "sex". See Note.
stack_by: the name of the column in the data frame to use for shading the bars. Defaults to NULL, which will shade the bars by the split_by variable.
count: for pre-computed data, the name of the column in the data frame that holds the values of the bars. If these values represent proportions, they should be within [0, 1].
proportional: If TRUE, bars will represent proportions of cases out of the entire population. Otherwise (FALSE, default), bars represent case counts.
na.rm: If TRUE, this removes NA counts from the age groups. Defaults to TRUE.
show_midpoint: When TRUE (default), a dashed vertical line will be added to each of the age bars showing the halfway point for the un-stratified age group. When FALSE, no halfway point is marked.
vertical_lines: If TRUE, dashed vertical lines will be added to help visual interpretation of the numbers. Defaults to FALSE (no lines shown).
horizontal_lines: If TRUE (default), horizontal dashed lines will appear behind the bars of the pyramid.
pyramid: if TRUE, then binary split_by variables will result in a population pyramid (non-binary variables cannot form a pyramid). If FALSE, a pyramid will not form.
pal: a color palette function or vector of colors to be passed to ggplot2::scale_fill_manual(). Defaults to the first "qual" palette from ggplot2::scale_fill_brewer().
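The count and proportional arguments interact when the data are pre-computed: the column passed to count is expected to hold values in [0, 1] when it represents proportions. The following is a minimal base R sketch of preparing such a column, assuming a hypothetical data frame `pre` with illustrative columns `age`, `gender`, and `n` (none of these names come from the package). The final call is left commented out because it needs the package that provides age_pyramid().

```r
# Hypothetical pre-computed counts; the column names are illustrative only.
pre <- data.frame(
  age    = rep(c("[0,20)", "[20,40)", "[40,90)"), times = 2),
  gender = rep(c("Female", "Male"), each = 3),
  n      = c(10, 25, 15, 12, 20, 18),
  stringsAsFactors = FALSE
)

# Proportion of the entire population (all rows), not of each gender.
pre$prop <- pre$n / sum(pre$n)

# Sanity check: every value must lie within [0, 1].
stopifnot(all(pre$prop >= 0 & pre$prop <= 1))

# With the package loaded, the proportions could then be plotted as:
# age_pyramid(pre, age_group = age, split_by = gender,
#             count = prop, proportional = TRUE)
```

Dividing by the grand total mirrors the description of proportional as "out of the entire population"; dividing within each split_by group would give a different picture.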
Note
If the split_by variable is bivariate (e.g. an indicator for a specific symptom), the result is shown as a pyramid. Otherwise, it is presented as a faceted bar plot with empty bars in the background indicating the range of the un-faceted data set. Values of split_by appear as labels at the top of each facet.
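To anticipate which layout a given split_by column is likely to produce, it can help to count its distinct values beforehand. The helper below, `will_form_pyramid()`, is hypothetical and is not part of the package; it assumes that "bivariate" means exactly two distinct non-missing observed values, which may differ from how the function itself decides (for instance, in how it treats unused factor levels).

```r
# Hypothetical helper, not the package's internal logic.
# Assumption: a pyramid needs exactly two distinct non-missing values.
will_form_pyramid <- function(x, pyramid = TRUE) {
  observed <- unique(x[!is.na(x)])
  isTRUE(pyramid) && length(observed) == 2
}

sex    <- c("Male", "Female", "Female", NA)
gender <- c("Male", "NB", "Female", "Female")

will_form_pyramid(sex)                   # TRUE: two observed values
will_form_pyramid(gender)                # FALSE: three observed values
will_form_pyramid(sex, pyramid = FALSE)  # FALSE: pyramid switched off
```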
Examples
library(ggplot2)
library(apyramid) # provides age_pyramid() and the us_* example data sets
old <- theme_set(theme_classic(base_size = 18))

# with pre-computed data ----------------------------------------------------
# 2018/2008 US census data by age and gender
data(us_2018)
data(us_2008)
age_pyramid(us_2018, age_group = age, split_by = gender, count = count)
age_pyramid(us_2008, age_group = age, split_by = gender, count = count)

# 2018 US census data by age, gender, and insurance status
data(us_ins_2018)
age_pyramid(us_ins_2018,
  age_group = age,
  split_by = gender,
  stack_by = insured,
  count = count
)

# Convert percentages to proportions in [0, 1] before plotting them
us_ins_2018$prop <- us_ins_2018$percent / 100
age_pyramid(us_ins_2018,
  age_group = age,
  split_by = gender,
  stack_by = insured,
  count = prop,
  proportional = TRUE
)

# from linelist data --------------------------------------------------------
set.seed(2018-01-15)
ages <- cut(sample(80, 150, replace = TRUE),
  breaks = c(0, 5, 10, 30, 90), right = FALSE
)
sex <- sample(c("Female", "Male"), 150, replace = TRUE)
gender <- sex
gender[sample(5)] <- "NB"
ill <- sample(c("case", "non-case"), 150, replace = TRUE)
dat <- data.frame(
  AGE = ages,
  sex = factor(sex, c("Male", "Female")),
  gender = factor(gender, c("Male", "NB", "Female")),
  ill = ill,
  stringsAsFactors = FALSE
)

# Create the age pyramid, stratifying by sex
print(ap <- age_pyramid(dat, age_group = AGE))

# Create the age pyramid, stratifying by gender, which can include non-binary
print(apg <- age_pyramid(dat, age_group = AGE, split_by = gender))

# Remove NA categories with na.rm = TRUE
dat2 <- dat
dat2[1, 1] <- NA
dat2[2, 2] <- NA
dat2[3, 3] <- NA
print(ap <- age_pyramid(dat2, age_group = AGE))
print(ap <- age_pyramid(dat2, age_group = AGE, na.rm = TRUE))

# Stratify by case definition and customize with ggplot2
ap <- age_pyramid(dat, age_group = AGE, split_by = ill) +
  theme_bw(base_size = 16) +
  labs(title = "Age groups by case definition")
print(ap)

# Stratify by multiple factors
ap <- age_pyramid(dat,
  age_group = AGE,
  split_by = sex,
  stack_by = ill,
  vertical_lines = TRUE
) +
  labs(title = "Age groups by case definition and sex")
print(ap)

# Display proportions
ap <- age_pyramid(dat,
  age_group = AGE,
  split_by = sex,
  stack_by = ill,
  proportional = TRUE,
  vertical_lines = TRUE
) +
  labs(title = "Age groups by case definition and sex")
print(ap)

# empty group levels will still be displayed
dat3 <- dat2
dat3[dat$AGE == "[0,5)", "sex"] <- NA
age_pyramid(dat3, age_group = AGE)

# Restore the previous ggplot2 theme
theme_set(old)