response.var: A character string with the name of the column that contains the response variable in the data set, or a vector (either numeric or character) with the values of the response variable for all the units.
phat.var: A character string with the name of the column that contains the estimated probabilities in the data set, or a numeric vector with the estimated probabilities for all the units.
weights.var: A character string with the name of the column that contains the sampling weights, or a numeric vector with the sampling weights. It can be NULL if the sampling design is specified in the design argument. For unweighted estimates, set all the sampling weights to 1.
tag.event: A character string giving the label that identifies the event of interest in response.var. The default, tag.event = NULL, selects the class with the fewest units as the event.
tag.nonevent: A character string giving the label that identifies the non-event in response.var. The default, tag.nonevent = NULL, selects the class with the most units as the non-event.
data: A data frame that must contain at least the columns response.var, phat.var and weights.var. If data = NULL, then vectors with the values themselves (as described above) must be supplied in response.var, phat.var and weights.var, or the sampling design must be specified in the design argument.
design: An object of class survey.design, created by survey::svydesign, that describes the complex sampling design of the data. If design = NULL, the data set (argument data) and/or the sampling weights (argument weights.var) must be supplied.
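The default choice of event and non-event labels described for tag.event and tag.nonevent can be sketched as follows. This is a minimal Python illustration, not the package's R implementation; the helper name default_tags is hypothetical, and three behaviours are assumptions not stated above: the response is binary, ties in class frequency are broken by order of first appearance, and when only one tag is supplied the other defaults to the remaining class.

```python
from collections import Counter


def default_tags(response, tag_event=None, tag_nonevent=None):
    """Hypothetical helper: resolve event / non-event labels.

    Documented default: the least frequent class is the event and the
    most frequent class is the non-event. ASSUMPTIONS (not documented):
    exactly two classes, ties broken by order of first appearance, and
    if only one tag is given, the other becomes the remaining class.
    """
    counts = Counter(response)  # keeps classes in first-seen order
    if len(counts) != 2:
        raise ValueError("response must contain exactly two classes")
    # Stable sort by ascending frequency: [least frequent, most frequent]
    classes = sorted(counts, key=counts.get)
    if tag_event is None and tag_nonevent is None:
        tag_event, tag_nonevent = classes[0], classes[1]
    elif tag_event is None:
        tag_event = next(c for c in classes if c != tag_nonevent)
    elif tag_nonevent is None:
        tag_nonevent = next(c for c in classes if c != tag_event)
    return tag_event, tag_nonevent


labels = ["no", "yes", "no", "no", "yes"]
print(default_tags(labels))  # ('yes', 'no')
```

Here "yes" appears twice and "no" three times, so "yes" is taken as the event by default.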
Returns
This function returns a list with the following 4 elements:
AUCw: the weighted estimate of the AUC.
tags: a list containing two elements with the following information:
  tag.event: a character string indicating the event of interest.
  tag.nonevent: a character string indicating the non-event.
basics: a list containing the following 4 elements:
  n.event: number of units with the event of interest in the data set.
  n.nonevent: number of units without the event of interest in the data set.
  hatN.event: estimated number of units with the event of interest in the population, i.e., the sum of the sampling weights of the event units in the data set.
  hatN.nonevent: estimated number of non-event units in the population, i.e., the sum of the sampling weights of the non-event units in the data set.
call: an object storing the call with which the function was run.
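The four basics elements are plain unweighted counts and their weighted counterparts. A minimal Python sketch, assuming the response labels and sampling weights are already aligned unit by unit; the function name basics_summary is hypothetical and its dictionary keys only mirror the element names listed above.

```python
def basics_summary(response, weights, tag_event, tag_nonevent):
    """Hypothetical sketch of the 'basics' element.

    n.*    : number of sampled units in each class (unweighted).
    hatN.* : sum of the sampling weights in each class, i.e. the
             estimated number of such units in the population.
    """
    if len(response) != len(weights):
        raise ValueError("response and weights must have the same length")
    pairs = list(zip(response, weights))
    return {
        "n.event": sum(1 for y, _ in pairs if y == tag_event),
        "n.nonevent": sum(1 for y, _ in pairs if y == tag_nonevent),
        "hatN.event": sum(w for y, w in pairs if y == tag_event),
        "hatN.nonevent": sum(w for y, w in pairs if y == tag_nonevent),
    }


y = ["no", "yes", "no", "no", "yes"]
w = [10.0, 20.0, 10.0, 30.0, 5.0]
print(basics_summary(y, w, "yes", "no"))
# {'n.event': 2, 'n.nonevent': 3, 'hatN.event': 25.0, 'hatN.nonevent': 50.0}
```

With all weights equal to 1, hatN.event and hatN.nonevent coincide with n.event and n.nonevent.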
Details
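Let $S$ denote a sample of $n$ observations of the vector of random variables $(Y, X)$ and, for $i = 1, \ldots, n$, let $y_i$ denote the $i$th observation of the response variable $Y$ and $x_i$ the corresponding observation of the covariate vector $X$. Let $w_i$ denote the sampling weight of unit $i$ and $\hat{p}_i$ its estimated probability of the event. Let $S_0$ and $S_1$ be the subsamples of $S$ formed by the units without the event of interest ($y_i = 0$) and with the event of interest ($y_i = 1$), respectively. Then, following Iparragirre, Barrio and Arostegui (2023), the AUC is estimated as follows:

$$
\widehat{AUC}_w = \frac{\sum_{i \in S_1} \sum_{j \in S_0} w_i w_j \left[ I(\hat{p}_i > \hat{p}_j) + \frac{1}{2} I(\hat{p}_i = \hat{p}_j) \right]}{\sum_{i \in S_1} w_i \sum_{j \in S_0} w_j},
$$

where $I(\cdot)$ is the indicator function.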
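The weighted AUC of Iparragirre, Barrio and Arostegui (2023) compares every event unit $i \in S_1$ with every non-event unit $j \in S_0$, adds $w_i w_j$ when $\hat{p}_i > \hat{p}_j$ and $w_i w_j / 2$ when $\hat{p}_i = \hat{p}_j$, and divides the total by $\sum_{i \in S_1} w_i \sum_{j \in S_0} w_j$. A minimal Python sketch of that estimator, assuming y is coded 1 for events and 0 for non-events; weighted_auc is a hypothetical name, and this is a direct transcription for illustration, not the package's R code.

```python
def weighted_auc(phat, y, w):
    """Pairwise weighted AUC; y is coded 1 (event, S1) or 0 (non-event, S0)."""
    if not (len(phat) == len(y) == len(w)):
        raise ValueError("phat, y and w must have the same length")
    s1 = [(p, wi) for p, yi, wi in zip(phat, y, w) if yi == 1]
    s0 = [(p, wi) for p, yi, wi in zip(phat, y, w) if yi == 0]
    if not s1 or not s0:
        raise ValueError("both event and non-event units are required")
    num = 0.0
    for p_i, w_i in s1:          # every event unit ...
        for p_j, w_j in s0:      # ... against every non-event unit
            if p_i > p_j:
                num += w_i * w_j          # correctly ordered pair
            elif p_i == p_j:
                num += 0.5 * w_i * w_j    # tie counts as one half
    # Denominator: (sum of event weights) * (sum of non-event weights)
    den = sum(wi for _, wi in s1) * sum(wj for _, wj in s0)
    return num / den


phat = [0.9, 0.4, 0.6, 0.2]
y = [1, 1, 0, 0]
print(weighted_auc(phat, y, [1, 1, 1, 1]))  # 0.75
print(weighted_auc(phat, y, [2, 1, 1, 3]))  # 0.9166666666666666 (= 11/12)
```

With all weights equal to 1 the estimate reduces to the usual empirical (Mann-Whitney) AUC. The double loop costs $O(|S_1|\,|S_0|)$ and is chosen for readability; a sort-based computation would scale better on large samples.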
References
Iparragirre, A., Barrio, I. and Arostegui, I. (2023). Estimation of the ROC curve and the area under it with complex survey data. Stat, 12(1), e635. https://doi.org/10.1002/sta4.635