Matching model and selection correction for group formation
The function provides a Gibbs sampler for a structural matching model that estimates preferences and corrects for sample selection bias when the selection process is a one-sided matching game; that is, group/coalition formation.
The input is individual-level data of all group members from one-sided matching markets; that is, from group/coalition formation games.
In a first step, the function generates a model matrix with characteristics of all feasible
groups of the same size as the observed groups in the market.
For example, in the stable roommates problem with n=4 students 1,2,3,4
sorting into groups of 2, we have choose(4,2)=6 feasible groups: (1,2)(3,4) (1,3)(2,4) (1,4)(2,3).
In the group formation problem with n=6 students 1,2,3,4,5,6
sorting into groups of 3, we have choose(6,3)=20 feasible groups. For the same students sorting into groups of sizes 2 and 4, we have choose(6,2)+choose(6,4)=30 feasible groups.
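The counts above can be reproduced in base R. The following is a minimal sketch using choose() and combn(); the object names are illustrative only and are not part of the package.

```r
# Feasible groups of size 2 among n = 4 students:
# each column of `pairs` is one feasible group
pairs   <- combn(4, 2)
n_pairs <- ncol(pairs)            # same as choose(4, 2)

# Feasible groups of size 3 among n = 6 students
n_triples <- choose(6, 3)

# Feasible groups of sizes 2 and 4 among the same 6 students
n_mixed <- choose(6, 2) + choose(6, 4)

print(t(pairs))                   # one feasible group per row
print(c(n_pairs = n_pairs, n_triples = n_triples, n_mixed = n_mixed))
```

Because the number of feasible groups grows quickly with market size, the max.combs argument can be used to cap how many of them enter the model matrix.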
The structural model consists of a selection and an outcome equation. The Selection Equation
determines which matches are observed (D=1) and which are not (D=0).
$$D = 1[V \in \Gamma] \quad \text{with} \quad V = W\alpha + \eta$$
Here, V is a vector of latent valuations of all feasible matches, i.e., observed and unobserved, and 1[.] is the Iverson bracket. A match is observed if its match valuation is in the set of valuations Γ
that satisfy the equilibrium condition (see Klein, 2015a). This condition differs for matching games with transferable and non-transferable utility and can be specified using the method
argument. The match valuation V is a linear function of W, a matrix of characteristics for all feasible groups, and η, a vector of random errors. α is a parameter vector to be estimated.
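To make the roles of V and D concrete, here is a toy sketch for one market with 4 students sorting into groups of 2. It is not the package's implementation. As a loudly labeled assumption, the equilibrium condition is replaced by a simple stand-in rule that observes the partition with the highest total valuation; the actual conditions for the "TU" and "NTU" methods are those of Klein (2015a). The covariate W and all object names are made up for illustration.

```r
set.seed(1)

# All feasible groups of size 2 among 4 students (one group per column):
# (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
groups <- combn(4, 2)
n_grp  <- ncol(groups)

# Hypothetical group-level characteristic W and latent valuation V = W alpha + eta
W     <- matrix(rnorm(n_grp), ncol = 1)
alpha <- 1
eta   <- rnorm(n_grp)
V     <- drop(W %*% alpha) + eta

# With combn's column order, group k and group 7 - k are complements,
# so the three feasible partitions are columns (1,6), (2,5), (3,4).
part_total <- V[1:3] + V[6:4]

# ASSUMPTION (stand-in rule, not the equilibrium condition of the model):
# the observed partition is the one with the highest total valuation.
best <- which.max(part_total)
D <- integer(n_grp)
D[c(best, 7 - best)] <- 1L

print(data.frame(group = apply(groups, 2, paste, collapse = ","), V = V, D = D))
```

Only the rows with D=1 would contribute to the outcome equation; the rows with D=0 are the counterfactual groups that enter the selection equation alone.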
The Outcome Equation determines the outcome for observed matches. The dependent variable can either be continuous or binary, depending on the value of the binary
argument. In the binary case, the dependent variable R is determined by a threshold rule for the latent variable Y.
$$R = 1[Y > c] \quad \text{with} \quad Y = X\beta + \epsilon$$
Here, Y is a linear function of X, a matrix of characteristics for observed
matches, and ϵ, a vector of random errors. β is a parameter vector to be estimated.
The structural model imposes a linear relationship between the error terms of both equations as ϵ=δη+ξ, where ξ is a vector of random errors and δ
is the covariance parameter to be estimated. If δ were zero, ϵ and η would be independent and the selection problem would vanish. That is, the observed outcomes would be a random sample from the population of interest.
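The consequence of ϵ=δη+ξ can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimator itself: the parameter values mirror the documented simulation settings (alpha=beta=1, delta=0.5, standard normal eta and xi), a single made-up covariate x plays the role of both W and X, and selection is reduced to the stand-in rule V>0 rather than a matching equilibrium.

```r
set.seed(42)
n     <- 1e5
alpha <- 1
beta  <- 1
delta <- 0.5

x   <- rnorm(n)               # hypothetical covariate (used as W and X)
eta <- rnorm(n)               # selection-equation error
xi  <- rnorm(n)               # independent outcome noise
eps <- delta * eta + xi       # outcome error, correlated with eta

V <- alpha * x + eta          # latent valuation
Y <- beta  * x + eps          # continuous outcome

# ASSUMPTION: stand-in selection rule, not the matching equilibrium
dat <- data.frame(Y = Y, x = x, eta = eta)[V > 0, ]

cov_eps_eta <- cov(eps, eta)                           # close to delta
b_naive <- coef(lm(Y ~ x,       data = dat))[["x"]]    # biased away from beta
b_ctrl  <- coef(lm(Y ~ x + eta, data = dat))[["x"]]    # close to beta

print(round(c(cov_eps_eta = cov_eps_eta, b_naive = b_naive, b_ctrl = b_ctrl), 3))
```

In the selected sample, eta is correlated with x, so the naive regression picks up part of delta*eta in the coefficient on x. Controlling for eta leaves only xi in the error term, which is unrelated to the selection rule. In real data eta is not observed, which is why the sampler has to draw the latent valuations.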
Arguments
x: data frame with individual-level characteristics of all group members including market- and group-identifiers.
m.id: character string giving the name of the market identifier variable. Defaults to "m.id".
g.id: character string giving the name of the group identifier variable. Defaults to "g.id".
R: dependent variable in outcome equation. Defaults to "R".
selection: list of the variables in the selection equation and the operators applied to them. The format is operation = "variable". See the Details and Examples sections.
outcome: list of the variables in the outcome equation and the operators applied to them. The format is operation = "variable". See the Details and Examples sections.
simulation: should the values of dependent variables in selection and outcome equations be simulated? Options are "none" for no simulation, "NTU" for non-transferable utility matching, "TU" for transferable utility or "random" for random matching of individuals to groups. Simulation settings are (i) all model coefficients set to alpha=beta=1; (ii) covariance between error terms delta=0.5; (iii) error terms eta and xi are draws from a standard normal distribution.
seed: integer setting the state for random number generation when simulation is enabled, i.e., when simulation is not "none".
max.combs: integer (divisible by two) giving the maximum number of feasible groups to be used for generating group-level characteristics.
method: estimation method to be used. Either "NTU" or "TU" for selection correction using non-transferable or transferable utility matching as selection rule; "outcome" for estimation of the outcome equation only; or "model.frame" for no estimation.
binary: logical: if TRUE outcome variable is taken to be binary; if FALSE outcome variable is taken to be continuous.
offsetOut: vector of integers indicating the indices of columns in X for which coefficients should be forced to 1. Use 0 for none.
offsetSel: vector of integers indicating the indices of columns in W for which coefficients should be forced to 1. Use 0 for none.
marketFE: logical: if TRUE market-level fixed effects are used in outcome equation; if FALSE no market fixed effects are used.
censored: integer indicating whether draws of the delta parameter, which estimates the covariation between the error terms in the selection and outcome equations, are censored. 0: not censored; 1: censored from below; 2: censored from above.
gPrior: logical: if TRUE the g-prior (Zellner, 1986) is used for the variance-covariance matrix.
dropOnes: logical: if TRUE one-group-markets are excluded from estimation.
interOut: two-column matrix indicating the indices of columns in X that should be interacted in estimation. Use 0 for none.
interSel: two-column matrix indicating the indices of columns in W that should be interacted in estimation. Use 0 for none.
standardize: numeric: if standardize>0 the independent variables will be standardized by dividing by standardize times their standard deviation. Defaults to standardize=0 (no standardization).
niter: number of iterations to use for the Gibbs sampler.
verbose: logical. When set to TRUE, writes information messages on the console (recommended). Defaults to FALSE, which suppresses such messages.
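As a literal reading of the standardize description in the argument list above, the rescaling divides each independent variable by standardize times its standard deviation, without centering. The sketch below applies that rule to a made-up covariate. It is an illustration of the documented formula, not the package's internal code.

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 3)   # hypothetical independent variable

standardize <- 2

# Divide by standardize * sd(x); standardize = 0 means no rescaling
x_std <- if (standardize > 0) x / (standardize * sd(x)) else x

print(sd(x_std))                    # equals 1 / standardize
```

Under this reading, a coefficient on the rescaled variable refers to a change of standardize standard deviations in the original variable.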
Returns
For method = "model.frame", stabit returns a list of data from an NTU or TU matching market with the following elements.
OUT: Model matrix of the outcome data, where m.id and g.id are categorical variables for market and group identifier.
SEL: Model matrix of the selection data, again with categorical variables m.id and g.id for market and group identifier.
combs: List of length of the number of markets with each element containing a matrix of all counterfactual group constellations in a market.
For any other setting of method, a list of the estimation results is returned.
Details
Operators for variable transformations in selection and outcome arguments.
add: sum over all group members and divide by group size.
int: sum over all possible two-way interactions x∗y of group members and divide by the number of those, given by choose(n,2).
ieq: sum over all possible two-way equality assertions 1[x=y] and divide by the number of those.
ive: sum over all possible two-way interactions of vectors of variables of group members and divide by number of those.
inv: ...
dst: sum over all possible two-way distances between players and divide by number of those, where distance is defined as exp(−∣x−y∣).
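The verbal definitions of add, int, ieq and dst above can be written out for a single variable and a single group as follows. This is a minimal sketch: the helper names (op_add and so on) are hypothetical and do not exist in the package, and ive and inv are left out because the former operates on vectors of variables and the latter is not described here.

```r
# Each helper takes the values of one variable for all members of one group.
# Groups need at least two members for the pairwise operators.
n_pairs <- function(x) choose(length(x), 2)

# add: sum over all group members, divided by group size
op_add <- function(x) sum(x) / length(x)

# int: average of all two-way interactions x * y
op_int <- function(x) sum(combn(x, 2, FUN = prod)) / n_pairs(x)

# ieq: average of all two-way equality assertions 1[x = y]
op_ieq <- function(x) sum(combn(x, 2, FUN = function(p) p[1] == p[2])) / n_pairs(x)

# dst: average of all two-way distances exp(-|x - y|)
op_dst <- function(x) sum(combn(x, 2, FUN = function(p) exp(-abs(p[1] - p[2])))) / n_pairs(x)

wst <- c(1, 1, 2)   # made-up values for a group of three members
print(c(add = op_add(wst), int = op_int(wst), ieq = op_ieq(wst), dst = op_dst(wst)))
```

For this group the three pairs are (1,1), (1,2) and (1,2), so exactly one of three pairs is equal and ieq evaluates to 1/3.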
Examples
## --- SIMULATED EXAMPLE ---

## 1. Simulate one-sided matching data for 200 markets (m=200) with 2 groups
## per market (gpm=2) and 5 individuals per group (ind=5). The true parameter
## in the selection equation is wst=1, in the outcome equation wst=0.

## 1-a. Simulate individual-level, independent variables
idata <- stabsim(m=200, ind=5, seed=123, gpm=2)
head(idata)

## 1-b. Simulate group-level variables
mdata <- stabit(x=idata, simulation="NTU", method="model.frame",
                selection = list(add="wst"), outcome = list(add="wst"),
                verbose=FALSE)
head(mdata$OUT)
head(mdata$SEL)

## 2. Bias from sorting

## 2-a. Naive OLS estimation
lm(R ~ wst.add, data=mdata$OUT)$coefficients

## 2-b. epsilon is correlated with independent variables
with(mdata$OUT, cor(epsilon, wst.add))

## 2-c. but xi is uncorrelated with independent variables
with(mdata$OUT, cor(xi, wst.add))

## 3. Correction of sorting bias when valuations V are observed

## 3-a. 1st stage: obtain fitted value for eta
lm.sel <- lm(V ~ -1 + wst.add, data=mdata$SEL)
lm.sel$coefficients
eta <- lm.sel$residuals[mdata$SEL$D==1]

## 3-b. 2nd stage: control for eta
lm(R ~ wst.add + eta, data=mdata$OUT)$coefficients

## 4. Run Gibbs sampler
fit1 <- stabit(x=idata, method="NTU", simulation="NTU", censored=1,
               selection = list(add="wst"), outcome = list(add="wst"),
               niter=2000, verbose=FALSE)

## 5. Coefficient table
summary(fit1)

## 6. Plot MCMC draws for coefficients
plot(fit1)

## --- REPLICATION, Klein (2015a) ---

## 1. Load data
data(baac00)

## 2. Run Gibbs sampler
klein15a <- stabit(x=baac00,
                   selection = list(inv="pi", ieq="wst"),
                   outcome = list(add="pi", inv="pi", ieq="wst",
                                  add=c("loan_size","loan_size2","lngroup_agei")),
                   offsetOut=1, method="NTU", binary=TRUE, gPrior=TRUE,
                   marketFE=TRUE, niter=800000)

## 3. Marginal effects
summary(klein15a, mfx=TRUE)

## 4. Plot MCMC draws for coefficients
plot(klein15a)
References
Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions, volume 6, pages 233-243. North-Holland, Amsterdam.