stabit2 function

Matching model and selection correction for college admissions

stabit2(
  OUT = NULL,
  SEL = NULL,
  colleges = NULL,
  students = NULL,
  outcome = NULL,
  selection,
  binary = FALSE,
  niter,
  gPrior = FALSE,
  censored = 1,
  thin = 1,
  nCores = max(1, detectCores() - 1),
  verbose = FALSE,
  ...
)

Arguments

  • OUT: data frame with characteristics of all observed matches, including market identifier m.id, college identifier c.id and student identifier s.id.
  • SEL: optional: data frame with characteristics of all observed and unobserved matches, including market identifier m.id, college identifier c.id and student identifier s.id.
  • colleges: character vector of variable names for college characteristics. Each of these variables takes the same value across all matches of a given college.
  • students: character vector of variable names for student characteristics. Each of these variables takes the same value across all matches of a given student.
  • outcome: formula for match outcomes.
  • selection: formula for match valuations.
  • binary: logical: if TRUE outcome variable is taken to be binary; if FALSE outcome variable is taken to be continuous.
  • niter: number of iterations to use for the Gibbs sampler.
  • gPrior: logical: if TRUE the g-prior (Zellner, 1986) is used for the variance-covariance matrix. (Not yet implemented)
  • censored: censoring of the draws of the kappa parameter, which captures the covariation between the error terms of the selection and outcome equations: 0 = not censored, 1 = censored from below, 2 = censored from above.
  • thin: integer indicating the level of thinning in the MCMC draws. The default thin=1 saves every draw, thin=2 every second, etc.
  • nCores: number of cores to be used in parallel Gibbs sampling.
  • verbose: logical. When set to TRUE, writes information messages on the console (recommended). Defaults to FALSE, which suppresses such messages.
  • ...: .
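
The thin and nCores arguments can be illustrated with a short base-R sketch. It assumes that thinning keeps every thin-th iteration starting at iteration thin; the indexing used inside stabit2 is not stated here, so the index vector is an illustration only. It also assumes that detectCores() in the default for nCores is the function from the parallel package, which may return NA on some platforms.

```r
# Hypothetical illustration of thinning: which of niter iterations are saved.
niter <- 10
thin  <- 2
kept  <- seq(from = thin, to = niter, by = thin)  # assumed indexing convention
kept

# Default shown in the usage for nCores: all cores but one, and at least 1.
# na.rm = TRUE guards against detectCores() returning NA.
n_cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
n_cores
```

With thin = 2 and niter = 10, five draws are kept under this convention, so larger values of thin trade fewer stored draws for lower autocorrelation between them.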

Returns

stabit2 returns a list of the estimation results with the following elements.

  • sigma: numeric scalar: standard deviation fixed to 1.

  • eta: numeric vector: residuals of the selection equation.

  • vcov: List of variance covariance matrices for coefficients alpha and beta of selection and outcome equations.

  • coefficients: numeric vector: coefficients of selection and outcome equations.

  • fitted.values: numeric vector: fitted values for outcome data.

  • residuals: numeric vector: residuals of the outcome equation.

  • df: integer: degrees of freedom.

  • binary: logical: if TRUE outcome variable was taken to be binary; if FALSE outcome variable was taken to be continuous.

  • formula: estimated formula.

  • call: function call.

  • method: One of "Sorensen", "Klein" or "Klein-selection". Method "Sorensen" is used when a single selection equation is passed. It assumes an equal sharing rule for student and college utility. Method "Klein" is used when two selection equations (one for students, one for schools) and one outcome equation are passed. Method "Klein-selection" only models selection and therefore does not require an outcome equation.

  • draws: List of Gibbs sampling draws for alpha and beta coefficients.

  • coefs: Posterior means of the Gibbs sampling draws.

  • variables: List of data used in the estimation.
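
As a rough illustration of how posterior means relate to stored draws, the sketch below summarises a hypothetical matrix of draws in base R. The object toy_draws, its layout (rows are saved iterations, columns are parameters) and the parameter values are assumptions made for this example; the actual structure of the draws element may differ.

```r
set.seed(42)

# Hypothetical draws: 500 saved iterations (rows) for two parameters (columns).
toy_draws <- cbind(alpha = rnorm(500, mean = 1, sd = 0.1),
                   beta  = rnorm(500, mean = 1, sd = 0.2))

post_mean <- colMeans(toy_draws)  # posterior mean of each parameter
post_vcov <- cov(toy_draws)       # posterior covariance of the toy draws

post_mean
post_vcov
```

The same idea applies to any matrix of MCMC draws: point estimates are averages over iterations, and uncertainty is read off the spread of the draws.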

Description

The function provides a Gibbs sampler for a structural matching model that estimates preferences and corrects for sample selection bias when the selection process is a two-sided matching game; i.e., a matching of students to colleges.

The structural model consists of a selection equation and an outcome equation. The Selection Equation determines which matches are observed ($D=1$) and which are not ($D=0$).

$$
\begin{array}{lcl}
D &= & 1[V \in \Gamma] \\
V &= & W\beta + \eta
\end{array}
$$

Here, $V$ is a vector of latent valuations of all feasible matches, i.e., observed and unobserved, and $1[.]$ is the Iverson bracket. A match is observed if its match valuation is in the set of valuations $\Gamma$ that satisfy the equilibrium condition (see Sorensen, 2007). The match valuation $V$ is a linear function of $W$, a matrix of characteristics for all feasible matches, and $\eta$, a vector of random errors. $\beta$ is a parameter vector to be estimated.

The Outcome Equation determines the outcome for observed matches. The dependent variable can be either continuous or binary, depending on the value of the binary argument. In the binary case, the dependent variable $R$ is determined by a threshold rule for the latent variable $Y$.

$$
\begin{array}{lcl}
R &= & 1[Y > c] \\
Y &= & X\alpha + \epsilon
\end{array}
$$

Here, $Y$ is a linear function of $X$, a matrix of characteristics for observed matches, and $\epsilon$, a vector of random errors. $\alpha$ is a parameter vector to be estimated.

The structural model imposes a linear relationship between the error terms of both equations as $\epsilon = \kappa\eta + \nu$, where $\nu$ is a vector of random errors and $\kappa$ is the covariance parameter to be estimated. If $\kappa$ were zero, the errors $\epsilon$ and $\eta$ would be independent and the selection problem would vanish. That is, the observed outcomes would be a random sample from the population of interest.
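
The role of the covariance parameter can be seen in a small base-R simulation. This is only a sketch under strong simplifying assumptions: a one-sided threshold rule (a match is observed when its valuation is positive) stands in for the equilibrium set of valuations, the same single characteristic w enters both equations, and the selection error eta is treated as observed so that it can be used as a control. All variable names are hypothetical, and none of this reproduces the Gibbs sampler.

```r
set.seed(1)
n     <- 20000
kappa <- 1                      # covariance parameter linking the two errors
w     <- rnorm(n)               # single match characteristic (W and X coincide here)
eta   <- rnorm(n)               # selection-equation error
nu    <- rnorm(n)               # independent outcome noise

v <- 1 * w + eta                # V = W * beta + eta with beta = 1
d <- v > 0                      # ASSUMPTION: threshold rule replaces "V in Gamma"
y <- 1 * w + kappa * eta + nu   # Y = X * alpha + epsilon, epsilon = kappa * eta + nu

# Outcome regression on the selected sample, without and with eta as a control.
naive     <- coef(lm(y ~ w,       subset = d))["w"]
corrected <- coef(lm(y ~ w + eta, subset = d))["w"]

c(naive = unname(naive), corrected = unname(corrected))
```

Because the outcome error contains kappa times eta, and eta is correlated with w among selected observations, the naive slope moves away from the true value of 1, while controlling for eta brings it back. Setting kappa to 0 in the sketch removes the difference between the two regressions up to sampling noise.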

Examples

## --- SIMULATED EXAMPLE ---

## 1. Simulate two-sided matching data for 20 markets (m=20) with 100 students
##    (nStudents=100) per market and 20 colleges with quotas of 5 students, each
##    (nSlots=rep(5,20)). True parameters in selection and outcome equations are
##    all equal to 1.

xdata <- stabsim2(m=20, nStudents=100, nSlots=rep(5,20), verbose=FALSE,
  colleges = "c1", students = "s1",
  outcome = ~ c1:s1 + eta + nu,
  selection = ~ -1 + c1:s1 + eta
)
head(xdata$OUT)


## 2. Correction for sorting bias when match valuations V are observed

## 2-a. Bias from sorting
lm1 <- lm(y ~ c1:s1, data=xdata$OUT)
summary(lm1)

## 2-b. Cause of the bias
with(xdata$OUT, cor(c1*s1, eta))

## 2-c. Correction for sorting bias
lm2a <- lm(V ~ -1 + c1:s1, data=xdata$SEL); summary(lm2a)
etahat <- lm2a$residuals[xdata$SEL$D==1]

lm2b <- lm(y ~ c1:s1 + etahat, data=xdata$OUT)
summary(lm2b)


## 3. Correction for sorting bias when match valuations V are unobserved

## 3-a. Run Gibbs sampler (when SEL is given)
fit2 <- stabit2(OUT = xdata$OUT,
  SEL = xdata$SEL,
  outcome = y ~ c1:s1,
  selection = ~ -1 + c1:s1,
  niter=1000
)

## 3-b. Alternatively: Run Gibbs sampler (when SEL is not given)
fit2 <- stabit2(OUT = xdata$OUT,
  colleges = "c1",
  students = "s1",
  outcome = y ~ c1:s1,
  selection = ~ -1 + c1:s1,
  niter=1000
)


## 4. Implemented methods

## 4-a. Get coefficients
fit2

## 4-b. Coefficient table
summary(fit2)

## 4-c. Get marginal effects
summary(fit2, mfx=TRUE)

## 4-d. Also try the following functions
#coef(fit2)
#fitted(fit2)
#residuals(fit2)
#predict(fit2, newdata=NULL)


## 5. Plot MCMC draws for coefficients
plot(fit2)

References

Sorensen, M. (2007). How Smart is Smart Money? A Two-Sided Matching Model of Venture Capital. Journal of Finance, 62 (6): 2725-2762.

Author(s)

Thilo Klein