Constrained density estimation from censored data with given mean and variance using Laplace P-splines
Constrained density estimation from censored data with given mean and variance using Laplace P-splines
P-spline estimation of the density (pdf), cumulative distribution (cdf), hazard and cumulative hazard functions from interval- or right-censored data under possible marginal mean and/or variance constraints. The penalty parameter τ tuning the smoothness of the log-hazard can be selected using the Laplace P-splines (LPS) method maximizing an approximation to the marginal posterior of τ (also named the 'evidence') or using Schall's method.
obj.data: a list created from potentially right- or interval-censored data using Dens1d. It includes summary statistics, the assumed density support, the knots for the B-spline basis, etc.
is.density: (optional) logical indicating whether the estimated density should integrate to 1.0 over the range of the knots in obj.data$knots (default: TRUE).
Mean0: (optional) constrained value for the mean of the fitted density (defaut: NULL).
Var0: (optional) constrained value for the variance of the fitted density (defaut: NULL).
fixed.penalty: (optional) logical indicating whether the penalty parameter should be selected from the data (fixed.penalty=FALSE) or fixed (fixed.penalty=TRUE) at its initial value τ0.
method: method used for the selection of the penalty parameter: "LPS" (by maximizing the marginal posterior for τ, cf. "Laplace P-splines") or "Schall" (Schall's method).
fixed.phi: (optional) logical indicating whether the spline parameters are fixed (fixed.phi=TRUE) or estimated from the data (default: fixed.phi=FALSE).
phi.ref: (optional) reference value for the spline parameters with respect to which deviations are penalized (default: zero vector).
phi0: starting value for the spline parameters (default: spline parameters corresponding to a Student density with 5 DF).
tau0: (optional) initial value for the penalty parameter τ (default: exp(5)).
tau.min: (optional) minimal value for the penalty parameter τ (default: .1).
verbose: (optional) logical indicating whether estimation step details should be displayed (default: FALSE).
Returns
a densLPS.object containing the density estimation results.
Examples
library(DALSM)## Example 1: estimation of the error density in a DALSM modelrequire(DALSM)data(DALSM_IncomeData)resp = DALSM_IncomeData[,1:2]fit = DALSM(y=resp, formula1 =~twoincomes+s(age)+s(eduyrs), formula2 =~twoincomes+s(age)+s(eduyrs), data = DALSM_IncomeData)plot(fit$derr)## Plot the estimated error densityprint(fit$derr)## ... and provide summary statistics for it## Example 2: density estimation from censored income datadata(DALSM_IncomeData)resp = DALSM_IncomeData[,1:2]head(resp,n=20)temp = Dens1d(y=resp,ymin=0)## Create Dens1d object from positive censored dataobj = densityLPS(temp)## Density estimation from IC & RC dataprint(obj)## Summary information on the estimated densityplot(obj,hist=TRUE)## Visualize the estimated densitylegend("topright",col=c("black","grey"),lwd=c(2,20), legend=c("Fitted density","Pseudo-data"),bty="n")## Example 3: density estimation from simulated RC and IC data## Data generationset.seed(123)n =500## Sample sizex = rgamma(n,10,2)## Exact (unobserved) datawidth = runif(n,1,3)## Width of the IC data (mean width = 2)w = runif(n)## Positioning of the exact data within the intervalxmat = cbind(pmax(0,x-w*width),x+(1-w)*width)## Generated IC datat.cens = rexp(n,1/15)## Right-censoring valuesidx.RC =(1:n)[t.cens<x]## Id's of the right-censored unitsxmat[idx.RC,]= cbind(t.cens[idx.RC],Inf)## Data for RC units: (t.cens,Inf)head(xmat,15)## Density estimation with mean and variance constraintsobj.data = Dens1d(xmat,ymin=0)## Prepare the data for estimationobj = densityLPS(obj.data,Mean0=10/2,Var0=10/4)## Density estimationprint(obj)plot(obj)## Plot the estimated densitycurve(dgamma(x,10,2),## ... and compare it to the true density (in red) add=TRUE,col="red",lwd=2,lty=2)legend("topright",col=c("black","red"),lwd=c(2,2),lty=c(1,2), legend=c("Estimated density","True density"),bty="n")## Same story for the cdfwith(obj, curve(pdist(x),ymin,ymax,lwd=2,xlab="",ylab="F(x)"))curve(pgamma(x,10,2),add=TRUE,col="red",lwd=2,lty=2)legend("right",col=c("black","red"),lwd=c(2,2),lty=c(1,2), legend=c("Estimated cdf","True cdf"),bty="n")
References
Lambert, P. (2021). Fast Bayesian inference using Laplace approximations in nonparametric double additive location-scale models with right- and interval-censored data. Computational Statistics and Data Analysis, 161: 107250. doi:10.1016/j.csda.2021.107250