# Generic function for the computation of asymmetric total variation distance of two distributions

## Description

Generic function for the computation of asymmetric total variation distance $d_v(\rho)$ of two distributions $P$ and $Q$ where the distributions may be defined for an arbitrary sample space $(\Omega, \mathcal{A})$. For given ratio of inlier and outlier probability $\rho$, this distance is defined as

$$ d_v(\rho)(P,Q)=\int (dQ-c\,dP)_+ $$

for $c$ defined by

$$ \rho \int (dQ-c\,dP)_+ = \int (dQ-c\,dP)_- $$

It coincides with total variation distance for $\rho=1$.

```r
AsymTotalVarDist(e1, e2, ...)

## S4 method for signature 'AbscontDistribution,AbscontDistribution'
AsymTotalVarDist(e1, e2, rho = 1, rel.tol = .Machine$double.eps^0.3,
    maxiter = 1000, Ngrid = 10000,
    TruncQuantile = getdistrOption("TruncQuantile"),
    IQR.fac = 15, ..., diagnostic = FALSE)

## S4 method for signature 'AbscontDistribution,DiscreteDistribution'
AsymTotalVarDist(e1, e2, rho = 1, ...)

## S4 method for signature 'DiscreteDistribution,AbscontDistribution'
AsymTotalVarDist(e1, e2, rho = 1, ...)

## S4 method for signature 'DiscreteDistribution,DiscreteDistribution'
AsymTotalVarDist(e1, e2, rho = 1, ...)

## S4 method for signature 'numeric,DiscreteDistribution'
AsymTotalVarDist(e1, e2, rho = 1, ...)
## S4 method for signature 'DiscreteDistribution,numeric'
AsymTotalVarDist(e1, e2, rho = 1, ...)

## S4 method for signature 'numeric,AbscontDistribution'
AsymTotalVarDist(e1, e2, rho = 1, asis.smooth.discretize = "discretize",
    n.discr = getdistrExOption("nDiscretize"),
    low.discr = getLow(e2), up.discr = getUp(e2),
    h.smooth = getdistrExOption("hSmooth"),
    rel.tol = .Machine$double.eps^0.3, maxiter = 1000, Ngrid = 10000,
    TruncQuantile = getdistrOption("TruncQuantile"),
    IQR.fac = 15, ..., diagnostic = FALSE)

## S4 method for signature 'AbscontDistribution,numeric'
AsymTotalVarDist(e1, e2, rho = 1, asis.smooth.discretize = "discretize",
    n.discr = getdistrExOption("nDiscretize"),
    low.discr = getLow(e1), up.discr = getUp(e1),
    h.smooth = getdistrExOption("hSmooth"),
    rel.tol = .Machine$double.eps^0.3, maxiter = 1000, Ngrid = 10000,
    TruncQuantile = getdistrOption("TruncQuantile"),
    IQR.fac = 15, ..., diagnostic = FALSE)

## S4 method for signature 'AcDcLcDistribution,AcDcLcDistribution'
AsymTotalVarDist(e1, e2, rho = 1, rel.tol = .Machine$double.eps^0.3,
    maxiter = 1000, Ngrid = 10000,
    TruncQuantile = getdistrOption("TruncQuantile"),
    IQR.fac = 15, ..., diagnostic = FALSE)
```

## Arguments

- `e1`: object of class `"Distribution"` or `"numeric"`
- `e2`: object of class `"Distribution"` or `"numeric"`
- `asis.smooth.discretize`: possible methods are `"asis"`, `"smooth"` and `"discretize"`. Default is `"discretize"`.
- `n.discr`: if `asis.smooth.discretize` is equal to `"discretize"` one has to specify the number of lattice points used to discretize the abs. cont. distribution.
- `low.discr`: if `asis.smooth.discretize` is equal to `"discretize"` one has to specify the lower end point of the lattice used to discretize the abs. cont. distribution.
- `up.discr`: if `asis.smooth.discretize` is equal to `"discretize"` one has to specify the upper end point of the lattice used to discretize the abs. cont. distribution.
- `h.smooth`: if `asis.smooth.discretize` is equal to `"smooth"` (i.e., the empirical distribution of the provided data should be smoothed) one has to specify this parameter.
- `rho`: ratio of inlier/outlier radius
- `rel.tol`: relative tolerance for `distrExIntegrate` and `uniroot`
- `maxiter`: parameter for `uniroot`
- `Ngrid`: number of grid points evaluated to determine the range of the likelihood ratio
- `TruncQuantile`: quantile for the quantile-based integration bounds (see Details)
- `IQR.fac`: factor for the scale-based integration bounds (see Details)
- `...`: further arguments to be used in particular methods (in package `distrEx`: only used for distributions with a.c. parts, where they are passed on to `distrExIntegrate`).
- `diagnostic`: logical; if `TRUE`, the return value obtains an attribute `"diagnostic"` with diagnostic information on the integration, i.e., a list with entries `method` (`"integrate"` or `"GLIntegrate"`), `call`, `result` (the complete return value of the method), `args` (the args with which the method was called), and `time` (the time to compute the integral).

## Details

For distances between absolutely continuous distributions, we use numerical integration. To determine sensible bounds we proceed as follows: by means of `min(getLow(e1,eps=TruncQuantile),getLow(e2,eps=TruncQuantile))` and `max(getUp(e1,eps=TruncQuantile),getUp(e2,eps=TruncQuantile))` we determine quantile-based bounds `c(low.0,up.0)`, and by means of `s1 <- max(IQR(e1),IQR(e2))`, `m1 <- median(e1)`, `m2 <- median(e2)` and `low.1 <- min(m1,m2)-s1*IQR.fac`, `up.1 <- max(m1,m2)+s1*IQR.fac` we determine scale-based bounds; these are combined by `low <- max(low.0,low.1)`, `up <- min(up.0,up.1)`.
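
The bound construction can be sketched in base R. This is only an illustration under stated assumptions: the normal quantile functions `q1` and `q2` stand in for `Norm(0, 1)` and `Norm(1, 2)`, `getLow`/`getUp` with `eps = TruncQuantile` are mimicked by the `TruncQuantile` and `1 - TruncQuantile` quantiles, `TruncQuantile = 1e-5` is an assumed value (the package reads it from `getdistrOption("TruncQuantile")`), and the upper bound is taken as the smaller of the two candidates, mirroring the intersection used for `low`.

```r
# Stand-ins for e1 = Norm(0, 1) and e2 = Norm(1, 2) via base R quantile functions
q1 <- function(p) qnorm(p, mean = 0, sd = 1)
q2 <- function(p) qnorm(p, mean = 1, sd = 2)

TruncQuantile <- 1e-5  # assumed value, see lead-in
IQR.fac <- 15          # default shown in the usage section

# Quantile-based bounds (mimicking getLow / getUp with eps = TruncQuantile)
low.0 <- min(q1(TruncQuantile), q2(TruncQuantile))
up.0  <- max(q1(1 - TruncQuantile), q2(1 - TruncQuantile))

# Scale-based bounds from the larger IQR and the two medians
s1 <- max(q1(0.75) - q1(0.25), q2(0.75) - q2(0.25))
m1 <- q1(0.5)
m2 <- q2(0.5)
low.1 <- min(m1, m2) - s1 * IQR.fac
up.1  <- max(m1, m2) + s1 * IQR.fac

# Combine both pairs (assumed: intersection of the two intervals)
low <- max(low.0, low.1)
up  <- min(up.0, up.1)
c(low = low, up = up)
```

For these two normals the quantile-based bounds are the binding ones; the scale-based bounds only take over for distributions whose extreme quantiles lie very far out relative to their IQR.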

Again in the absolutely continuous case, to determine the range of the likelihood ratio, we evaluate this ratio on a grid constructed as follows: `x.range <- c(seq(low, up, length=Ngrid/3),q.l(e1)(seq(0,1,length=Ngrid/3)*.999),q.l(e2)(seq(0,1,length=Ngrid/3)*.999))`

Finally, for both the discrete and the absolutely continuous case, we clip this ratio downwards by `1e-10` and upwards by `1e10`.

In case we want to compute the total variation distance between (empirical) data and an abs. cont. distribution, we can specify the parameter `asis.smooth.discretize` to avoid trivial distances (distance = 1).

Using `asis.smooth.discretize = "discretize"`, which is the default, leads to a discretization of the provided abs. cont. distribution, and the distance is computed between the provided data and the discretized distribution.

Using `asis.smooth.discretize = "smooth"` causes smoothing of the empirical distribution of the provided data. That is, the empirical distribution is convolved with the normal distribution `Norm(mean = 0, sd = h.smooth)`, which leads to an abs. cont. distribution. Afterwards the distance between the smoothed empirical distribution and the provided abs. cont. distribution is computed.

Diagnostics on the involved integrations are available if argument `diagnostic` is `TRUE`. Then an attribute `diagnostic` is attached to the return value, which may be inspected and accessed through `showDiagnostic` and `getDiagnostic`.

## Returns

Asymmetric total variation distance of `e1` and `e2`

## Methods

- **e1 = "AbscontDistribution", e2 = "AbscontDistribution":** total variation distance of two absolutely continuous univariate distributions which is computed using `distrExIntegrate`.
- **e1 = "AbscontDistribution", e2 = "DiscreteDistribution":** total variation distance of absolutely continuous and discrete univariate distributions (are mutually singular; i.e., have distance `=1`).
- **e1 = "DiscreteDistribution", e2 = "DiscreteDistribution":** total variation distance of two discrete univariate distributions which is computed using `support` and `sum`.
- **e1 = "DiscreteDistribution", e2 = "AbscontDistribution":** total variation distance of discrete and absolutely continuous univariate distributions (are mutually singular; i.e., have distance `=1`).
- **e1 = "numeric", e2 = "DiscreteDistribution":** total variation distance between (empirical) data and a discrete distribution.
- **e1 = "DiscreteDistribution", e2 = "numeric":** total variation distance between (empirical) data and a discrete distribution.
- **e1 = "numeric", e2 = "AbscontDistribution":** total variation distance between (empirical) data and an abs. cont. distribution.
- **e1 = "AbscontDistribution", e2 = "numeric":** total variation distance between (empirical) data and an abs. cont. distribution.
- **e1 = "AcDcLcDistribution", e2 = "AcDcLcDistribution":** total variation distance of mixed discrete and absolutely continuous univariate distributions.

## References

Agostinelli, C. and Ruckdeschel, P. (2009): A simultaneous inlier and outlier model by asymmetric total variation distance.

## Author(s)

Peter Ruckdeschel peter.ruckdeschel@uni-oldenburg.de

## See Also

`TotalVarDist-methods`, `ContaminationSize`, `KolmogorovDist`, `HellingerDist`, `Distribution-class`

## Examples

```r
AsymTotalVarDist(Norm(), UnivarMixingDistribution(Norm(1,2), Norm(0.5,3),
                 mixCoeff = c(0.2,0.8)), rho = 0.3)
AsymTotalVarDist(Norm(), Td(10), rho = 0.3)
AsymTotalVarDist(Norm(mean = 50, sd = sqrt(25)), Binom(size = 100), rho = 0.3) # mutually singular
AsymTotalVarDist(Pois(10), Binom(size = 20), rho = 0.3)

x <- rnorm(100)
AsymTotalVarDist(Norm(), x, rho = 0.3)
AsymTotalVarDist(x, Norm(), asis.smooth.discretize = "smooth", rho = 0.3)

y <- (rbinom(50, size = 20, prob = 0.5)-10)/sqrt(5)
AsymTotalVarDist(y, Norm(), rho = 0.3)
AsymTotalVarDist(y, Norm(), asis.smooth.discretize = "smooth", rho = 0.3)

AsymTotalVarDist(rbinom(50, size = 20, prob = 0.5), Binom(size = 20, prob = 0.5), rho = 0.3)
```
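
To make the definition in the Description concrete, the following base R sketch evaluates it directly for two probability mass functions on a common finite support. It is not the package implementation: `asym_tv_discrete` is a hypothetical helper, `p` is assumed to play the role of $P$ and `q` that of $Q$, the Poisson support is truncated at 100, and the clipping of the likelihood ratio described in the Details is not reproduced.

```r
# Hypothetical helper (not part of distrEx): asymmetric total variation
# distance of two pmfs p (for P) and q (for Q) given on the same support.
asym_tv_discrete <- function(p, q, rho = 1) {
  pos <- function(z) pmax(z, 0)  # positive part (.)_+
  # f(cc) = rho * sum (q - cc*p)_+ - sum (cc*p - q)_+ is decreasing in cc
  f <- function(cc) rho * sum(pos(q - cc * p)) - sum(pos(cc * p - q))
  # f(0) = rho * sum(q) > 0 and f(cc) <= (1 + rho) - cc when p and q sum to 1,
  # so the root lies in [0, 2 * (1 + rho)]
  c.hat <- uniroot(f, lower = 0, upper = 2 * (1 + rho), tol = 1e-12)$root
  list(c = c.hat, dist = sum(pos(q - c.hat * p)))
}

x <- 0:100                             # truncated support (assumption)
p <- dpois(x, lambda = 10)             # stand-in for Pois(10)
q <- dbinom(x, size = 20, prob = 0.5)  # stand-in for Binom(size = 20)

# For rho = 1 the result should agree with the total variation distance
res1 <- asym_tv_discrete(p, q, rho = 1)
tv <- 0.5 * sum(abs(p - q))
all.equal(res1$dist, tv, tolerance = 1e-8)

# Asymmetric case
res03 <- asym_tv_discrete(p, q, rho = 0.3)
res03$dist
```

Solving for $c$ with `uniroot` works here because the defining equation is monotone in $c$; the bracketing interval follows from both mass functions summing to one.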
