calculate_features function

Computes several features associated with a categorical time series

calculate_features computes several features associated with a categorical time series or between a categorical and a real-valued time series.

calculate_features(series, n_series = NULL, lag = 1, type = NULL)

Arguments

  • series: An object of type tsibble (see R package tsibble), whose column named Value contains the values of the corresponding categorical time series (CTS). This column must be of class factor and its levels must be determined by the range of the CTS.
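  • n_series: A real-valued time series (default is NULL). It enters the mixed features described in Details (type=total_mixed_correlation_1 and type=total_mixed_correlation_2).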
  • lag: The considered lag (default is 1).
  • type: String indicating the feature one wishes to compute. The accepted values are listed in Details.

Returns

The corresponding feature.

Details

Assume we have a CTS of length $T$ with range $\mathcal{V}=\{1, 2, \ldots, r\}$, $\overline{X}_t=\{\overline{X}_1,\ldots, \overline{X}_T\}$, with $\widehat{p}_i$ being the natural estimate of the marginal probability of the $i$th category, and $\widehat{p}_{ij}(l)$ being the natural estimate of the joint probability for categories $i$ and $j$ at lag $l$, $i,j=1, \ldots, r$. Assume also that we have a real-valued time series of length $T$, $\overline{Z}_t=\{\overline{Z}_1,\ldots, \overline{Z}_T\}$. The function computes the following quantities depending on the argument type:

  • If type=gini_index, the function computes the estimated Gini index, $\widehat{g}=\frac{r}{r-1}(1-\sum_{i=1}^{r}\widehat{p}_i^2)$.
  • If type=entropy, the function computes the estimated entropy, $\widehat{e}=\frac{-1}{\ln(r)}\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i$.
  • If type=chebycheff_dispersion, the function computes the estimated Chebycheff dispersion, $\widehat{c}=\frac{r}{r-1}(1-\max_i\widehat{p}_i)$.
  • If type=gk_tau, the function computes the estimated Goodman and Kruskal's tau, $\widehat{\tau}(l)=\frac{\sum_{i,j=1}^{r}\frac{\widehat{p}_{ij}(l)^2}{\widehat{p}_j}-\sum_{i=1}^r\widehat{p}_i^2}{1-\sum_{i=1}^r\widehat{p}_i^2}$.
  • If type=gk_lambda, the function computes the estimated Goodman and Kruskal's lambda, $\widehat{\lambda}(l)=\frac{\sum_{j=1}^{r}\max_i\widehat{p}_{ij}(l)-\max_i\widehat{p}_i}{1-\max_i\widehat{p}_i}$.
  • If type=uncertainty_coefficient, the function computes the estimated uncertainty coefficient, $\widehat{u}(l)=-\frac{\sum_{i, j=1}^{r}\widehat{p}_{ij}(l)\ln\big(\frac{\widehat{p}_{ij}(l)}{\widehat{p}_i\widehat{p}_j}\big)}{\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i}$.
  • If type=pearson_measure, the function computes the estimated Pearson measure, $\widehat{X}_T^2(l)=T\sum_{i,j=1}^{r}\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}$.
  • If type=phi2_measure, the function computes the estimated Phi2 measure, $\widehat{\Phi}^2(l)=\frac{\widehat{X}_T^2(l)}{T}$.
  • If type=sakoda_measure, the function computes the estimated Sakoda measure, $\widehat{p}^*(l)=\sqrt{\frac{r\widehat{\Phi}^2(l)}{(r-1)(1+\widehat{\Phi}^2(l))}}$.
  • If type=cramers_vi, the function computes the estimated Cramer's vi, $\widehat{v}(l)=\sqrt{\frac{1}{r-1}\sum_{i,j=1}^r\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}}$.
  • If type=cohens_kappa, the function computes the estimated Cohen's kappa, $\widehat{\kappa}(l)=\frac{\sum_{j=1}^{r}(\widehat{p}_{jj}(l)-\widehat{p}_j^2)}{1-\sum_{i=1}^r\widehat{p}_i^2}$.
  • If type=total_correlation, the function computes the estimated sum $\widehat{\Psi}(l)=\frac{1}{r^2}\sum_{i,j=1}^{r}\widehat{\psi}_{ij}(l)^2$, where $\widehat{\psi}_{ij}(l)$ is the estimated correlation $\widehat{Corr}(Y_{t, i}, Y_{t-l, j})$, $i,j=1,\ldots,r$, and $\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}$, with $\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top$, is the binarized time series of $\overline{X}_t$.
  • If type=spectral_envelope, the function computes the estimated spectral envelope.
  • If type=total_mixed_correlation_1, the function computes the estimated total mixed l-correlation given by
$$\widehat{\Psi}_1(l)=\frac{1}{r}\sum_{i=1}^{r}\widehat{\psi}_{i}(l)^2,$$

where $\widehat{\psi}_{i}(l)=\widehat{Corr}(Y_{t,i}, Z_{t-l})$, and $\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}$, with $\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top$, is the binarized time series of $\overline{X}_t$.

  • If type=total_mixed_correlation_2, the function computes the estimated total mixed q-correlation given by
$$\widehat{\Psi}_2(l)=\frac{1}{r}\sum_{i=1}^{r}\int_{0}^{1}\widehat{\psi}^\rho_{i}(l)^2\,d\rho,$$

where $\widehat{\psi}_{i}^\rho(l)=\widehat{Corr}\big(Y_{t,i}, I(Z_{t-l}\leq q_{Z_t}(\rho)) \big)$, and $\overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}$, with $\overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top$, is the binarized time series of $\overline{X}_t$. Here $\rho \in (0, 1)$ is a probability level, $I(\cdot)$ is the indicator function and $q_{Z_t}$ is the quantile function of the corresponding real-valued process.
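To make the formulas above concrete, the following base R sketch evaluates a few of them by hand on a short toy series. It does not call calculate_features and is not the package's implementation: the variable names are hypothetical, and normalising the lag-l joint estimate by the number of available pairs, T - l, is an assumption of this sketch.

```r
# Toy categorical series with range {1, 2, 3} (illustrative data only).
x <- factor(c(1, 2, 2, 3, 1, 2, 3, 3, 2, 1, 2, 2), levels = 1:3)
r <- nlevels(x)   # number of categories
n <- length(x)    # series length (T in the text)
l <- 1            # considered lag

# Marginal estimates p_i: relative frequency of each category.
p <- as.numeric(table(x)) / n

# Joint estimates p_ij(l): relative frequency of the pair (X_t = i, X_{t-l} = j).
# Assumption of this sketch: normalised by the n - l available pairs.
p_lag <- unclass(table(x[(l + 1):n], x[1:(n - l)])) / (n - l)

# Marginal features.
gini <- r / (r - 1) * (1 - sum(p^2))
entropy <- -sum(p[p > 0] * log(p[p > 0])) / log(r)  # 0 * log(0) treated as 0
chebycheff <- r / (r - 1) * (1 - max(p))

# Lagged features. The ratios below require every p_i to be positive,
# which holds for this toy series.
indep <- outer(p, p)                    # products p_i * p_j
phi2 <- sum((p_lag - indep)^2 / indep)  # Phi2 measure
cramers_v <- sqrt(phi2 / (r - 1))       # Cramer's vi
kappa <- (sum(diag(p_lag)) - sum(p^2)) / (1 - sum(p^2))  # Cohen's kappa
```

For this series the marginal estimates are (0.25, 0.5, 0.25), so the Gini index is 0.9375 and the Chebycheff dispersion is 0.75. Values returned by the package may differ slightly if it estimates the joint probabilities with a different normalisation.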

Examples

sequence_1 <- GeneticSequences[which(GeneticSequences$Series==1),]
uc <- calculate_features(series = sequence_1, type = 'uncertainty_coefficient')
# Computing the uncertainty coefficient
# for the first series in dataset GeneticSequences
se <- calculate_features(series = sequence_1, type = 'spectral_envelope')
# Computing the spectral envelope
# for the first series in dataset GeneticSequences
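The examples above involve only a categorical series. For the mixed features, the base R sketch below illustrates the quantity described in Details for type=total_mixed_correlation_1 on synthetic data: it binarizes a categorical series and averages the squared lagged correlations with a real-valued series. It does not call calculate_features, all names are hypothetical, and the use of the plain sample correlation from cor() is an assumption rather than a statement about the package internals.

```r
set.seed(1)
n <- 200   # series length (T in the text)
l <- 1     # considered lag

# Synthetic categorical series with three categories and a real-valued series.
x <- factor(sample(1:3, n, replace = TRUE), levels = 1:3)
z <- rnorm(n)

# Binarization: column i of y is the indicator series of category i,
# so each row contains exactly one 1.
y <- sapply(levels(x), function(v) as.numeric(x == v))

# psi_i(l): sample correlation between Y_{t,i} and Z_{t-l}, one per category.
psi <- apply(y[(l + 1):n, , drop = FALSE], 2, cor, y = z[1:(n - l)])

# Total mixed l-correlation: average of the squared correlations.
total_mixed_1 <- mean(psi^2)
```

Because the two synthetic series are generated independently, total_mixed_1 should be close to zero here; larger values would indicate linear dependence between the category indicators and the lagged real-valued series.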

References

Rdpack::insert_ref(key="weiss2008measuring",package="ctsfeatures")

Author(s)

Ángel López-Oriona, José A. Vilar

  • Maintainer: Angel Lopez-Oriona
  • License: GPL-2
  • Last published: 2024-01-29
