Prediction and Interpretation in Decision Trees for Classification and Regression
The basic undersampling loop for classification
Conditional inference tree (ctree) based on all observations
Regression tree resampling by the PrInDT method
Two-stage estimation for classification
Two-stage estimation for classification-regression mixtures
Nested PrInDT with additional undersampling of a factor with two unb...
Optimisation of undersampling percentages for classification
Posterior analysis of conditional inference trees: distribution of a s...
Conditional inference trees (ctrees) based on consecutive parts of the...
Structured subsampling for classification
Multiple label classification based on resampling by PrInDT
Multiple label classification based on all observations
PrInDT analysis for a classification problem with multiple classes
Conditional inference tree (ctree) for multiple classes on all observa...
Regression tree based on all observations
Structured subsampling for regression
Two-stage estimation for regression
Repeated PrInDT for specified percentage combinations
Interdependent estimation for classification
Interdependent estimation for classification-regression mixtures
Interdependent estimation for regression
Optimization of conditional inference trees from the package 'party' for classification and regression. For optimization, the model space is searched by repeated subsampling for the tree that performs best on the full sample. Restrictions can be imposed so that only trees that do not include pre-specified uninterpretable split results are accepted (cf. Weihs & Buschfeld, 2021a).

The function PrInDT() implements the basic resampling loop for 2-class classification (cf. Weihs & Buschfeld, 2021a). The function RePrInDT() (repeated PrInDT()) applies PrInDT() repeatedly for different percentages of the observations of the large and the small classes (cf. Weihs & Buschfeld, 2021c). The function NesPrInDT() (nested PrInDT()) adds an extra layer of subsampling for a specific factor variable (cf. Weihs & Buschfeld, 2021b). The functions PrInDTMulev() and PrInDTMulab() handle multilevel and multilabel classification. In addition to these PrInDT() variants for classification, the function PrInDTreg() has been developed for regression problems. The function PostPrInDT() allows for a posterior analysis of the distribution of a specified variable in the terminal nodes of a given tree.

Version 2 additionally implements structured sampling in the functions PrInDTCstruc() and PrInDTRstruc(), which can also analyze repeated measurements data. Moreover, multilabel 2-stage versions of classification and regression trees are implemented in the functions C2SPrInDT() and R2SPrInDT(), and interdependent multilabel models in the functions SimCPrInDT() and SimRPrInDT(). Finally, the functions Mix2SPrInDT() and SimMixPrInDT() are implemented for mixtures of classification and regression models. Most of these extensions of PrInDT are described in Buschfeld & Weihs (2025Fc).

References:
-- Buschfeld, S., Weihs, C. (2025Fc) "Optimizing decision trees for the analysis of World Englishes and sociolinguistic data", Cambridge Elements.
-- Weihs, C., Buschfeld, S. (2021a) "Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example" <doi:10.48550/arXiv.2103.02336>
-- Weihs, C., Buschfeld, S. (2021b) "NesPrInDT: Nested undersampling in PrInDT" <doi:10.48550/arXiv.2103.14931>
-- Weihs, C., Buschfeld, S. (2021c) "Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles" <doi:10.48550/arXiv.2108.05129>
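
The basic resampling loop described above for PrInDT() can be illustrated with a small, language-neutral sketch. The Python code below is not the package's implementation and does not use its R interface. It only mirrors the idea stated in the description: draw subsamples from the large and the small class, fit a tree on each subsample, reject trees with pre-specified uninterpretable splits, and keep the tree that performs best on the full sample. Several parts are assumptions made for illustration: a one-split stump stands in for a conditional inference tree, balanced accuracy is assumed as the selection criterion, and all names (undersampling_loop, perc_large, perc_small, forbid_noise_split) and the toy data are hypothetical.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (assumed selection criterion)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:
            recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def predict(stump, rows):
    """Apply a one-split stump to every row."""
    j, thr, left = stump["feature"], stump["threshold"], stump["left_label"]
    return [left if r[j] <= thr else 1 - left for r in rows]


def fit_stump(rows, labels):
    """Fit the best single split; a crude stand-in for a ctree (assumption)."""
    best, best_score = None, -1.0
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            for left in (0, 1):
                cand = {"feature": j, "threshold": (lo + hi) / 2, "left_label": left}
                score = balanced_accuracy(labels, predict(cand, rows))
                if score > best_score:
                    best, best_score = cand, score
    return best


def undersampling_loop(rows, labels, n_rep, perc_large, perc_small, forbidden, seed):
    """Repeated subsampling; returns the best accepted stump and its score."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    large, small = (idx0, idx1) if len(idx0) >= len(idx1) else (idx1, idx0)
    best, best_score = None, -1.0
    for _ in range(n_rep):
        # Draw the requested percentage of each class without replacement.
        sub = (rng.sample(large, round(perc_large * len(large)))
               + rng.sample(small, round(perc_small * len(small))))
        stump = fit_stump([rows[i] for i in sub], [labels[i] for i in sub])
        if stump is None or forbidden(stump):
            continue  # reject models with an uninterpretable split
        # Evaluate every accepted model on the full sample.
        score = balanced_accuracy(labels, predict(stump, rows))
        if score > best_score:
            best, best_score = stump, score
    return best, best_score


def forbid_noise_split(stump):
    """Hypothetical restriction: splits on feature 1 count as uninterpretable."""
    return stump["feature"] == 1


# Toy data (invented): 200 observations in the large class, 30 in the small one.
data_rng = random.Random(0)
rows = [[data_rng.uniform(0, 1), data_rng.random()] for _ in range(200)]
rows += [[data_rng.uniform(2, 3), data_rng.random()] for _ in range(30)]
labels = [0] * 200 + [1] * 30

best_stump, best_score = undersampling_loop(
    rows, labels, n_rep=20, perc_large=0.5, perc_small=1.0,
    forbidden=forbid_noise_split, seed=1)
print(best_stump["feature"], best_score)

# In the spirit of RePrInDT(): repeat the loop for several percentages.
grid_scores = {
    p: undersampling_loop(rows, labels, 20, p, 1.0, forbid_noise_split, seed=1)[1]
    for p in (0.25, 0.5)
}
```

The restriction is passed in as a predicate so that the acceptance rule stays separate from tree fitting, and the final dictionary comprehension shows how the same loop could be rerun for different percentages of the large class.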