Simplifying Classification Trees: A Statistical Approach

Carmela Cappelli

Department of Statistics, University of Naples, Italy



The original CART algorithm has been implemented in various forms, including code in S-PLUS, and it is now becoming very popular in a variety of scientific disciplines, including medicine. This talk will first review the basics of tree-based methods, together with their main advantages and, especially, their disadvantages. These disadvantages give rise to the problem of simplifying trees. I will discuss the main classification tree pruning methods and present an empirical comparison of them. Although these methods differ significantly in performance with respect to the two main indicators, accuracy and tree size, they all use a pruning criterion based on minimizing an estimate of the error rate. Within the framework of the original CART pruning, I will then discuss a new proposal: an alternative impurity measure that allows one to define a statistical testing procedure, based on the chi-square distribution, for the pruning process. Properties of this new approach will be illustrated with examples on real data sets. Finally, a similar recent proposal for regression trees will be presented.
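As a point of reference for the error-rate-based criterion mentioned above, the following is a minimal sketch of CART's weakest-link (cost-complexity) pruning. It assumes each node carries a hypothetical `errors` field, the number of training cases the node would misclassify if it were made a leaf, and computes for every internal node t the statistic g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), collapsing the node(s) with the smallest g at each step. It is not the chi-square-based procedure proposed in the talk, whose impurity measure is not specified here, and it simplifies CART slightly by starting the sequence from the unpruned tree.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy tree node. `errors` (hypothetical field) is the number of training
    cases this node would misclassify if it were turned into a leaf."""
    name: str
    errors: int
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> list[Node]:
    """All terminal nodes of the subtree rooted at `node`."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def internal_nodes(node: Node) -> list[Node]:
    """All non-terminal nodes of the subtree rooted at `node`."""
    if node.is_leaf():
        return []
    return [node] + [n for child in node.children for n in internal_nodes(child)]


def g(node: Node, n_total: int) -> float:
    """Weakest-link statistic g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1),
    where R is the resubstitution error rate (errors / n_total)."""
    subtree_errors = sum(leaf.errors for leaf in leaves(node))
    return (node.errors - subtree_errors) / (n_total * (len(leaves(node)) - 1))


def pruning_sequence(root: Node, n_total: int) -> list[tuple[float, int]]:
    """Return (alpha, number of leaves) for each tree in the nested sequence,
    from the full tree down to the root alone. The input tree is not modified."""
    tree = copy.deepcopy(root)
    sequence = [(0.0, len(leaves(tree)))]
    while not tree.is_leaf():
        scores = [(g(t, n_total), t) for t in internal_nodes(tree)]
        alpha = min(score for score, _ in scores)
        for score, t in scores:
            if score <= alpha + 1e-12:
                t.children = []  # collapse every weakest link into a leaf
        sequence.append((alpha, len(leaves(tree))))
    return sequence


if __name__ == "__main__":
    tree = Node("root", 40, [
        Node("A", 10, [Node("A1", 2), Node("A2", 3)]),
        Node("B", 20, [Node("B1", 9), Node("B2", 8)]),
    ])
    print(pruning_sequence(tree, n_total=100))
    # prints [(0.0, 4), (0.03, 3), (0.05, 2), (0.1, 1)]
```

Each step trades a small increase in the estimated error rate for fewer leaves; in CART the final tree is then chosen from this nested sequence, typically by cross-validation or a test sample, which is the error-rate-minimizing step that the talk's chi-square testing procedure is proposed to complement.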

Questions? Contact Jiming Jiang.