Simplifying Classification Trees: A Statistical Approach
Carmela Cappelli
Department of Statistics, University of Naples, Italy
The original CART algorithm has been implemented in various forms,
including code in S-PLUS. It is now becoming very popular in a variety of
scientific disciplines, including medicine. This talk will first review
the basics of tree-based methods and their advantages and, especially,
their disadvantages. These disadvantages give rise to the problem of
simplifying trees. I will discuss the main
classification tree pruning methods, showing an empirical comparison
between them. Although the performance of these methods differs
significantly with respect to the two main indicators, accuracy
and tree size, they all use a pruning criterion based on minimizing an
estimate of the error rate.
In the framework of the original CART pruning, I will discuss a new
proposal of an alternative impurity measure which allows one to define a
statistical testing procedure in the pruning process based on the
chi-square distribution. Properties of this new approach will be shown in
examples on real data sets. Finally, a similar recent proposal will be
shown for regression trees.
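
As a rough illustration of what a chi-square test inside a pruning step can
look like, the sketch below applies the standard Pearson chi-square test of
independence to the class counts in the two children of a split. This is a
generic textbook test, not the impurity measure proposed in the talk, which
the abstract does not specify. The function names, the 5% significance
level, and the example counts are all assumptions made for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows; here rows are child nodes and columns are
    classes. Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of child node and class
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Upper 5% critical value of the chi-square distribution with 1 degree of
# freedom (the case of two child nodes and two classes).
CRITICAL_5PCT_DF1 = 3.841


def keep_split(table, critical=CRITICAL_5PCT_DF1):
    """Return True if the split is significant, False if it may be pruned."""
    return chi_square_statistic(table) > critical


# Hypothetical class counts: rows are the left and right child nodes.
informative = [[40, 10], [10, 40]]
uninformative = [[26, 24], [24, 26]]
```

With these made-up counts, the first split gives a statistic of 36 and would
be kept, while the second gives 0.16 and would be a candidate for pruning.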
Questions? Jiming Jiang