Backfitting in Smoothing Spline ANOVA

Zhen Luo

Department of Statistics, Penn State University

Smoothing spline ANOVA models, which include additive models, can be very expensive or even infeasible to fit by straightforward computing methods. For large data sets with tens of thousands of observations, such methods are unusable because of limited computer memory. To fit additive models, Buja, Hastie and Tibshirani (1989) use the backfitting algorithm, known in the numerical analysis literature as the Gauss-Seidel algorithm, to take advantage of the very efficient univariate smoothing algorithms that already exist. In this paper, we propose a computing scheme for general smoothing spline ANOVA models that uses the backfitting algorithm to take advantage of a near tensor-product design structure when one exists. Such a structure is commonly encountered in spatial-temporal analyses.
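As a rough illustration of backfitting for an additive model, the sketch below cycles through two components and re-smooths the partial residuals of each in turn. It is not the scheme proposed in the talk: the function names `whittaker_smoother` and `backfit`, the simulated data, and the choice of a discrete second-difference penalty smoother on the ranks of each covariate (standing in for a univariate smoothing spline) are all assumptions made for the example.

```python
import numpy as np

def whittaker_smoother(x, lam):
    """Symmetric n x n smoother matrix acting on values ordered by x.

    Assumption: a second-difference penalty on the ranks of x, used here
    only as a simple stand-in for a univariate smoothing spline.
    """
    n = len(x)
    order = np.argsort(x)
    D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second differences
    S_sorted = np.linalg.inv(np.eye(n) + lam * D.T @ D)
    P = np.eye(n)[order]                         # P @ v sorts v by x
    return P.T @ S_sorted @ P                    # sort, smooth, unsort

def backfit(y, smoothers, n_iter=200, tol=1e-10):
    """Gauss-Seidel style cycling over the component functions."""
    alpha = y.mean()
    f = [np.zeros(len(y)) for _ in smoothers]
    for _ in range(n_iter):
        max_change = 0.0
        for j, S in enumerate(smoothers):
            # Partial residual: remove the intercept and all other components.
            others = sum(f[k] for k in range(len(f)) if k != j)
            new = S @ (y - alpha - others)
            new -= new.mean()                    # keep each component centered
            max_change = max(max_change, np.max(np.abs(new - f[j])))
            f[j] = new
        if max_change < tol:
            break
    return alpha, f

# Simulated additive data (hypothetical example, not the temperature data).
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + 4 * (x2 - 0.5) ** 2 + 0.1 * rng.normal(size=n)

smoothers = [whittaker_smoother(x1, 10.0), whittaker_smoother(x2, 10.0)]
alpha, f = backfit(y, smoothers)
fitted = alpha + f[0] + f[1]
```

Each pass only ever applies a univariate smoother to a vector, which is what lets backfitting reuse fast one-dimensional algorithms; the dense matrices here are for clarity only.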

In contrast to the additive-model case, pure backfitting can be extremely slow in an ANOVA model with interaction components, because the component functions are highly correlated. We discuss several ways to speed up the backfitting algorithm, such as collapsing the component functions and successive over-relaxation. Interestingly, these acceleration techniques have very similar counterparts in the Gibbs sampler literature, which may not be surprising given the close analogy between the backfitting algorithm and the Gibbs sampler.
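The effect of successive over-relaxation can be seen on a toy problem. The sketch below is only an analogy: a 2 x 2 linear system with off-diagonal entry `rho` stands in for two highly correlated components, and the function name `sor_solve` and the parameter values are assumptions for the example. Setting `omega = 1` gives plain Gauss-Seidel (pure backfitting); the value of `omega` used for SOR is the classical optimal choice for this kind of system.

```python
import numpy as np

def sor_solve(A, b, omega=1.0, tol=1e-10, max_iter=10000):
    """Solve A x = b by successive over-relaxation; omega=1 is Gauss-Seidel."""
    x = np.zeros_like(b, dtype=float)
    for it in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(len(b)):
            # Gauss-Seidel value for coordinate i, using the latest other values.
            gs = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
            # Over-relax: step past the Gauss-Seidel value when omega > 1.
            x[i] = x[i] + omega * (gs - x[i])
        if np.max(np.abs(x - x_old)) < tol:
            return x, it
    return x, max_iter

rho = 0.99                                   # strong correlation between components
A = np.array([[1.0, rho], [rho, 1.0]])
b = np.array([1.0, 1.0])

x_gs, n_gs = sor_solve(A, b, omega=1.0)      # pure Gauss-Seidel
omega_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho ** 2))
x_sor, n_sor = sor_solve(A, b, omega=omega_opt)
print(n_gs, n_sor)                           # iteration counts for comparison
```

For this system the Gauss-Seidel error shrinks by a factor of about `rho**2` per sweep, which is why the iteration stalls as `rho` approaches 1 and why over-relaxation helps.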

The proposed backfitting algorithm is combined with the EM algorithm to handle the case in which some tensor-product design points have no observation.
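One way to picture the EM idea, under Gaussian errors, is as repeated imputation: fill the empty grid cells with the current fitted values, refit on the now complete grid where the tensor-product structure can be exploited, and repeat. The sketch below is a hypothetical miniature, not the algorithm of the talk: it uses an unpenalized main-effects model `mu + a_i + b_j`, whose complete-grid fit has a closed form, and the names `fit_complete_grid` and `em_fill` are invented for the example.

```python
import numpy as np

def fit_complete_grid(Y):
    """Least-squares fit of mu + a_i + b_j on a complete grid (closed form)."""
    mu = Y.mean()
    a = Y.mean(axis=1, keepdims=True) - mu   # row effects
    b = Y.mean(axis=0, keepdims=True) - mu   # column effects
    return mu + a + b

def em_fill(Y, observed, n_iter=500, tol=1e-10):
    """Alternate between imputing missing cells and refitting on the full grid."""
    filled = np.where(observed, Y, Y[observed].mean())  # crude starting values
    for _ in range(n_iter):
        fit = fit_complete_grid(filled)       # M-step: refit on the completed grid
        new = np.where(observed, Y, fit)      # E-step: impute missing cells by the fit
        done = np.max(np.abs(new - filled)) < tol
        filled = new
        if done:
            break
    return fit_complete_grid(filled)

# Hypothetical 6 x 8 grid with five cells lacking an observation.
rng = np.random.default_rng(1)
I, J = 6, 8
Y = rng.normal(size=(I, 1)) + rng.normal(size=(1, J)) + 0.1 * rng.normal(size=(I, J))
observed = np.ones((I, J), dtype=bool)
for i, j in [(0, 0), (1, 3), (2, 5), (4, 1), (5, 7)]:
    observed[i, j] = False
Y[~observed] = np.nan                         # missing values are never read

fit_em = em_fill(Y, observed)
```

At convergence the fitted values agree with a least-squares fit that uses only the observed cells, while every refit works on a complete grid.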

An application to a global spatial-temporal analysis of historical surface air temperature data serves as both motivation and example.


Refreshments: 3:30 - 4:00 p.m. Friday, in 327 Yost
Talk: 4:00 - 5:00 p.m. Friday, in 327 Yost


Questions? jiayang@sun.cwru.edu
Wed Aug 13 13:54:29 EDT 1997