Backfitting in Smoothing Spline ANOVA
Zhen Luo
Department of Statistics, Penn State University
Smoothing spline ANOVA models, which include additive models, can be
very expensive or even infeasible to fit by straightforward
computing methods. For large data sets with tens of thousands of
observations, such methods are unusable because they exceed the
available computer memory.
To fit additive models, Buja, Hastie and Tibshirani (1989) use the
backfitting algorithm, known in the numerical analysis literature as the
Gauss-Seidel algorithm, to take advantage of very efficient
existing univariate smoothing algorithms. In this paper, we
propose a computing scheme for general smoothing spline ANOVA models
that uses the backfitting algorithm to take advantage of the near
tensor-product design structure when such a structure exists. Such a
structure is commonly encountered in spatial-temporal analyses.
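
As a minimal sketch of the backfitting idea (not the scheme proposed in the talk), the Python code below cycles over the components of a two-term additive model and smooths the partial residuals of each in turn. The ridge-penalized radial-basis smoothers, the helper names (`ridge_smoother`, `rbf_basis`, `backfit`), and the simulated data are all assumptions made for illustration; they stand in for the efficient univariate smoothers mentioned above.

```python
import numpy as np

def ridge_smoother(B, lam):
    """Linear smoother matrix S = B (B'B + lam I)^{-1} B' (illustrative choice)."""
    k = B.shape[1]
    return B @ np.linalg.solve(B.T @ B + lam * np.eye(k), B.T)

def rbf_basis(x, centers, width=0.2):
    """Gaussian radial basis functions of one covariate, with centered columns."""
    B = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return B - B.mean(axis=0)

def backfit(y, smoothers, max_sweeps=1000, tol=1e-10):
    """Gauss-Seidel sweeps: refit each component to its partial residuals."""
    f = [np.zeros(len(y)) for _ in smoothers]
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for j, S in enumerate(smoothers):
            # Partial residual: remove every component except the j-th.
            partial_resid = y - sum(f[k] for k in range(len(f)) if k != j)
            new_fj = S @ partial_resid
            change = max(change, np.max(np.abs(new_fj - f[j])))
            f[j] = new_fj
        if change < tol:
            break
    return f, sweep

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + (x2 - 0.5) ** 2 + 0.1 * rng.standard_normal(n)
y = y - y.mean()

centers = np.linspace(0.0, 1.0, 8)
lam = 1.0
B1, B2 = rbf_basis(x1, centers), rbf_basis(x2, centers)
f, sweeps = backfit(y, [ridge_smoother(B1, lam), ridge_smoother(B2, lam)])
fit_backfit = f[0] + f[1]

# Direct joint fit of the same penalized problem, for comparison.
B = np.hstack([B1, B2])
fit_direct = B @ np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
print(sweeps, np.max(np.abs(fit_backfit - fit_direct)))
```

Because each update is a block Gauss-Seidel step on the penalized normal equations, the converged sum `fit_backfit` should agree with the joint fit `fit_direct`, while only ever applying one univariate smoother at a time.
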
Unlike in additive models, pure backfitting can be extremely slow
due to the high correlation between various component functions in an
ANOVA model with interaction components. Several ways to speed up the
backfitting algorithm, such as collapsing the component functions and
successive over-relaxation, are discussed. Interestingly, these
acceleration techniques have close counterparts in
the Gibbs sampler literature. Given the close analogy between the
backfitting algorithm and the Gibbs sampler, this may not be
surprising.
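
To illustrate successive over-relaxation in this setting, the sketch below adds a relaxation factor `omega` to a plain backfitting loop, so that each update steps past the ordinary backfitting update when `omega > 1`. It is a toy under stated assumptions, not the talk's implementation: the two nearly collinear regressors stand in for highly correlated component functions, and the rank-one ridge smoothers, the function names, and the value `omega=1.5` are hypothetical choices.

```python
import numpy as np

def sor_backfit(y, smoothers, omega=1.0, max_sweeps=20000, tol=1e-10):
    """Backfitting with relaxation factor omega (omega=1 is plain backfitting)."""
    f = [np.zeros(len(y)) for _ in smoothers]
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for j, S in enumerate(smoothers):
            partial_resid = y - sum(f[k] for k in range(len(f)) if k != j)
            # Relaxed update: blend the old component with the smoothed residual.
            new_fj = (1.0 - omega) * f[j] + omega * (S @ partial_resid)
            change = max(change, np.max(np.abs(new_fj - f[j])))
            f[j] = new_fj
        if change < tol:
            break
    return f, sweep

def rank_one_ridge_smoother(x, lam):
    """Smoother for a single centered regressor: S = x x' / (x'x + lam)."""
    return np.outer(x, x) / (x @ x + lam)

# Two nearly collinear regressors (assumed data, for illustration only).
rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x1, x2 = x1 - x1.mean(), x2 - x2.mean()
y = x1 - 0.5 * x2 + 0.1 * rng.standard_normal(n)
y = y - y.mean()

smoothers = [rank_one_ridge_smoother(x1, 1.0), rank_one_ridge_smoother(x2, 1.0)]
f_gs, sweeps_gs = sor_backfit(y, smoothers, omega=1.0)
f_sor, sweeps_sor = sor_backfit(y, smoothers, omega=1.5)
print("plain sweeps:", sweeps_gs, "over-relaxed sweeps:", sweeps_sor)
```

Both runs share the same fixed point, so the fitted sums should coincide. In this nearly collinear example the over-relaxed run should need fewer sweeps to reach the tolerance, but the best `omega` is problem dependent and a poor choice can slow convergence.
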
The proposed backfitting algorithm is combined with the EM algorithm
to deal with the case in which not every tensor-product design point has
an observation.
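
The role of the EM algorithm here can be sketched as follows: fill each unobserved design point with the current fitted value, refit on the completed grid, and repeat. The Python code below is a simplified stand-in under stated assumptions, not the talk's method: it uses a discrete first-difference roughness penalty on a small space-by-time grid instead of a smoothing spline ANOVA model, forms the complete-grid smoother by explicit matrix inversion where the talk would use backfitting, and simulates the data and the missingness pattern.

```python
import numpy as np

def first_diff_penalty(m):
    """Roughness penalty D'D built from first differences on m grid points."""
    D = np.diff(np.eye(m), axis=0)
    return D.T @ D

# Complete tensor-product grid: n_space locations by n_time time points.
n_space, n_time = 8, 6
N = n_space * n_time
P = (np.kron(first_diff_penalty(n_space), np.eye(n_time))
     + np.kron(np.eye(n_space), first_diff_penalty(n_time)))
lam = 1.0
S_full = np.linalg.inv(np.eye(N) + lam * P)   # complete-grid smoother

# Simulated surface with roughly 20% of grid cells unobserved (assumed data).
rng = np.random.default_rng(2)
s = np.linspace(0.0, 1.0, n_space)[:, None]
t = np.linspace(0.0, 1.0, n_time)[None, :]
y = (np.sin(np.pi * s) + t + s * t).ravel() + 0.1 * rng.standard_normal(N)
observed = rng.uniform(size=N) > 0.2
observed[0] = True   # guarantee at least one observed cell

fit = np.zeros(N)
for iteration in range(1, 5001):
    y_filled = np.where(observed, y, fit)   # E-step: impute missing cells
    new_fit = S_full @ y_filled             # M-step: complete-grid fit
    converged = np.max(np.abs(new_fit - fit)) < 1e-10
    fit = new_fit
    if converged:
        break

# Direct penalized fit that uses the observed cells only, for comparison.
M = np.diag(observed.astype(float))
fit_direct = np.linalg.solve(M + lam * P, M @ y)
print(iteration, np.max(np.abs(fit - fit_direct)))
```

At convergence the imputed fit should match the penalized fit computed from the observed cells alone, while every iteration works only with the complete grid, which is what lets a fast tensor-product algorithm be reused.
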
An application to a global spatial-temporal analysis of historical
surface air temperature data serves as both motivation and example.
Refreshments: 3:30 - 4:00 p.m. Friday, at 327 Yost
Talk: 4:00 - 5:00 p.m. Friday, at 327 Yost
Questions? jiayang@sun.cwru.edu
Wed Aug 13 13:54:29 EDT 1997