A Theory of Model Building and its Application to Visualization

William S. Cleveland

Bell Labs, Murray Hill, NJ

Friday, November 2, at 327 Yost
Refreshments: 3:30-4:00 p.m., Talk: 4:00-5:00 p.m.

Statistical analysis of data typically consists of two stages: (1) building a model for the data, and (2) making formal, mathematical-probabilistic inferences conditional on the model. In stage (1), we make model inferences: specifications of the systematic and haphazard variation in the data. Model building is complex because it requires combining information from two sources: (1) external information, which comes from sources outside the data such as subject-matter theory and other data sets, and (2) data information, which arises from studying the data themselves. A vast array of analysis methods contributes in practice to the model-building process: chi-squared tests, normal quantile plots, regression residual analysis, and so on. Often, the model-building phase of a data analysis is the salient part of the analysis, and the mathematical-probabilistic phase is routine. For the theory of statistics, however, the situation is reversed. A vast amount of theory exists for stage (2); in fact, there are not one but two distinct paradigms, Bayesian and frequentist. Each governs the whole course that the theory takes, and results in one are not directly translatable to the other. By contrast, there is almost no theory that governs model specification inference.

We propose a theory for the model-building phase of data analysis. The theory deals with how external and data information are combined in carrying out model inferences. One key aspect invokes Savage's principle of stable estimation. The theory provides a mechanism for assessing model-building methods; for example, it explains why visualization methods are such powerful tools for model building.


Questions? Contact Nidhan Choudhuri.