The Quick Tour

Fleischer

1. One-Dimensional Projection Pursuit

The pp() function can be called directly in either one or two dimensions; one dimension is the default. When called in one dimension, it presents the user with an array of nine one-dimensional projections of the data set, each in a random direction. Each projection is initially displayed as a histogram of the projected data, and the user may choose instead to view the normal QQ plots or a dotchart of the projection coefficients. As an option to the function call, the user may provide a list of projections to display in place of the nine random projections. If fewer than nine projections are supplied, the program generates random directions to fill the remaining positions.
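The filling of the nine-panel array can be sketched as follows. This is a minimal Python illustration, not the package's S-Plus code: the names random_unit_vector, fill_directions, and project are hypothetical, and drawing directions uniformly on the unit sphere is an assumption about how the random directions are generated.

```python
import math
import random

def random_unit_vector(p, rng):
    """Draw a direction uniformly on the unit sphere in p dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]

def fill_directions(p, supplied=(), total=9, seed=None):
    """Keep user-supplied directions (rescaled to unit length) and
    top up with random directions until `total` are available."""
    rng = random.Random(seed)
    dirs = []
    for d in list(supplied)[:total]:
        norm = math.sqrt(sum(x * x for x in d))
        dirs.append([x / norm for x in d])
    while len(dirs) < total:
        dirs.append(random_unit_vector(p, rng))
    return dirs

def project(data, direction):
    """Project each row of the data onto a single direction."""
    return [sum(x * a for x, a in zip(row, direction)) for row in data]
```

Each of the nine projected vectors would then be passed to whatever display is selected (histogram, QQ plot, or dotchart of the coefficients).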

At this point, each of the nine projections is likely to look uninteresting. The user may then choose to optimize the PP Index starting from each of these directions. The function uses a steepest-ascent approach to find a local maximum of the index. The resulting array shows the most interesting view reachable from each of the original nine directions. If none of the projections is of interest, the user may randomize the directions again and repeat the process.
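The idea of climbing from a random direction to a local maximum of an index can be illustrated with a short Python sketch. Several things here are assumptions made only for illustration: the index is a simple moment-based stand-in (squared skewness plus a quarter of the squared excess kurtosis) rather than the package's Legendre-based PP Index, the gradient is approximated by central differences, and the step is halved until the index improves. All function names are hypothetical.

```python
import math

def project(data, a):
    return [sum(x * w for x, w in zip(row, a)) for row in data]

def moment_index(data, a):
    """Stand-in index (assumption): skewness^2 + (excess kurtosis)^2 / 4."""
    z = project(data, a)
    n = len(z)
    m = sum(z) / n
    s2 = sum((v - m) ** 2 for v in z) / n
    if s2 == 0.0:
        return 0.0
    skew = sum((v - m) ** 3 for v in z) / n / s2 ** 1.5
    kurt = sum((v - m) ** 4 for v in z) / n / s2 ** 2 - 3.0
    return skew ** 2 + kurt ** 2 / 4.0

def normalize(a):
    norm = math.sqrt(sum(w * w for w in a))
    return [w / norm for w in a]

def steepest_ascent(data, a, index=moment_index, step=0.5, iters=50, h=1e-5):
    """Climb the index over unit-length directions; returns (direction, index)."""
    a = normalize(a)
    best = index(data, a)
    for _ in range(iters):
        # Central-difference approximation to the gradient of the index.
        grad = []
        for k in range(len(a)):
            up = a[:]
            up[k] += h
            dn = a[:]
            dn[k] -= h
            grad.append((index(data, up) - index(data, dn)) / (2 * h))
        # Drop the radial component so the move is tangent to the unit sphere.
        radial = sum(g * w for g, w in zip(grad, a))
        grad = [g - radial * w for g, w in zip(grad, a)]
        gnorm = math.sqrt(sum(g * g for g in grad))
        if gnorm < 1e-10:
            break
        # Halve the step until the index improves; stop if no step helps.
        t = step
        improved = False
        while t > 1e-8:
            cand = normalize([w + t * g / gnorm for w, g in zip(a, grad)])
            val = index(data, cand)
            if val > best:
                a, best = cand, val
                improved = True
                break
            t /= 2
        if not improved:
            break
    return a, best
```

Because only uphill steps are accepted, the returned index is never lower than the starting value, but, as with any steepest-ascent method, the result is a local maximum that depends on the starting direction.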

Once an interesting projection is uncovered, the user may investigate it further by entering a new display that features only that projection. Here the user may easily examine and compare the projection's histogram, normal QQ plot, and coefficients, together with a summary of the projection. The summary includes the first four moments of the projected data and a p-value, calculated following Sun (1991), that helps the user decide whether the current projection is interesting enough to be called non-normal.

Other options, such as changing the order, J, of the Legendre polynomials, are present in this local investigation area and will be described later. Of particular interest is an option that allows the user to set the coefficient of one variable in the selected projection to zero in order to assess that variable's effect on the PP Index. Should the index not drop significantly, the variable may be discarded, if desired, to make the solution easier to interpret. Should the index drop considerably, the user has located an important variable.
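The role of the order J and the effect of zeroing a variable can be illustrated with a short Python sketch. It assumes the index takes the Legendre form proposed by Friedman (1987), in which the standardized projection is mapped to R = 2Φ(z) - 1 and the index is one half the sum over j = 1, ..., J of (2j + 1) times the squared sample mean of the Legendre polynomial P_j(R). Whether the package uses exactly this form, and whether it renormalizes the direction after a coefficient is set to zero, is not stated here, so both are assumptions; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def standardize(z):
    """Center and scale a projection to mean 0 and variance 1."""
    n = len(z)
    m = sum(z) / n
    s = math.sqrt(sum((v - m) ** 2 for v in z) / n)
    return [(v - m) / s for v in z]

def legendre_index(z, J=4):
    """Legendre-type index of order J for a standardized projection z."""
    phi = NormalDist().cdf
    r = [2.0 * phi(v) - 1.0 for v in z]   # maps a normal z to uniform(-1, 1)
    n = len(r)
    p_prev = [1.0] * n                    # P_0(r)
    p_curr = r[:]                         # P_1(r)
    total = 0.0
    for j in range(1, J + 1):
        mean_pj = sum(p_curr) / n
        total += (2 * j + 1) * mean_pj ** 2
        # Recurrence: (j + 1) P_{j+1} = (2j + 1) r P_j - j P_{j-1}
        p_next = [((2 * j + 1) * ri * pc - j * pp) / (j + 1)
                  for ri, pc, pp in zip(r, p_curr, p_prev)]
        p_prev, p_curr = p_curr, p_next
    return 0.5 * total

def index_without_variable(data, a, k, J=4):
    """Zero coefficient k, renormalize (assumption), and recompute the index."""
    b = list(a)
    b[k] = 0.0
    norm = math.sqrt(sum(w * w for w in b))
    if norm == 0.0:
        raise ValueError("direction has no other nonzero coefficient")
    b = [w / norm for w in b]
    z = [sum(x * w for x, w in zip(row, b)) for row in data]
    return legendre_index(standardize(z), J)
```

Comparing legendre_index on the full projection with index_without_variable for each k gives the kind of before-and-after comparison described above: a small drop suggests the variable can be discarded, a large drop marks it as important.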

If a structure is deemed interesting, the user may remove it before continuing to search for other structure. The structure removal follows Friedman's method that was discussed earlier: the data are transformed to normality along the given direction, with orthogonal directions remaining unchanged, so that further searches along that direction will yield a low PP Index. Once the structure is removed, the user may then generate new projections while saving the interesting projections already found.
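A minimal Python sketch of this kind of structure removal is given here. It replaces each observation's coordinate along the chosen direction with a normal score while leaving the components orthogonal to that direction untouched. The use of rank-based normal scores with the (rank + 0.5)/n plotting position is an assumption chosen for simplicity, and the function name remove_structure is hypothetical.

```python
import math
from statistics import NormalDist

def remove_structure(data, a):
    """Gaussianize the data along direction a; orthogonal parts are unchanged."""
    norm = math.sqrt(sum(w * w for w in a))
    a = [w / norm for w in a]                      # work with a unit direction
    n = len(data)
    z = [sum(x * w for x, w in zip(row, a)) for row in data]
    order = sorted(range(n), key=lambda i: z[i])   # indices from smallest to largest
    nd = NormalDist()
    g = [0.0] * n
    for rank, i in enumerate(order):
        g[i] = nd.inv_cdf((rank + 0.5) / n)        # normal score for this rank
    # Shift each row along a so that its projection becomes the normal score.
    return [[x + (g[i] - z[i]) * w for x, w in zip(row, a)]
            for i, row in enumerate(data)]
```

After the transformation the projection onto a is, by construction, a set of normal quantiles, so an index that measures departure from normality should be low in that direction.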

2. Two-Dimensional Projection Pursuit

As mentioned above, either the one- or the two-dimensional case may be called from the command line. Alternatively, a user who has thoroughly searched the one-dimensional projections may jump directly to the two-dimensional case. Although similar to the one-dimensional method, the two-dimensional process proceeds slightly differently, as discussed below.

As in one dimension, the initial screen contains nine projections of the data, now in two dimensions. Each projection is initially displayed as a scatterplot. Other graphical displays that may be chosen are a three-dimensional perspective plot, a contour plot, or a dendrogram of the projection. The primary difference between this process and the one-dimensional case is that the nine projections have already been optimized and adjusted for structure removal. Because of Friedman's method, the results of the PP Algorithm depend on the first projection it selects; this package therefore implements Sun's suggested enhancement of rotating the data after structure has been found. Thus the user may randomize the data (by a random rotation) and repeat the process.
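A random rotation of the data can be sketched in Python as follows. The construction used here, Gram-Schmidt orthogonalization of Gaussian random vectors, is one common way to obtain a random orthogonal matrix and is an assumption, not a description of the package's method; the function names are hypothetical. Note that the resulting matrix may include a reflection as well as a rotation.

```python
import math
import random

def random_orthogonal(p, rng):
    """Return p orthonormal rows built by Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Subtract the components along the rows already accepted.
        for q in basis:
            c = sum(x * y for x, y in zip(v, q))
            v = [x - c * y for x, y in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:          # reject nearly dependent draws
            basis.append([x / norm for x in v])
    return basis

def rotate(data, Q):
    """Express each row of the data in the orthonormal basis given by Q."""
    return [[sum(x * w for x, w in zip(row, q)) for q in Q] for row in data]
```

Because the transformation is orthogonal, distances between observations are preserved; only the coordinate system in which the search begins is changed.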

There are other options, such as selecting the order, J, of the Legendre polynomials, that will be discussed later. Once the user is satisfied at this stage, a particular projection may be explored further, as in the one-dimensional case. This more in-depth look allows the user to compare the scatterplot, perspective plot, contour plot, dendrogram, and coefficient lists for the two sets of coefficients. It also provides a summary that includes measures of the multivariate skewness and kurtosis of the data, together with a simulated .05 significance cut-off level for a comparable (if not exact) pairing of the number of observations, n, and the number of variables, p, following Friedman's method discussed earlier. The multivariate skewness and kurtosis values are proposed measures of the third and fourth moments of a multivariate data set and can be used to judge the claim of multivariate normality of the data. These issues will be discussed in Section 3. Using these displays and summary statistics, the user may then decide whether a given projection is far enough from normal to be considered interesting.
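For a two-dimensional projection, measures of this kind can be computed as in the following Python sketch. It assumes Mardia's definitions of multivariate skewness and kurtosis, which are one widely used choice; the text does not say which measures the package reports, so this is an illustration only, and the function name mardia_2d is hypothetical.

```python
def mardia_2d(points):
    """Mardia-type skewness and kurtosis for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    c = [(p[0] - mx, p[1] - my) for p in points]
    # Sample covariance matrix with divisor n.
    sxx = sum(u * u for u, _ in c) / n
    syy = sum(v * v for _, v in c) / n
    sxy = sum(u * v for u, v in c) / n
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of the 2 x 2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

    def form(u, v):
        """Bilinear form u' S^{-1} v."""
        return u[0] * (ixx * v[0] + ixy * v[1]) + u[1] * (ixy * v[0] + iyy * v[1])

    # Skewness averages cubed cross-products over all pairs (O(n^2) work);
    # kurtosis averages squared Mahalanobis distances.
    skew = sum(form(u, v) ** 3 for u in c for v in c) / n ** 2
    kurt = sum(form(u, u) ** 2 for u in c) / n
    return skew, kurt
```

Under bivariate normality, Mardia's skewness is close to zero and the kurtosis is close to p(p + 2) = 8 for p = 2, so values far from these suggest a non-normal projection.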

3. Additional Comments

Owing to additional options that will be discussed in the in-depth look at the package, the user may move back and forth between the one- and two-dimensional cases without losing information about structure already located, so that several interesting projections can be found in one complete pass of the function. Once the user is satisfied with both dimensions, the directions found in one and two dimensions are listed upon exit from the program and may be stored in S-Plus.

The advantages of this package are its highly graphical orientation and its interactive interface. The various plots that are presented allow the analyst to see whatever structure exists in the data in a clear, orderly, and attractive manner. The interactive nature of this PP Algorithm allows the user to alternate freely between generating new projections of the data and investigating those thought to be interesting; this control is important to the analyst. At the same time, because new random directions can be generated on command and then optimized fairly rapidly, the analyst does not need to sacrifice speed for control. Interactive Projection Pursuit gives the user the best that each method has to offer.