In some situations, it can be useful to apply the principal components (PCs) from a particular PCA to a dataset that is different from the one that may have been used to compute the PCA.
For instance, between-group PCA is used to enhance the chances of finding differences between groups if methods such as canonical variates cannot be used (e.g. with small sample size) or if there are directions of biological interest, such as between-species or between-growth-stage PCs (Klingenberg and Spence 1993; Boulesteix 2005; Mitteroecker and Bookstein 2011). For this type of analysis, a PCA is run on the covariance matrix of the group averages (this may be a dataset with just 3 or 4 observations, corresponding to the number of groups in the analysis), and the resulting PC coefficients are then used with the dataset of the individual observations to plot the scatter of specimens.
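The two steps of a between-group PCA described above can be sketched with NumPy. This is only an illustration under stated assumptions, not MorphoJ's implementation: the simulated data, the variable names (`group_labels`, `coefficients`, etc.) and the tolerance used to decide which eigenvalues count as greater than zero are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 groups of 20 specimens, 4 variables each.
group_labels = np.repeat([0, 1, 2], 20)
group_offsets = np.array([[0.0, 0.0, 0.0, 0.0],
                          [2.0, 0.5, 0.0, 0.0],
                          [0.5, 2.0, 0.0, 0.0]])
data = group_offsets[group_labels] + rng.normal(scale=0.5, size=(60, 4))

# Step 1: PCA on the covariance matrix of the group averages
# (a dataset with only 3 observations here).
group_means = np.array([data[group_labels == g].mean(axis=0) for g in range(3)])
cov_means = np.cov(group_means, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_means)
order = np.argsort(eigenvalues)[::-1]          # sort PCs by decreasing variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep only PCs associated with variation; with 3 groups there are at most 2.
# The relative tolerance is an assumption made for this sketch.
keep = eigenvalues > 1e-10 * eigenvalues[0]
coefficients = eigenvectors[:, keep]

# Step 2: apply the PC coefficients to the individual observations,
# centred on the mean of that dataset.
scores = (data - data.mean(axis=0)) @ coefficients
print(coefficients.shape, scores.shape)
```

The resulting `scores` array has one row per specimen and one column per between-group PC, and can be used for a scatter plot of the individual specimens.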
A different context is that of species scores in "phylogenetic PCA" (Revell 2009), where the PCs of evolutionary divergence, computed from a covariance matrix obtained by phylogenetic comparative methods such as independent contrasts, may be used to display the divergence among taxa.
Some caution is necessary when interpreting the results of these analyses. The optimal properties of PCA will usually not apply to the PC scores obtained in this way. That is, the first (few) PCs may not be the directions associated with maximal amounts of variation, and scores for different PCs may be correlated. The new PC scores may be useful, however, in scatter plots that show features of variation among groups or taxa that would be difficult to characterise otherwise.
Because PCA is essentially just a rotation to a new coordinate system, it can easily be applied to a different dataset, provided it has the same set of variables as the dataset from which the PCA was computed (via a covariance matrix derived from the dataset). In the context of geometric morphometrics, this means the datasets should have the same set of landmarks. Although MorphoJ checks that some conditions agree (number of landmarks, 2D or 3D data, object symmetry), the final responsibility for this choice rests with the user.
Number of PCs. The number of PCs is determined by the PCA from which the PCs are taken. MorphoJ includes only PCs that are associated with variation (eigenvalues greater than zero). This number may be smaller or larger than the number of PCs that a PCA of the dataset used to compute the scores would produce.
As a result, the new coordinate system of the PC scores may have fewer dimensions than the dataset from which the scores are computed. This happens if the PCA has only a few PCs, and, as a consequence, properties such as distances between data points may be affected.
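A small numerical sketch may make the effect on distances concrete. It assumes a hypothetical PCA with a single PC pointing along the first variable, applied to a toy dataset that also varies in the second variable; none of the numbers come from MorphoJ.

```python
import numpy as np

# Toy dataset: 3 observations, 3 variables (hypothetical values).
data = np.array([[0.0, 0.0, 0.0],
                 [3.0, 4.0, 0.0],
                 [6.0, 8.0, 0.0]])

# Hypothetical PCA with only one PC, along the first variable.
coefficients = np.array([[1.0],
                         [0.0],
                         [0.0]])

# Scores: deviations from the sample mean multiplied by the PC coefficients.
scores = (data - data.mean(axis=0)) @ coefficients

# Distance between the first two observations, before and after.
dist_original = np.linalg.norm(data[0] - data[1])
dist_scores = np.linalg.norm(scores[0] - scores[1])
print(dist_original, dist_scores)
```

The variation along the second variable is not captured by the single PC, so the distance between the two observations in the space of PC scores is smaller than in the original data.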
PC scores are the main result of this analysis, and are usually used for visual inspection in scatter plots. PC scores are computed as the vectors of deviations of the observations from the sample mean, multiplied by the vectors of PC coefficients, unless the dataset contains data such as independent contrasts, for which such centering would be inappropriate.
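The computation described above can be sketched as a short NumPy function. The function name `pc_scores_from_other_pca` and the `center` flag are hypothetical names chosen for this illustration; they are not part of MorphoJ. The sketch assumes that the coefficients are stored with one PC per column.

```python
import numpy as np

def pc_scores_from_other_pca(data, coefficients, center=True):
    """Multiply (optionally centred) observations by PC coefficients.

    data:         (n_observations, n_variables) array
    coefficients: (n_variables, n_pcs) array, one PC per column
    center:       subtract the sample mean first; set to False for data
                  such as independent contrasts
    """
    data = np.asarray(data, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    if data.shape[1] != coefficients.shape[0]:
        raise ValueError("dataset and PCA must have the same variables")
    if center:
        data = data - data.mean(axis=0)
    return data @ coefficients

# Toy example with two variables and a single PC along the first variable.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
coefficients = np.array([[1.0], [0.0]])

centered_scores = pc_scores_from_other_pca(data, coefficients)
uncentered_scores = pc_scores_from_other_pca(data, coefficients, center=False)
print(centered_scores.ravel(), uncentered_scores.ravel())
```

The dimension check mirrors the requirement that both datasets share the same set of variables; it cannot verify that the variables are in fact the same landmarks, which remains the user's responsibility.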
Note: The PC scores from this type of analysis will usually not have the optimal properties that PC scores normally have: in general, they are not uncorrelated with each other, the first few PCs are not the directions with maximal variance, etc. These conditions may still hold approximately, but there is no guarantee of that.
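The loss of these properties can be illustrated directly from covariance matrices. In the sketch below, both matrices are hypothetical: `cov_source` stands for the covariance matrix used in the original PCA and `cov_target` for the covariance matrix of the dataset for which scores are computed. The covariance matrix of the scores is obtained by rotating `cov_target` with the PC coefficients.

```python
import numpy as np

def pca_axes(cov):
    """Return eigenvalues and eigenvectors sorted by decreasing eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical covariance matrices for two variables.
cov_source = np.array([[4.0, 0.0],
                       [0.0, 1.0]])   # used to compute the PCA
cov_target = np.array([[2.0, 1.5],
                       [1.5, 2.0]])   # dataset that receives the scores

_, source_axes = pca_axes(cov_source)
_, own_axes = pca_axes(cov_target)

# Covariance of scores when using PCs from the other PCA ...
cov_scores_other = source_axes.T @ cov_target @ source_axes
# ... and when using the dataset's own PCs.
cov_scores_own = own_axes.T @ cov_target @ own_axes

print(np.round(cov_scores_other, 3))
print(np.round(cov_scores_own, 3))
```

With the PCs from the other PCA, the off-diagonal element is not zero (the scores are correlated) and the first PC accounts for less variance than the first PC of the dataset's own PCA.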
Select PC Scores From Other PCA from the Variation menu. A dialog box like the following will appear.
The text field at the top is for providing a name for the analysis, and the two drop-down menus below it are for selecting the dataset from which the scores are to be computed and the PCA from which the PC coefficients (eigenvectors) are to be used. To abort the procedure, click the Cancel button. To continue with the analysis, click the Continue button.
MorphoJ will examine the dataset and PCA to determine which data matrix is to be used (if the dataset contains data of multiple types). As far as is possible, MorphoJ will choose the same options as were used in the original PCA. To confirm further details of the dataset, a further dialog box like this will appear:
The names of the PCA and the dataset chosen in the previous dialog box are displayed, along with a drop-down menu in which the most suitable data type is preselected. Usually, this setting is appropriate, as it is the one corresponding to the covariance matrix used in the original PCA.
To stop the analysis, click the Cancel button. To continue and perform the analysis, click the Execute button.
The PCA produces three types of graphs in a tab in the Graphics window:
The text output in the Results window contains the following information:
The PC coefficients are not given; they can be obtained in the Results for the original PCA from which the PC coefficients were taken.
A new output dataset is generated that contains the PC scores. The identifiers and classifier variables are copied from the original dataset, and the new dataset is also linked automatically to the original dataset and all other datasets to which it is linked in turn.
Boulesteix, A.-L. 2005. A note on between-group PCA. International Journal of Pure and Applied Mathematics 19:359–366.
Klingenberg, C. P., and J. R. Spence. 1993. Heterochrony and allometry: lessons from the water strider genus Limnoporus. Evolution 47:1834–1853.
Mitteroecker, P., and F. L. Bookstein. 2011. Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evolutionary Biology 38:100–114.
Revell, L. J. 2009. Size-correction and principal components for interspecific comparative studies. Evolution 63:3258–3268.