Include or Exclude Observations ...

Sometimes it is useful to use just some of the available observations in an analysis, or to omit some specific observations (e.g. specific groups).

Observations can be included or excluded by selecting Find Outliers in the Preliminaries menu. However, that user interface is optimized specifically for finding outliers, and is therefore less convenient for other purposes.

For other situations, select a dataset in the Project Tree window and then invoke Include or Exclude Observations from the Preliminaries menu. Note that this works only for datasets that contain raw data; that is, the dataset must be attached directly to the 'root' of the Project Tree (for now). The only exceptions to this rule are datasets containing shape change vectors (imported into the MorphoJ project via Import Shape Change Vectors in the File menu).

The following dialog box will appear:

The top part of the dialog box contains a button Create a new dataset, which controls whether the inclusions and exclusions are applied to the current dataset or to a new copy of it. If this button is selected, the text field for the name of the new dataset is activated. If the user chooses to create a new dataset, it is attached to the current dataset in the Project Tree.

The two lists show the items that are included or excluded. The Include and Exclude buttons can be used to move items from one list to the other. The Include button moves items that are selected in the list of excluded items into the list of included items. Conversely, the Exclude button moves items selected in the list of included items into the list of excluded items. In the example above, clicking the Exclude button would move the items 'Seich', 'Melan' and 'Mauri' from the list of included items to the list of excluded items. Clicking the Include button would have no effect because no item is selected in the list of excluded items.
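The behaviour of the two buttons can be pictured as moving selected items between two lists. The following is only a conceptual sketch in Python, not MorphoJ's own code: the function name move_selected and the items 'GroupX' and 'GroupY' are hypothetical and serve as placeholders for the remaining items in the list.

```python
def move_selected(source, target, selected):
    """Move the selected items from source to target, preserving their order."""
    chosen = set(selected)
    moved = [item for item in source if item in chosen]
    # Keep only the unselected items in the source list (modified in place).
    source[:] = [item for item in source if item not in chosen]
    target.extend(moved)
    return moved


# 'GroupX' and 'GroupY' are hypothetical placeholders for other list items.
included = ["Seich", "Melan", "Mauri", "GroupX", "GroupY"]
excluded = []

# Exclude button: moves the items selected in the list of included items.
move_selected(included, excluded, ["Seich", "Melan", "Mauri"])

# Include button with nothing selected in the excluded list: no effect.
move_selected(excluded, included, [])
```

After these two calls, the three selected items are in the excluded list and the included list holds only the items that were not selected.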

Below the two lists, there is a button labeled Select by classifier. If this button is selected (as it is in the screen shot above), the observations in the dataset are selected by the values of the classifier chosen in the drop-down menu to the right of the button (the classifier selected in the example is 'Species'). This means the selections in the lists are for entire groups defined by the values of the classifier. If Select by classifier is not chosen, the items in the lists are individual observations, for which the identifiers will be shown in the lists.

Combining selections by individuals and groups allows a considerable degree of flexibility in the choice of observations to be included in the subsequent analyses.
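As an illustration of how group-wise and individual selections can be combined, the sketch below models each observation with an inclusion flag. It is a simplified picture under stated assumptions, not a description of MorphoJ's internals: the class Observation, the two helper functions and the identifiers 'ind01' to 'ind04' are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    identifier: str          # hypothetical identifier of the observation
    classifiers: dict        # e.g. {"Species": "Seich"}
    included: bool = True    # inclusion status, can be changed at any time


def set_status_by_classifier(observations, classifier, values, included):
    """Set the status of every observation whose classifier value is in values."""
    wanted = set(values)
    for obs in observations:
        if obs.classifiers.get(classifier) in wanted:
            obs.included = included


def set_status_by_id(observations, identifiers, included):
    """Set the status of individual observations chosen by identifier."""
    wanted = set(identifiers)
    for obs in observations:
        if obs.identifier in wanted:
            obs.included = included


observations = [
    Observation("ind01", {"Species": "Seich"}),
    Observation("ind02", {"Species": "Seich"}),
    Observation("ind03", {"Species": "Melan"}),
    Observation("ind04", {"Species": "Mauri"}),
]

# Group-wise step: exclude two entire species.
set_status_by_classifier(observations, "Species", ["Melan", "Mauri"], included=False)
# Individual steps: exclude one specimen, then re-include another.
set_status_by_id(observations, ["ind02"], included=False)
set_status_by_id(observations, ["ind04"], included=True)

active = [obs.identifier for obs in observations if obs.included]
```

In this sketch the group-wise step is applied first and the individual steps refine it, which corresponds to switching Select by classifier on and off between selections.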

Clicking the Accept button will apply the choices to the current dataset or will create a new copy of the dataset where they are applied. A new Procrustes fit will be done for the dataset. Clicking the Cancel button stops the procedure without making changes to the dataset.

Some comments

Including/excluding observations versus subdividing datasets

It may appear that including and excluding entire groups of observations by the values of a classifier is the same as subdividing the dataset by the values of that classifier.

This is not the case: the two ways of controlling which observations go into further analyses differ. Subdividing datasets produces irreversible divisions: separate datasets are created for the different groups defined by a classifier. In contrast, including or excluding observations is reversible: observations are marked as included or excluded, but this status can be changed. On the other hand, for subdividing datasets, there is the option

In practice, the two procedures will be used in different situations. Including/excluding observations is more flexible, but because newly produced datasets contain all the observations, it is less efficient in the situation where separate datasets and analyses are needed for a series of classes (e.g. species, sexes or geographic populations).
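The contrast between the two procedures can be sketched with plain Python data structures. This is a conceptual illustration only and makes no claim about how MorphoJ stores its data; the identifiers 'ind01' to 'ind04' and the variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (identifier, species).
records = [
    ("ind01", "Seich"),
    ("ind02", "Melan"),
    ("ind03", "Mauri"),
    ("ind04", "Seich"),
]

# Including/excluding: one dataset keeps every observation plus a status flag.
flags = {ident: True for ident, _ in records}
for ident, species in records:
    if species == "Melan":
        flags[ident] = False          # exclude a whole group
after_exclusion = dict(flags)         # snapshot of the flags at this point
flags["ind02"] = True                 # reversible: the status is changed back

# Subdividing: a separate dataset per class, each holding only its own members.
subsets = defaultdict(list)
for ident, species in records:
    subsets[species].append(ident)
```

The flagged dataset still carries all four observations whatever their status, whereas each subset holds only the members of its class, which is why subdividing suits a series of separate per-class analyses.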

Updating of analyses

If there are analyses that depend on the current dataset and if Create a new dataset is not selected, the analyses will update themselves to reflect the changes in the dataset. Depending on the types of analyses included (particularly those with permutation tests) and on the speed of the computer, this might take a considerable time. Moreover, the results of the original analyses will no longer be available. If there are existing analyses and if major changes are made (e.g. excluding entire groups of observations), it may be preferable to use the option Create a new dataset.