Cladistics


Parsimony programs like PAUP, Hennig86 and Nona need a way to calculate the number of steps required by a data set on a given tree.

Farris Optimization does this for additive (ordered) characters
Fitch Optimization does this for nonadditive (unordered) characters.

Fitch Optimization follows here. Consider the tree:

we can calculate the number of steps in what's called a "downpass" by first grabbing two adjacent terminals on the tree, moving to the node and asking whether or not there is an intersection set for the characters:

Above, there is an intersection of A and A, it is, of course, A, thus we apply A to the node. Since there was an intersection, we do nothing to the length, which so far is 0 (zero).

Moving from another pair of taxa:

things are a bit different. Above here there is no intersection set for C and T, thus we apply the union set to the node, and because there is no intersection set we add a length of 1 to the total length for this character.

Continuing down the tree:

the intersection of C, T and C is (of course) C, so no length added and C is the state-set applied to that node.

At the next-to last node:

the intersection set of A and C is empty (no intersection) so we add a length of 1 to the tree (now total length = 2 for this character), and apply the union set A, C to the node.

Moving out to the last taxon,

the intersection set of A, C and C is C, so there's no added length to the tree.

The total length for this character on this tree is then, 2. Summing this for all characters allows determination of the length of this topology given all of the data.

Searching

It is possible for small datasets to evaluate all possible tree topologies. This is done, for example, by adding taxa to the growing tree in all possible locations. However, for more than 10 taxa, exhaustive searching would require evaluating "billions and billions" of trees.
Specifically, where The number of taxa t = 4, there are 3 unrooted trees. The number of possible trees rapidly increases with increasing t. In general, the number of bifurcating rooted trees for t species is given by
(2t - 5)!/[2t-3(t - 3)!]
for t > 1 (Cavalli-Sforza and Edwards 1967). This indicates that when t = 10, the number is more than two million.
There is also implicit enumeration, in which an upper length is determined from a wagner tree for example, and then the tree is allowed to grow much as one would in the exhaustive procedure. The difference is that if when a taxon is added in a particular place, the length achieved exceeds the upper length predetermined, this then excludes a whole class of trees from consideration.
Still, though, this can take a very long time and there still might be "billions and billions" of trees to evaluate. Usually you will be constrained to search with one of the following heuristic methods.

Nearest Neighbor interchange:

Each internal branch on a tree (e.g., dashed line) has 4 attached branches that are each others nearest neighbors. Consider the branch leading to "C". It is originally connected to the nearest neighbor branch leading to (A B), and there are two possible nearest neighbor interchanges (green lines). The example given shows one such interchange, C with (F G)

Subtree Pruning Regrafting:

Any branch on the tree can be "cut" off, or pruned to create a subtree. Pruning off (F G) for example, leaves a dangling root on the pruned portion that can then be reattached (green lines) to any other branch on the tree. In the example given (solid green line), the subtree is regrafted to the rest of the original tree on the branch leading to A.

Branch-breaking (a.k.a. tree bisection reconnection):

As with Subtree Pruning Regrafting, this method can break the tree on any branch. However, then the two subtrees are each considered rootless, following which, any two branches can be connected. In the example, the tree is split into the two subtrees ((A B) C) and ((D E) (F G)), then the branch leading to A is connected to the branch leading to E.

Random Addition

Because these are heuristic searches, and not exact, it is possible that the order in which the taxa appear in your matrix can bias the direction of the outcome. This amounts to arriving at a locally optimal solution which may not be as short as the globally optimal solution.

One way to get around this is to re-do the analysis with the taxa added in randomly determined orders. This allows you to start in different places in tree-space, and thus increases your chances (but does not guarantee) arriving at the globally optimal solution.