Maximum Likelihood

Consider the data set:
one   A AAAACCCCGGGGTTTT
two   C ACGTACGTACGTACGT
three A AAAAAAAAAAAAAAAA
four  C AAAAAAAAAAAAAAAA

Characters 7, 12 and 17 are synapomorphies for taxa one and two, and they override character 1.

Because the branches to one and two are already long, that is taken as evidence that all characters can evolve more often on those lineages; thus, characters 7, 12 and 17 are not considered synapomorphies for taxa one and two.

The above example is chosen to demonstrate that the issue is related to the problem of all characters being limited to 4 states (a, g, c, t). Characters 2 through 17 are the 16 possible combinations of character states for taxa one and two when those states are randomly determined.

With only 16 characters being randomly determined you wouldn't, of course, expect equal representation of all of these combinations, but the argument is that with a couple of thousand such characters you will arrive at a large number of characters that, on the whole, will reflect what is depicted in characters 2 through 17 above.

What is meant by a state being randomly determined?

The argument goes something like this:

Consider that you have a taxon in which you observe a "C" while all other taxa have an "A". Naturally you might conclude that the parsimonious solution is one change along that lineage:

But, is it not possible that the character state C doesn't represent one change, but maybe two, or three or even 10?

We can extend this argument to two taxa which both have "C"

Here, the two taxa with C have it for non-historical reasons. That is, there was not one change to C in a common ancestor of the two, but a whole bunch of "unobserved", "overwriting", or "multiple" substitutions along the two lineages.

The difference between this:

And this:

it is argued, is branch length, or a differential rate of substitutions.
This is why it is argued that long branches attract in parsimony analyses.

Maximum likelihood bases its conclusions upon (among other things) the branch lengths (usually denoted m).
But wait (you might ask)... how can you know that a character position has undergone multiple substitutions without a time machine?
Good question. Actually you cannot, at least not directly. But, it is argued, if you make a couple of assumptions, you can use other information to inform you about that.

Consider again the example above:

Even the parsimony tree shows long branches for one and two.
Of course, these long branches are based on characters other than the ones which will make a difference. The ticks on the branches going to one and two are either autapomorphies or homoplasies (parallel changes) required by the shortest tree.

In other words, the branch lengths, and thus your expectations for the amount of change for any given character, are based on the collective information obtained from all other characters.

What is a branch length? Well, given that it is applied equally across all characters (i.e., it is an expectation, across all characters, of the number of substitutions on that branch), the easiest way to think of it is time. That is, in the figure above, one and two, having a most recent common ancestor, split from each other at the same time (by definition); thus all of their characters have had the same amount of time to experience change.

This is a little too clocky for most people, so usually it is interpreted as rate. That is, the lineage leading to "one" has experienced a higher rate of change relative to lineage "three" or "four". What is important here, though, is that the "rate" is considered to apply uniformly across all characters, and that change is stochastic.

PAUP*, for example, by default estimates branch lengths using a method described by Rogers and Swofford:

The branch lengths are changed iteratively during the process, nonetheless, they are based on all of the positions collectively. They are expressed in terms of expected number of substitutions per site.

So, for example, in the tree ((one two)(three four)) preferred by parsimony,
the uncorrected p distances for the branches are:

one             two 
  \             /
   \           /  
  0.382     0.382 
     \       /
      \     /
       \   /
        \ /
         |   
       0.324
         |
        / \    
    0.029 0.029
      /     \   
   three    four 
and the final substitutions per site (Jukes-Cantor) are

one             two 
  \             /
   \           /  
10.48215   0.83571
     \       /
      \     /
       \   /
        \ /
         |    
      1.28920
         |
        / \         
  0.06126 0.00000001
      /     \   
   three    four 
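As a side note, the Jukes-Cantor correction from the distance lecture gives a feel for how an uncorrected p distance translates into an estimate of substitutions per site, though the branch lengths above are ML estimates made jointly on the tree, not simple pairwise corrections. A minimal sketch in Python (the p values are the ones from the tree above):

```python
from math import log

def jc_correct(p):
    # Jukes-Cantor corrected distance from an uncorrected p distance
    return -0.75 * log(1.0 - 4.0 * p / 3.0)

for p in (0.382, 0.324, 0.029):
    print(p, round(jc_correct(p), 4))
```

Note how the correction blows up as p approaches 0.75, the point at which two random sequences are expected to agree by chance alone.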
How is the branch length actually implemented?
Suppose for the moment that you are not going to consider anything except branch length.

And suppose that you have the following distribution of states for a character:

one   c
two   c
three a
four  a

And, now, suppose that your expected number of substitutions per site was 0.06 for the branch leading to three, 0.0000000001 leading to four, 10.5 leading to one and 0.835 leading to two, and 1.29 on the internal branch.

It is more probable that there are two changes (one each on the branches leading to one and to two) and no changes on the other branches, than one change on the internal branch and no changes on the other branches.

                                        r (corr.)
    Hyp1:   1     0     1     0     0   0.637
Expected: 10.5  1.29  0.835  0.06  0.00 
    Hyp2:   0     1     0     0     0   -0.155
Correlation of hypothesized and expected changes is a convenient way to think about it but is not really how the probabilities are determined.
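For what it's worth, the correlations shown can be reproduced with an ordinary Pearson correlation between each hypothesized change vector and the expected numbers of changes; a quick sketch in Python:

```python
def pearson(xs, ys):
    # ordinary Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

expected = [10.5, 1.29, 0.835, 0.06, 0.0]
print(round(pearson([1, 0, 1, 0, 0], expected), 3))   # 0.637
print(round(pearson([0, 1, 0, 0, 0], expected), 3))   # -0.155
```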

Don't forget, too, that incorporated in this there is a probability that a site will stay the same across a branch, and, given the expected numbers of substitutions, it is more likely that things will stay the same along the branches leading to three and to four than along the branches leading to one and to two.

Let's consider the most parsimonious solution first:

or,
p(c,c|1) * p(c,c|2) * p(a,a|3) * p(a,a|4) * p(c,a|5)

But, it turns out, this is not the only possible way to get that distribution of states at the terminals. It could be that there was

Same distribution of states, different explanation.
instead of p(c,c|1) * p(c,c|2) * p(a,a|3) * p(a,a|4) * p(c,a|5) ...
it's p(c,c|1) * p(c,c|2) * p(c,a|3) * p(c,a|4) * p(c,c|5)

This being an "OR" statement, the probabilities are added (unlike the "AND" statements, which are multiplied above).

So, the probability for this distribution of states for this character for this tree is


[p(c,c|1) * p(c,c|2) * p(a,a|3) * p(a,a|4) * p(c,a|5)] + [p(c,c|1) * p(c,c|2) * p(c,a|3) * p(c,a|4) * p(c,c|5)]

So far...

Even though you might only be thinking about the four in the outlined box below, all of these are possibilities, and all of them have probabilities that must be summed into the probability of this distribution of states for this character for this tree:
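The whole sum-over-internal-states calculation for the c,c,a,a character can be sketched in a few lines of Python. This is a brute-force version (assuming Jukes-Cantor change probabilities, equal base frequencies, and the branch lengths given earlier), not the efficient pruning algorithm real programs use:

```python
from itertools import product
from math import exp

def p_jc(i, j, m):
    # Jukes-Cantor probability of state i becoming state j across a branch
    # of length m (expected substitutions per site)
    e = exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 * (1.0 - e)

# branch lengths from the Jukes-Cantor tree above
b_one, b_two, b_int = 10.48215, 0.83571, 1.28920
b_three, b_four = 0.06126, 0.00000001

L = 0.0
for x, y in product("acgt", repeat=2):   # states at the two internal nodes
    L += (0.25                           # equal prior on the state x
          * p_jc(x, "c", b_one) * p_jc(x, "c", b_two)   # one = c, two = c
          * p_jc(x, y, b_int)                           # internal branch
          * p_jc(y, "a", b_three) * p_jc(y, "a", b_four))  # three = a, four = a

print(round(L, 6))   # ≈ 0.013841
```

The result, about 0.013841, is the per-character likelihood that appears for the c,c,a,a-type characters in the product below.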

But that's only one character... That character (character #1) on that tree (the one preferred by parsimony), and given the branch lengths inferred from the full data set, yields a final probability of

L = 0.0003604

This now has to be done for character #2 AND character #3 AND... so on.
Multiplying across characters (because of the AND statement) gives the likelihood of all of those characters on that tree.

Likelihood of this tree = (0.000360 * 0.017301 * 0.013841 * 0.013841 * 0.013841 * 0.017301 * 0.013841 * 0.013841 * 0.013841 * 0.017301 * 0.013841 * 0.013841 * 0.013841 * 0.017301 * 0.013841 * 0.013841 * 0.013841)

Now that that's done, you need to evaluate the next tree in the same way:

  1. multiplying all branch change / no-change probabilities,
  2. summing those over all possible assignments of states to internal nodes,
  3. multiplying that across all characters.

or

∏c(Σs[∏b p(b)])
which might look complicated but isn't really

∏ means "the product of all"
Σ means "the sum of all"
for all b branches
for all s state assignments
for all c characters

By the time you get to step 3, though, the probability for a character might be 0.000234. Imagine multiplying similar values across all characters: you might get a final probability (likelihood) of

0.0000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000002345629212756

which is a lot of decimal places to hold on to

Well, the log of this is not so hard... it's something like -427.5

As the probability gets higher and higher, the value might climb to something like -415.867.

Whichever tree provides the maximum value for this likelihood function wins.

In practice, people get weirded out by negative numbers, and programs and programmers work more easily with positive numbers, so what really happens is that the maximum likelihood is found as the minimum negative log likelihood.

Which is why you don't see negative values, and why the tree chosen has a lower numerical score than one not chosen.
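The switch from a product of tiny probabilities to a sum of negative logs can be sketched like this (using the per-character likelihoods from the product above):

```python
from math import log

# the 17 per-character likelihoods from the product above
per_char = [0.000360] + [0.017301, 0.013841, 0.013841, 0.013841] * 4

# the sum of the logs equals the log of the product, but never underflows
lnL = sum(log(p) for p in per_char)
print(round(-lnL, 2))   # ≈ 75.52, the -ln likelihood to be minimized
```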

Okay... but, as you learned from the distance lecture, branch lengths aren't the only thing that can influence what you think is probable or not.

You might think that transitions are more probable than transversions. Rather than weighting, as in parsimony, likelihood goes one better: the "weight" (the inverse of the probability of change) can be modified depending on which branch you are on. In weighted parsimony you can't do this; a transversion costs the same no matter where you are on the tree.

It is convenient now to consider the Q matrix of changes between the four bases:

If we think that all changes are equally probable, and we have some branch-length rate m of change (expected number of changes per site), then we can insert in each of the cells the Jukes-Cantor rate, which you will remember is simply 1/4 of the total rate (there are 4 nucleotides):

       A         C         G         T
A      -       0.25 m    0.25 m    0.25 m
C    0.25 m      -       0.25 m    0.25 m
G    0.25 m    0.25 m      -       0.25 m
T    0.25 m    0.25 m    0.25 m      -
Since either there is a change or there is not, the probability of staying the same is the complement of the probability of change. Note that the total probability of change from A is 0.75 m, thus the final Q matrix is:
        A          C          G          T
A    -0.75 m    0.25 m     0.25 m     0.25 m
C     0.25 m   -0.75 m     0.25 m     0.25 m
G     0.25 m    0.25 m    -0.75 m     0.25 m
T     0.25 m    0.25 m     0.25 m    -0.75 m
Of course, you already know about the Kimura two-parameter model, where transitions and transversions form different classes of change and, say, k is the ratio of the transition rate to the transversion rate. Each diagonal entry is again minus the sum of the rest of its row (0.25 m + 0.25 mk + 0.25 m = 0.25 m(k+2)), so Q is:

          A               C               G               T
A    -0.25 m(k+2)      0.25 m          0.25 mk         0.25 m
C     0.25 m          -0.25 m(k+2)     0.25 m          0.25 mk
G     0.25 mk          0.25 m         -0.25 m(k+2)     0.25 m
T     0.25 m           0.25 mk         0.25 m         -0.25 m(k+2)

That is, as k increases, we expect higher amounts of the four transition-type changes.
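As a sketch, building this K2P rate matrix in Python makes the bookkeeping explicit; each diagonal entry is just minus the sum of the rest of its row, so every row sums to zero (and with k = 1 it collapses to the Jukes-Cantor matrix):

```python
def k2p_Q(m, k):
    # K2P rate matrix: transversions at 0.25*m, transitions at 0.25*m*k
    bases = "ACGT"
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    Q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(bases):
        for j, b in enumerate(bases):
            if i != j:
                Q[i][j] = 0.25 * m * k if (a, b) in transitions else 0.25 * m
        Q[i][i] = -sum(Q[i])   # diagonal = minus the off-diagonal row sum
    return Q

Q = k2p_Q(m=1.0, k=2.0)
print(Q[0][0])   # -0.25*m*(k+2) = -1.0
```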

Similarly, one can adjust expected changes based on base compositions (later).

To get the Jukes Cantor model in PAUP*, do this under likelihood settings:


To get the Kimura 2-parameter model, change the substitution model to include an extra parameter (i.e., k). If you have some reason to set it to a 2:1 or 3:1 ti:tv ratio, go ahead, but generally you'll want to estimate it from the data, just like the branch lengths.

To get the F81 model, keep the substitution model above with "all rates equal", but include an extra parameter for base composition (i.e., p) using empirical frequencies:

Of course, you can combine a value for k and for p into the HKY85 model.


All of the above have the branch rate m being modified by the parameters you add.

Remember from the distance lecture the notion that it might be illogical to think of all sites varying at the same rate.

Stem and loop regions might change at different rates.
1st 2nd and 3rd positions might too.
Usually you expect these to follow an overdispersed distribution, with a lot of sites not changing at all and a few sites changing a lot,
like this:

The difference between the two lines is defined by the "shape" parameter for the gamma distribution. You can "bin" these into whatever number of rate categories you deem appropriate.

Arguably, a combined data set of CO-I and 18S needs one rate class for each codon position in CO-I, plus the variable and conserved regions in 18S, for at least 5 classes.
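To see why a small shape parameter gives the overdispersed pattern described above, here is a quick simulation sketch (assuming site rates drawn from a gamma distribution with mean 1; the 0.1 cut-off is arbitrary):

```python
import random

def frac_nearly_invariant(alpha, n=10000, seed=1):
    # fraction of site rates below 0.1 when rates ~ Gamma(shape=alpha, mean=1)
    rng = random.Random(seed)
    rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]
    return sum(r < 0.1 for r in rates) / n

# a small shape parameter gives many near-invariant sites (and, since the
# mean is fixed at 1, a few very fast ones); a large one gives uniform rates
print(frac_nearly_invariant(0.3), frac_nearly_invariant(2.0))
```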


How do I choose a model?

The model with the most parameters will always yield the lowest -log likelihood. Thus, arguments that the best-fitting model is automatically the best one to use are silly.

How do you figure out how many parameters to add to the model?

You need to compare the ratio of the best likelihood value from the competing models.
or
For example,

∏c(Σs[∏b p(b)]) from K2P

divided by

∏c(Σs[∏b p(b)]) from JC

which, taking twice the log of the ratio, is the same as

2(ln LK2P - ln LJC)

But, if you are going to do this properly, you need to find out the distribution of the statistic 2(ln LK2P - ln LJC). To do this, you must simulate many (e.g., 100) data sets equal in size to the one you are actually using, under the simpler model (in this case JC) and using the branch-length parameters determined above. Analyse those Monte Carlo data sets according to both the K2P and the JC models. Then the frequency with which the value of 2(ln LK2P - ln LJC) for a Monte Carlo data set exceeds the value for the observed data is your P-value.

In other words, if fewer than, say, 5% of the simulated data sets have a 2(ln LK2P - ln LJC) that is higher than the 2(ln LK2P - ln LJC) for the observed data, you should go with the K2P model.
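The mechanics of that Monte Carlo P-value can be sketched in a few lines; every number here is made up purely for illustration:

```python
# hypothetical best ln-likelihoods from the two models (made-up numbers)
lnL_jc, lnL_k2p = -2990.4, -2984.1
delta_obs = 2 * (lnL_k2p - lnL_jc)   # the statistic 2(ln LK2P - ln LJC)

# the same statistic from data sets simulated under JC (also made up)
sim_deltas = [0.8, 3.1, 0.2, 5.6, 1.9, 0.4, 2.7, 7.3, 1.1, 0.6]

# P-value: fraction of simulated statistics at least as large as the observed
p_value = sum(d >= delta_obs for d in sim_deltas) / len(sim_deltas)
print(delta_obs, p_value)
```

Here none of the simulated values reach the observed statistic, so the simpler model would be rejected.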

If this is not the case, this doesn't mean you should go with the JC model, since you may get a significant result comparing 2(ln LF81 - ln LJC).

Actually, if you reject the JC model in favour of the K2P model you still need to compare the K2P model with more complex ones like 2(ln LHKY85 - ln LK2P).

The simplest model that is not rejected is the one you should go with.

It is not clear what to do if both F81 and K2P reject the JC model but neither rejects the other.

The above also applies to the addition of a rate parameter with the gamma distribution. Typically this is denoted by G plus the number of bins. For example, the HKY85 model with 5 rate bins is denoted HKY85+G5.


Is the tree with the best likelihood significantly better than some other tree? Since, in choosing the likelihood approach, you have adopted a statistical means for making inferences, this is not a trivial question.

The likelihood ratio test for comparing trees of different shape and with different branch lengths is not as straightforward as the one above for different models.

The Kishino-Hasegawa test is deemed appropriate, and you can use it to compare any two trees.

Usually though, you are interested in how the best one compares to all other trees.

To do this,


1) Set the criterion to likelihood and choose your model:

set crit = likelihood;
lset nst=1 basefreq=equal; [this sets the JC model]

2) Create 100 random trees:

generatetrees random ntrees = 100;

3) Find the best tree, but keep all trees initially in memory:

hsearch retain = yes;

4) Assuming that maxtrees is set to 100, this will leave you with the best tree
and 99 random trees in memory. Now do the Kishino-Hasegawa test:

lscores all /khtest = yes;

This will return something like this:

Tree      -ln L     Diff -ln L   s.d.(diff)      T        P*
-----------------------------------------------------------------------
   1   3298.50674   332.57585   43.71620   7.6076   <0.0001 **
   2   3313.76149   347.83060   44.83428   7.7581   <0.0001 **
   3   3163.84137   197.91049   35.22177   5.6190   <0.0001 **
   4   3175.81714   209.88625   35.11206   5.9776   <0.0001 **
   5   3243.35519   277.42431   44.67410   6.2100   <0.0001 **
   6   3315.69463   349.76374   44.60222   7.8418   <0.0001 **
   7   3285.77959   319.84870   42.01821   7.6121   <0.0001 **
   8   3301.71134   335.78045   42.84538   7.8370   <0.0001 **
   9   3299.68373   333.75284   42.89941   7.7799   <0.0001 **
 ...
  95   3281.53440   315.60352   41.42098   7.6194   <0.0001 **
  96   3291.93617   326.00529   40.92667   7.9656   <0.0001 **
  97   3306.11727   340.18639   43.98721   7.7338   <0.0001 **
  98   3300.20559   334.27470   44.02806   7.5923   <0.0001 **
  99   3308.12459   342.19371   43.33005   7.8974   <0.0001 **
 100   3305.71321   339.78232   44.50874   7.6341   <0.0001 **
 101   2965.93089   (best)

Of course, here all of the random trees are significantly different, so this is
not helpful in finding the 95% cut-off. But the best random tree was 3171.03890.

So now, 5) search again, keeping trees with scores less than, say, 3000:

hsearch keep = 3000;

Tree      -ln L     Diff -ln L   s.d.(diff)      T       P*
-----------------------------------------------------------------------
   1   2967.11542    1.18453    9.13136   0.1297   0.8968
   2   2980.67241   14.74152   12.73800   1.1573   0.2475
   3   2978.38470   12.45382   13.17812   0.9450   0.3449
   4   2972.94375    7.01287   14.92176   0.4700   0.6385
   5   2987.03006   21.09918   14.32121   1.4733   0.1411
   6   2980.67317   14.74229   12.74333   1.1569   0.2477
   7   2983.10154   17.17065   12.18168   1.4095   0.1591
   8   2999.27328   33.34239   14.33208   2.3264   0.0203 **
   9   2992.65868   26.72779   14.76488   1.8102   0.0707
  10   2981.61849   15.68760   11.15949   1.4058   0.1602
  11   2990.66879   24.73790   11.84754   2.0880   0.0371 **
  12   2989.93132   24.00044   11.98228   2.0030   0.0455 **
  13   2979.78728   13.85640   12.85434   1.0780   0.2814
  14   2998.55586   32.62497   15.00219   2.1747   0.0300 **
  15   2978.79584   12.86495   11.44823   1.1238   0.2615
  16   2983.59455   17.66367   11.60315   1.5223   0.1283
  17   2987.73661   21.80573   12.31617   1.7705   0.0770
  18   2979.85333   13.92244   11.23759   1.2389   0.2158
  19   2983.13247   17.20158   10.34537   1.6627   0.0968
  20   2974.45408    8.52319    9.63580   0.8845   0.3767
  21   2989.14609   23.21520   13.11079   1.7707   0.0770
  22   2989.14609   23.21520   13.11076   1.7707   0.0770
  23   2980.74229   14.81140   11.75886   1.2596   0.2082
  24   2986.78063   20.84975   13.70134   1.5217   0.1285
  25   2988.48539   22.55451   12.82535   1.7586   0.0790
  26   2986.43241   20.50152   12.63695   1.6223   0.1051
  27   2995.55877   29.62788   13.64928   2.1707   0.0303 **
  28   2991.36073   25.42984   12.78121   1.9896   0.0470 **
  29   2994.47471   28.54383   11.96524   2.3856   0.0173 **
  30   2984.24637   18.31549   11.31361   1.6189   0.1059
  31   2996.65349   30.72261   13.70066   2.2424   0.0252 **
  32   2996.65349   30.72261   13.70067   2.2424   0.0252 **
  33   2982.80662   16.87574   14.22088   1.1867   0.2357
  34   2975.33343    9.40254   12.88025   0.7300   0.4656
  35   2980.50533   14.57444   14.51892   1.0038   0.3158
  36   2982.76952   16.83863   13.60500   1.2377   0.2162
  37   2984.36287   18.43198   13.44839   1.3706   0.1709
  38   2992.33472   26.40383   14.26287   1.8512   0.0645
  39   2985.09518   19.16429   13.65033   1.4039   0.1607
  40   2988.31536   22.38447   12.86422   1.7401   0.0822
  41   2978.07343   12.14254   12.35236   0.9830   0.3259
  42   2990.48622   24.55533   14.56870   1.6855   0.0923
  43   2990.48622   24.55533   14.56869   1.6855   0.0923
  44   2975.53735    9.60647   15.02211   0.6395   0.5227
  45   2973.42740    7.49651    8.97847   0.8349   0.4040
  46   2982.28413   16.35324   11.13607   1.4685   0.1424
  47   2985.95148   20.02059   12.18017   1.6437   0.1006
  48   2971.99952    6.06863    7.62484   0.7959   0.4263
  49   2974.65416    8.72327    6.37949   1.3674   0.1719
  50   2968.65147    2.72059    2.98633   0.9110   0.3626
  51   2986.00930   20.07841   10.11053   1.9859   0.0474 **
  52   2986.00930   20.07841   10.11061   1.9859   0.0474 **
  53   2977.60831   11.67742   10.55132   1.1067   0.2688
  54   2981.53627   15.60538   11.64153   1.3405   0.1805
  55   2970.28425    4.35337    5.02236   0.8668   0.3863
  56   2999.06604   33.13515   12.92669   2.5633   0.0106 **
  57   2996.87644   30.94556   13.70025   2.2588   0.0242 **
  58   2965.93089   (best)
  59   2983.53256   17.60167    9.75043   1.8052   0.0714
  60   2983.53256   17.60167    9.75050   1.8052   0.0714
  61   2973.42740    7.49651    8.97847   0.8349   0.4040
  62   2982.25646   16.32557   11.16832   1.4618   0.1442
  63   2985.95148   20.02059   12.18017   1.6437   0.1006
  64   2971.99952    6.06863    7.62480   0.7959   0.4263
  65   2974.65416    8.72327    6.37949   1.3674   0.1719
  66   2968.63513    2.70425    3.01537   0.8968   0.3701
  67   2985.44674   19.51585   10.08878   1.9344   0.0534
  68   2985.44674   19.51585   10.08878   1.9344   0.0534
  69   2997.22656   31.29567   13.34455   2.3452   0.0193 **
  70   2998.40428   32.47340   18.12973   1.7912   0.0737
  71   2993.30536   27.37447   14.81426   1.8478   0.0650
  72   2982.05286   16.12198   13.58022   1.1872   0.2355
  73   2991.25557   25.32468   13.52960   1.8718   0.0616
  74   2988.30245   22.37156   15.77713   1.4180   0.1566
  75   2983.05752   17.12663   13.17644   1.2998   0.1941
  76   2982.78970   16.85881   12.75893   1.3213   0.1868
  77   2999.62237   33.69149   16.46271   2.0465   0.0410 **
  78   2986.16067   20.22978   13.03746   1.5517   0.1212
  79   2982.27620   16.34531   14.77749   1.1061   0.2690
  80   2976.77812   10.84724   11.75435   0.9228   0.3564
  81   2976.55201   10.62112   11.30985   0.9391   0.3480
  82   2998.12046   32.18957   15.33854   2.0986   0.0362 **
  83   2995.72767   29.79678   15.95334   1.8677   0.0622
  84   2994.74425   28.81336   15.78102   1.8258   0.0683
  85   2997.67143   31.74054   15.05408   2.1084   0.0353 **
  86   2995.41897   29.48808   15.59442   1.8909   0.0590
  87   2993.62774   27.69686   16.27301   1.7020   0.0892
  88   2983.20513   17.27424   14.15147   1.2207   0.2226
  89   2979.33225   13.40136   15.50886   0.8641   0.3878
  90   2973.69000    7.75911   12.67420   0.6122   0.5406
  91   2973.48460    7.55372   12.25344   0.6165   0.5378
  92   2998.03838   32.10750   17.09225   1.8785   0.0607
  93   2996.29604   30.36515   17.73406   1.7123   0.0873
  94   2982.05286   16.12198   13.58022   1.1872   0.2355
  95   2991.25557   25.32468   13.52959   1.8718   0.0616
  96   2988.30245   22.37156   15.77713   1.4180   0.1566
  97   2983.05752   17.12663   13.17644   1.2998   0.1941
  98   2982.78970   16.85881   12.75893   1.3213   0.1868
  99   2998.24466   32.31377   16.14107   2.0020   0.0456 **
 100   2978.90839   12.97750   12.17039   1.0663   0.2866
 101   2975.38524    9.45435   11.08765   0.8527   0.3941
 102   2968.63357    2.70268    8.89714   0.3038   0.7614
 103   2996.03830   30.10742   14.45143   2.0834   0.0375 **
 104   2983.12385   17.19296   16.89849   1.0174   0.3093
 105   2968.63357    2.70268    8.89715   0.3038   0.7614
 106   2993.39751   27.46662   17.10885   1.6054   0.1088
 107   2996.11606   30.18518   16.17865   1.8657   0.0625
 108   2974.92744    8.99656   11.24029   0.8004   0.4237
 109   2978.61794   12.68706   12.34765   1.0275   0.3045
 110   2970.62173    4.69084   13.08673   0.3584   0.7201
 111   2976.82333   10.89245   10.75413   1.0129   0.3114
 112   2976.92537   10.99449   10.43179   1.0539   0.2922
 113   2971.46593    5.53504   13.53073   0.4091   0.6826
 114   2978.39216   12.46127   15.03862   0.8286   0.4076
 115   2970.82064    4.88976   14.44808   0.3384   0.7351
 116   2974.38512    8.45423   12.50899   0.6759   0.4993
 117   2988.55863   22.62775   14.06851   1.6084   0.1082
 118   2987.11407   21.18318   14.62493   1.4484   0.1479
 119   2983.65640   17.72551   10.41507   1.7019   0.0892
 120   2977.32091   11.39003   10.70581   1.0639   0.2877
 121   2997.49056   31.55967   14.40891   2.1903   0.0288 **
 122   2997.49056   31.55967   14.40891   2.1903   0.0288 **
 123   2974.63608    8.70519   12.40412   0.7018   0.4830
 124   2971.43585    5.50496   13.94839   0.3947   0.6932
 125   2980.20971   14.27883   13.85613   1.0305   0.3031
 126   2987.93567   22.00478   16.25997   1.3533   0.1764
 127   2983.36664   17.43575   14.87421   1.1722   0.2415
 128   2977.85412   11.92323   13.31677   0.8954   0.3709
 129   2978.08996   12.15907   10.83798   1.1219   0.2623
 130   2996.83360   30.90271   16.13384   1.9154   0.0558
 131   2978.69594   12.76505   11.38787   1.1209   0.2627
 132   2975.61629    9.68540   10.44129   0.9276   0.3539
 133   2982.83798   16.90709   15.02733   1.1251   0.2609
 134   2987.72699   21.79611   15.83685   1.3763   0.1691
 135   2980.42405   14.49316   10.67465   1.3577   0.1750
 136   2995.77283   29.84194   14.08383   2.1189   0.0344 **
 137   2995.77283   29.84194   14.08391   2.1189   0.0344 **
 138   2974.55139    8.62051    9.15818   0.9413   0.3469
 139   2991.69546   25.76457   13.38127   1.9254   0.0545
 140   2991.69546   25.76457   13.38128   1.9254   0.0545
 141   2996.81022   30.87933   14.21941   2.1716   0.0302 **
 142   2996.81022   30.87933   14.21944   2.1716   0.0302 **
 143   2990.20273   24.27185   13.59367   1.7855   0.0746
 144   2990.20273   24.27185   13.59353   1.7855   0.0746
 145   2975.17807    9.24718   12.99671   0.7115   0.4770
 146   2997.93678   32.00590   17.50485   1.8284   0.0679
 147   2978.28499   12.35410   11.52604   1.0718   0.2841
 148   2997.75949   31.82861   14.74601   2.1585   0.0312 **
 149   2995.76690   29.83602   14.52454   2.0542   0.0403 **
 150   2994.89423   28.96335   15.64580   1.8512   0.0645
 151   2994.89423   28.96335   15.64581   1.8512   0.0645
 152   2974.55139    8.62051    9.15818   0.9413   0.3469
 153   2995.32919   29.39830   14.35163   2.0484   0.0409 **
 154   2995.32919   29.39830   14.35169   2.0484   0.0409 **
 155   2980.42405   14.49316   10.67466   1.3577   0.1750
 156   2994.25441   28.32352   13.79991   2.0524   0.0405 **
 157   2994.25441   28.32352   13.79991   2.0524   0.0405 **
 158   2986.97472   21.04384   10.38496   2.0264   0.0431 **
 159   2982.10658   16.17569    9.51919   1.6993   0.0897
 160   2973.32201    7.39112    6.38624   1.1574   0.2475
 161   2973.96160    8.03072   11.71901   0.6853   0.4934
 162   2977.70069   11.76980   10.49998   1.1209   0.2627
 163   2993.22583   27.29494   16.19875   1.6850   0.0924
 164   2993.22583   27.29494   16.19873   1.6850   0.0924
 165   2973.05838    7.12749    5.71029   1.2482   0.2123
 166   2980.09468   14.16379   10.15924   1.3942   0.1637
 167   2986.97472   21.04384   10.38496   2.0264   0.0431 **
 168   2996.96903   31.03814   14.27946   2.1736   0.0300 **
 169   2994.74425   28.81336   15.78102   1.8258   0.0683
 170   2997.67143   31.74054   15.05408   2.1084   0.0353 **
 171   2998.94863   33.01774   14.49220   2.2783   0.0230 **
 172   2995.05939   29.12851   14.90895   1.9538   0.0511
 173   2993.15826   27.22737   15.22459   1.7884   0.0741
 174   2982.98744   17.05655   17.37429   0.9817   0.3266
 175   2981.33622   15.40533   17.99128   0.8563   0.3921
 176   2988.20973   22.27884   18.16623   1.2264   0.2204
 177   2997.22280   31.29191   14.25958   2.1944   0.0285 **
 178   2973.50416    7.57327    8.79085   0.8615   0.3892
 179   2969.27013    3.33924    7.28428   0.4584   0.6468
 180   2965.93089    0.00000    0.00000   0.0000   1.0000
 181   2977.50317   11.57229   10.76707   1.0748   0.2828
 182   2978.69594   12.76505   11.38785   1.1209   0.2627
 183   2993.15826   27.22737   15.22462   1.7884   0.0741
 184   2995.91881   29.98792   15.31847   1.9576   0.0506
 185   2973.32201    7.39112    6.38624   1.1574   0.2475
 186   2973.08760    7.15672    5.56065   1.2870   0.1985
 187   2971.97889    6.04800    9.31702   0.6491   0.5164
 188   2996.11606   30.18518   16.17865   1.8657   0.0625
 189   2993.39751   27.46662   17.10885   1.6054   0.1088
 190   2994.46407   28.53318   16.81767   1.6966   0.0902
 191   2996.11606   30.18518   16.17865   1.8657   0.0625

From the above we can see that the 5% cut-off is about 2985 or so, but it depends on shape. Finally, then, re-do the search, keeping all trees less than the cut-off (say 2980, just to be sure):

hsearch; [to get best tree again]
hsearch retain = yes keep = 2980;

At this point, though, you might want to set maxtrees to 100.
Tree      -ln L     Diff -ln L   s.d.(diff)      T       P*
-----------------------------------------------------------------------
   1   2965.93089   (best)
   2   2969.62398    3.69309   12.12455   0.3046   0.7608
   3   2973.53958    7.60869   11.40237   0.6673   0.5048
   4   2978.53969   12.60881   10.71262   1.1770   0.2396
   5   2978.21371   12.28283   10.90285   1.1266   0.2603
   6   2978.25002   12.31914   10.45251   1.1786   0.2389
   7   2972.54935    6.61847   11.31927   0.5847   0.5589
   8   2977.85564   11.92476   12.26430   0.9723   0.3312
   9   2975.99033   10.05944    9.59869   1.0480   0.2950
  10   2973.77224    7.84136    8.63519   0.9081   0.3641
  11   2976.77446   10.84357    7.42310   1.4608   0.1445
  12   2976.16244   10.23156    5.77298   1.7723   0.0767
  13   2970.71551    4.78463    7.49622   0.6383   0.5235
  14   2976.97005   11.03917    9.94631   1.1099   0.2674
  15   2970.11283    4.18194   10.11872   0.4133   0.6795
  16   2965.93089    0.00000    0.00000   0.0000   1.0000
  17   2978.70433   12.77345    6.85736   1.8627   0.0629
  18   2976.47541   10.54452    8.03463   1.3124   0.1898
  19   2976.32421   10.39333    9.54279   1.0891   0.2764
  20   2973.53958    7.60869   11.40241   0.6673   0.5048
  21   2970.21980    4.28891   13.03413   0.3291   0.7422
  22   2978.87052   12.93964   13.06220   0.9906   0.3222
  23   2973.89856    7.96768   10.96794   0.7265   0.4678
  24   2977.13400   11.20311   10.77569   1.0397   0.2988
  25   2974.54425    8.61337   11.19478   0.7694   0.4419
  26   2971.98121    6.05032   11.06652   0.5467   0.5847
  27   2976.33950   10.40861   10.86166   0.9583   0.3382
  28   2974.29083    8.35995   11.13147   0.7510   0.4529
  29   2979.12661   13.19572   10.69860   1.2334   0.2178
  30   2979.95514   14.02425   12.75814   1.0992   0.2720
  31   2978.39299   12.46210   13.28132   0.9383   0.3484
  32   2979.50812   13.57724   13.35305   1.0168   0.3096
  33   2972.82726    6.89637   15.91031   0.4335   0.6648
  34   2975.38209    9.45120   15.63353   0.6045   0.5457
  35   2976.37527   10.44439   16.34082   0.6392   0.5229
  36   2970.00428    4.07339   15.06568   0.2704   0.7869
  37   2978.08387   12.15298    9.34637   1.3003   0.1939
  38   2976.68306   10.75217   12.25877   0.8771   0.3807
  39   2976.68306   10.75217   12.25877   0.8771   0.3807
  40   2974.77141    8.84052   12.47836   0.7085   0.4789
  41   2973.91161    7.98072   12.49145   0.6389   0.5231
  42   2976.70014   10.76926   12.77293   0.8431   0.3994
  43   2974.79163    8.86074   12.96193   0.6836   0.4944
  44   2977.65537   11.72449   12.62790   0.9285   0.3535
  45   2977.65537   11.72449   12.62790   0.9285   0.3535
  46   2975.70819    9.77730   12.84673   0.7611   0.4468
  47   2973.89978    7.96890   12.51774   0.6366   0.5246
  48   2976.74702   10.81614   11.79478   0.9170   0.3594
  49   2976.74702   10.81614   11.79478   0.9170   0.3594
  50   2976.74702   10.81614   11.79478   0.9170   0.3594
  51   2974.62581    8.69493   11.82587   0.7352   0.4624
  52   2974.62581    8.69493   11.82587   0.7352   0.4624
  53   2972.38981    6.45892   11.60223   0.5567   0.5779
  54   2972.39533    6.46445   11.65165   0.5548   0.5792
  55   2976.91844   10.98755   11.83768   0.9282   0.3536
  56   2976.91844   10.98755   11.83768   0.9282   0.3536
  57   2976.91844   10.98755   11.83768   0.9282   0.3536
  58   2976.57060   10.63972   11.73690   0.9065   0.3649
  59   2978.07818   12.14729   11.52264   1.0542   0.2921
  60   2973.64230    7.71141   11.86323   0.6500   0.5159
  61   2970.85637    4.92549   12.29099   0.4007   0.6887
  62   2975.98791   10.05702   11.93562   0.8426   0.3997
  63   2979.05145   13.12056   11.33607   1.1574   0.2475
  64   2976.52017   10.58928   12.65514   0.8368   0.4030
  65   2977.82040   11.88951   12.68925   0.9370   0.3491
  66   2974.79163    8.86074   12.96193   0.6836   0.4944
  67   2976.70014   10.76926   12.77293   0.8431   0.3994
  68   2979.95695   14.02606   12.60904   1.1124   0.2663
  69   2979.89476   13.96387   13.32325   1.0481   0.2949
  70   2976.42639   10.49550   13.25500   0.7918   0.4287
  71   2969.62398    3.69309   12.12455   0.3046   0.7608
  72   2969.62398    3.69309   12.12455   0.3046   0.7608
  73   2979.59552   13.66463   12.36456   1.1051   0.2694
  74   2978.36153   12.43064   13.75162   0.9039   0.3663
  75   2975.02877    9.09789   11.44816   0.7947   0.4270
  76   2972.88883    6.95794   12.62445   0.5511   0.5817
  77   2970.62173    4.69084   13.08673   0.3584   0.7201
  78   2972.88883    6.95794   12.62445   0.5511   0.5817
  79   2976.70014   10.76926   12.77293   0.8431   0.3994
  80   2970.89115    4.96027   12.67571   0.3913   0.6957
  81   2979.84526   13.91437   10.22170   1.3613   0.1738
  82   2972.39533    6.46445   11.65165   0.5548   0.5792
  83   2978.28389   12.35300   12.25312   1.0082   0.3137
  84   2976.13208   10.20120   12.63640   0.8073   0.4198
  85   2974.90084    8.96995   12.64497   0.7094   0.4783
  86   2975.83838    9.90749   11.58853   0.8549   0.3929
  87   2974.62581    8.69493   11.82587   0.7352   0.4624
  88   2970.50782    4.57693   12.00975   0.3811   0.7032
  89   2974.38943    8.45854   12.06375   0.7012   0.4834
  90   2975.21929    9.28841   13.62520   0.6817   0.4956
  91   2968.30338    2.37249   12.78982   0.1855   0.8529
  92   2969.54458    3.61369   11.50149   0.3142   0.7535
  93   2979.44884   13.51796   14.50171   0.9322   0.3515
  94   2979.86708   13.93620   14.24740   0.9782   0.3283
  95   2971.79032    5.85944   14.32062   0.4092   0.6825
  96   2978.46137   12.53048   11.04700   1.1343   0.2570
  97   2976.38480   10.45391   14.15329   0.7386   0.4604
  98   2977.55152   11.62063   11.13175   1.0439   0.2969
  99   2972.59454    6.66366   13.61826   0.4893   0.6248
 100   2972.33945    6.40857   14.33194   0.4472   0.6549

Which confirms that none of these are significantly different. Now do a consensus: