- Input characters can be modified in many ways, for example to apply a
particular cost or weighting scheme, or to change the type of character
being analyzed. To begin this series of exercises, let's start with our
typical data set:
read ("course.fasta")
- Now we will report the cost matrix being used in the loaded characters:
report (data)
- You can see that by default POY will give cost 2 to each indel, and 1 to
every substitution (that's what the tcm:(1,2) means). We can
modify all the characters with the command transform, as
follows:
transform (tcm:(1,1))
This will change all the characters for which an alignment
transformation cost matrix is applicable. tcm:(1,1) will assign
cost 1 to every substitution and 1 to every indel.
Let's verify its effect:
report (data)
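To make the two numbers in tcm concrete, here is a minimal Python sketch (not POY code) that scores one fixed pairwise alignment under a substitution cost and an indel cost. The function name alignment_cost and the example sequences are made up for illustration; POY itself optimizes alignments on a tree rather than scoring a single fixed one.

```python
def alignment_cost(row_a, row_b, substitution=1, indel=2):
    """Cost of a fixed pairwise alignment; '-' marks a gap.

    Hypothetical helper for illustration only, not part of POY.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    cost = 0
    for a, b in zip(row_a, row_b):
        if a == b:
            continue  # identical symbols cost nothing
        if a == "-" or b == "-":
            cost += indel  # one symbol is aligned against a gap
        else:
            cost += substitution  # two different nucleotides
    return cost


# One indel column and one substitution column (made-up sequences).
row_a = "ACGTTA"
row_b = "AC-TCA"
print("tcm:(1,2) ->", alignment_cost(row_a, row_b, substitution=1, indel=2))
print("tcm:(1,1) ->", alignment_cost(row_a, row_b, substitution=1, indel=1))
```

The same alignment gets a different cost under each regime, which is why the best alignment (and possibly the best tree) can change after a transform.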
- We can also assign a particular cost to opening a gap block. Not
surprisingly the argument is gap_opening:
transform (tcm:(3,1), gap_opening:3)
report (data)
Can you see the effect in the data report? It is time now to
see the effect of the different parameters in the implied alignment.
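The effect of a gap opening cost can be sketched in Python as follows. This is an illustration under a stated assumption, namely that a block of n consecutive gaps costs gap_opening plus n times the indel cost; consult the POY documentation for the exact definition. The function name and sequences are hypothetical.

```python
def affine_alignment_cost(row_a, row_b, substitution=3, indel=1, gap_opening=3):
    """Cost of a fixed pairwise alignment with a gap opening penalty.

    Assumption (not taken from POY): a block of n gaps in one row
    costs gap_opening + n * indel.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    cost = 0
    previous_gap_row = None  # row that held a gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b != "-":
            gap_row = "a"
        elif b == "-" and a != "-":
            gap_row = "b"
        else:
            gap_row = None
        if gap_row is not None:
            cost += indel
            if gap_row != previous_gap_row:
                cost += gap_opening  # a new gap block starts here
        elif a != b:
            cost += substitution
        previous_gap_row = gap_row
    return cost


# Three gaps in one block versus three isolated gaps (made-up data).
print(affine_alignment_cost("ACGCGT", "AC---T"))
print(affine_alignment_cost("ACGCGT", "A-G-G-"))
```

Both alignments contain three gaps, but the one with a single contiguous block pays the opening cost only once.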
- First read again your input data and build a tree:
wipe ()
read ("course.fasta")
build (1)
- Now we write down the cost of the tree, and output the implied alignment
to a file:
report (treestats, "1_2_ia.txt", implied_alignments)
- Next we modify the cost regime to substitutions 1, indels 1, and report
the new cost as well as the implied alignment:
transform (tcm:(1,1))
report (treestats, "1_1_ia.txt", implied_alignments)
- Finally we will do the same operations using a cost of 3 for
substitutions, 1 for an individual gap, and 3 for gap opening:
transform (tcm:(3,1), gap_opening:3)
report (treestats, "3_1_3_ia.txt", implied_alignments)
wipe ()
- Compare the costs and the implied alignments. What do you expect? What
do you observe? Are the transformation cost matrices metric? Are your
characters metric?
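If you want to check the metric question mechanically, the following Python sketch builds the symbol-to-symbol cost table implied by a substitution cost and an indel cost, and tests symmetry, zero self-cost, and the triangle inequality. The names tcm_matrix and is_metric are hypothetical, the alphabet is assumed to be A, C, G, T plus the gap, and the gap opening parameter is deliberately left out.

```python
from itertools import product

ALPHABET = "ACGT-"  # assumed alphabet: four nucleotides plus the gap


def tcm_matrix(substitution, indel):
    """Symbol-to-symbol costs implied by a (substitution, indel) pair."""
    cost = {}
    for x, y in product(ALPHABET, repeat=2):
        if x == y:
            cost[x, y] = 0
        elif "-" in (x, y):
            cost[x, y] = indel
        else:
            cost[x, y] = substitution
    return cost


def is_metric(cost):
    """True if the cost table is symmetric, zero only on the diagonal,
    and satisfies the triangle inequality."""
    symbols = sorted({x for x, _ in cost})
    for x, y in product(symbols, repeat=2):
        if cost[x, y] != cost[y, x]:
            return False
        if (x == y) != (cost[x, y] == 0):
            return False
    return all(
        cost[x, z] <= cost[x, y] + cost[y, z]
        for x, y, z in product(symbols, repeat=3)
    )


for substitution, indel in [(1, 2), (1, 1), (3, 1)]:
    print((substitution, indel), is_metric(tcm_matrix(substitution, indel)))
```

Try to predict the three answers before running it: the interesting case is whether going through a gap can be cheaper than a direct substitution.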
- You can fix a particular scheme of indels using the command
transform (static_approx), which stands for "static
approximation". A static approximation fixes a particular implied
alignment for the best tree in memory, and creates a set of characters
that match that particular alignment and resemble as much as possible
the cost regime of choice. Here is an example of this:
read ("course.fasta")
build (1)
transform (tcm:(1,1))
report (data)
- We see that there are 8 molecular characters currently in memory.
Since we will experiment with this initial set of characters and tree,
we should first store the current state of the program:
store ("initial")
- We can now check the implied alignment:
report (ia)
Yes, ia and implied_alignments are equivalent.
- This alignment can now be fixed to use the resulting matrix as the
characters:
transform (static_approx)
report (data)
- Observe that after the transform there are no molecular characters left.
Instead, there are a number of non-additive characters.
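Conceptually, fixing an alignment turns each alignment column into one static character. The Python sketch below shows only that idea; it is not how POY implements static_approx, and the taxon names and sequences are invented for the example.

```python
def alignment_columns(aligned):
    """Split a fixed alignment into per-column characters.

    aligned maps taxon name -> aligned sequence ('-' marks a gap).
    Returns a list with one dict per column, mapping taxon -> state.
    Hypothetical helper for illustration only.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    return [
        {taxon: seq[i] for taxon, seq in aligned.items()}
        for i in range(n_columns)
    ]


# Made-up implied alignment for three taxa.
aligned = {"t1": "AC-T", "t2": "ACGT", "t3": "A-GT"}
for index, column in enumerate(alignment_columns(aligned)):
    print(index, column)
```

Each printed column plays the role of one fixed character whose states (including the gap) are compared across taxa, instead of one unaligned sequence per fragment.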
- What happens if we have the default cost regime? Let's roll back to the
characters stored in "initial" and give this a try:
use ("initial")
transform (tcm:(1,2))
transform (static_approx)
report (data)
What can you observe?
- Finally, let's check how the static approximation behaves if you have a
gap opening parameter:
use ("initial")
transform (tcm:(3,1), gap_opening:3)
transform (static_approx)
report (data)
What is the main difference that you observe? How are indel blocks being
treated?
- Now we will learn how to transform specific characters. Suppose
that we would like to assign tcm:(2,1) to the first fragment in
course.fasta. We first check the name of the fragment:
use ("initial")
report (data)
You can see that the name of the first fragment is course.fasta:0
(the precise name may vary slightly on your computer). We can specify in
the transform command which characters should be transformed and in
which way:
transform ((names:("course.fasta:0"), tcm:(2,1)))
- Try to visually match the parentheses and understand their effect. Here
is another example, aimed at up-weighting static homology characters
only:
transform ((static, weight:2))
In this case, instead of specifying characters by name, we do it by type.
This command probably makes the syntax easier to understand. If you had
trouble with the first one, try to understand the weight
example first and then go back to the tcm:(2,1) case.
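If matching the parentheses by eye is hard, a small Python helper can report the nesting depth of each one. This is only a reading aid written for this exercise (the function name is hypothetical); it knows nothing about POY syntax beyond skipping parentheses inside double-quoted strings.

```python
def paren_depths(command):
    """Return (position, character, depth) for every parenthesis.

    Matching '(' and ')' share the same depth. Parentheses inside
    double-quoted strings are ignored. Raises ValueError if unbalanced.
    """
    depth = 0
    in_string = False
    result = []
    for position, ch in enumerate(command):
        if ch == '"':
            in_string = not in_string
        elif not in_string and ch == "(":
            depth += 1
            result.append((position, ch, depth))
        elif not in_string and ch == ")":
            if depth == 0:
                raise ValueError("unmatched ')' at position %d" % position)
            result.append((position, ch, depth))
            depth -= 1
    if depth != 0:
        raise ValueError("%d unclosed '('" % depth)
    return result


command = 'transform ((names:("course.fasta:0"), tcm:(2,1)))'
for position, ch, depth in paren_depths(command):
    print(position, ch, depth)
```

Depth 1 is the argument list of transform, depth 2 groups the character selection with its transformation, and depth 3 holds the arguments of names and tcm.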
- To finish this section, we leave you a task: fix the alignment
of the third and fourth fragments of the file course.fasta
using cost 1 for substitutions and cost 1 for indels. Every
other character should have the default cost regime of
substitutions 1 and indels 2.