Tuesday, April 26, 2011

Running Phylip's contrast application for trait pairs from R

Here is some code to run Phylip's contrast application from R and get the output within R to easily manipulate yourself. Importantly, the code is written specifically for trait pairs only as the regular expression work in the code specifically grabs data from contast results when only two traits are input. You could easily change the code to do N traits. Note that the p-value calculated for the chi-square statistic is not output from contrast, but is calculated within the function 'PhylipWithinSpContr'. In the code below there are two functions that make a lot of busy work easier: 'WritePhylip' and 'PhylipWithinSpContr'. The first function is nice because the formatting required for data input to Phylip programs is so, well, awkward  - and this function does it for you. The second function runs contrast and retrieves the output data. The example data set I produce in the code below has multiple individuals per species, so that contrasts are calculated taking into account within species variation. Get Phylip's contrast documentation here.

Note that the data input format allows only 10 characters for the species name, so I suggest if your species names are longer than 10 characters use the function abbreviate, for example, to shorten all names to no longer than 10 characters. Also, within the function WritePhylip I concatenate species names and their number of individuals per species leaving plenty of space.

Also, mess around with the options in the "system" call to get what you want. For example, I used "R", "W" and "Y", meaning replace old outfile (R), then turn on within species analyses (W), then accept all options (Y). E..g, if you don't have an old outfile, then you obviously don't need to replace the old file with the "R" command.

(p.s. I have not tried this on a windows machine).




Here is example output:


> datout
               names2   dat...1.    dat...2.
1      VarAIn_VarAest   0.000110   -0.000017
2      VarAIn_VarAest  -0.000017    0.000155
3      VarAIn_VarEest   0.790783   -0.063097
4      VarAIn_VarEest  -0.063097    0.981216
5      VarAIn_VarAreg   1.000000   -0.107200
6      VarAIn_VarAreg  -0.151800    1.000000
7     VarAIn_VarAcorr   1.000000   -0.127600
8     VarAIn_VarAcorr  -0.127600    1.000000
9      VarAIn_VarEreg   1.000000   -0.064300
10     VarAIn_VarEreg  -0.079800    1.000000
11    VarAIn_VarEcorr   1.000000   -0.071600
12    VarAIn_VarEcorr  -0.071600    1.000000
13    VarAOut_VarEest   0.790734   -0.063104
14    VarAOut_VarEest  -0.063104    0.981169
15    VarAOut_VarEreg   1.000000   -0.064300
16    VarAOut_VarEreg  -0.079800    1.000000
17   VarAOut_VarEcorr   1.000000   -0.071600
18   VarAOut_VarEcorr  -0.071600    1.000000
19    logL_withvar_df -68.779770    6.000000
20 logL_withoutvar_df -68.771450    3.000000
21           chisq_df  -0.016640    3.000000
22            chisq_p   1.000000 -999.000000

No comments:

Post a Comment