Thursday, April 28, 2011

Processing nested lists

So perhaps you have all figured this out already, but I was excited to figure out how to finally neatly get all the data frames, lists, vectors, etc. out of a nested list. It is as easy as nesting calls to the apply family of functions, in the case below, using plyr's apply like functions. Take this example:



# Nested lists code, an example
# Make a nested list
mylist <- list()
mylist_ <- list()
for(i in 1:5) {
 for(j in 1:5) {
  mylist[[j]] <- i*j
 }
mylist_[[i]] <- mylist
}
 
# return values from first part of list
laply(mylist_[[1]], identity)
[1] 1 2 3 4 5
 
# return all values
laply(mylist_, function(x) laply(x, identity))
     1  2  3  4  5
[1,] 1  2  3  4  5
[2,] 2  4  6  8 10
[3,] 3  6  9 12 15
[4,] 4  8 12 16 20
[5,] 5 10 15 20 25
 
# perform some function, in this case sqrt of each value
laply(mylist_, function(x) laply(x, function(x) sqrt(x)))
  
        1        2        3        4        5
[1,] 1.000000 1.414214 1.732051 2.000000 2.236068
[2,] 1.414214 2.000000 2.449490 2.828427 3.162278
[3,] 1.732051 2.449490 3.000000 3.464102 3.872983
[4,] 2.000000 2.828427 3.464102 4.000000 4.472136
[5,] 2.236068 3.162278 3.872983 4.472136 5.000000


Created by Pretty R at inside-R.org

Tuesday, April 26, 2011

Running Phylip's contrast application for trait pairs from R

Here is some code to run Phylip's contrast application from R and get the output within R to easily manipulate yourself. Importantly, the code is written specifically for trait pairs only as the regular expression work in the code specifically grabs data from contast results when only two traits are input. You could easily change the code to do N traits. Note that the p-value calculated for the chi-square statistic is not output from contrast, but is calculated within the function 'PhylipWithinSpContr'. In the code below there are two functions that make a lot of busy work easier: 'WritePhylip' and 'PhylipWithinSpContr'. The first function is nice because the formatting required for data input to Phylip programs is so, well, awkward  - and this function does it for you. The second function runs contrast and retrieves the output data. The example data set I produce in the code below has multiple individuals per species, so that contrasts are calculated taking into account within species variation. Get Phylip's contrast documentation here.

Note that the data input format allows only 10 characters for the species name, so I suggest if your species names are longer than 10 characters use the function abbreviate, for example, to shorten all names to no longer than 10 characters. Also, within the function WritePhylip I concatenate species names and their number of individuals per species leaving plenty of space.

Also, mess around with the options in the "system" call to get what you want. For example, I used "R", "W" and "Y", meaning replace old outfile (R), then turn on within species analyses (W), then accept all options (Y). E..g, if you don't have an old outfile, then you obviously don't need to replace the old file with the "R" command.

(p.s. I have not tried this on a windows machine).




Here is example output:


> datout
               names2   dat...1.    dat...2.
1      VarAIn_VarAest   0.000110   -0.000017
2      VarAIn_VarAest  -0.000017    0.000155
3      VarAIn_VarEest   0.790783   -0.063097
4      VarAIn_VarEest  -0.063097    0.981216
5      VarAIn_VarAreg   1.000000   -0.107200
6      VarAIn_VarAreg  -0.151800    1.000000
7     VarAIn_VarAcorr   1.000000   -0.127600
8     VarAIn_VarAcorr  -0.127600    1.000000
9      VarAIn_VarEreg   1.000000   -0.064300
10     VarAIn_VarEreg  -0.079800    1.000000
11    VarAIn_VarEcorr   1.000000   -0.071600
12    VarAIn_VarEcorr  -0.071600    1.000000
13    VarAOut_VarEest   0.790734   -0.063104
14    VarAOut_VarEest  -0.063104    0.981169
15    VarAOut_VarEreg   1.000000   -0.064300
16    VarAOut_VarEreg  -0.079800    1.000000
17   VarAOut_VarEcorr   1.000000   -0.071600
18   VarAOut_VarEcorr  -0.071600    1.000000
19    logL_withvar_df -68.779770    6.000000
20 logL_withoutvar_df -68.771450    3.000000
21           chisq_df  -0.016640    3.000000
22            chisq_p   1.000000 -999.000000

Saturday, April 16, 2011

Phylometa from R: Randomization via Tip Shuffle

---UPDATE: I am now using code formatting from gist.github, so I replaced the old prettyR code (sorry guys). The github way is much easier and prettier. I hope readers like the change.




I wrote earlier about some code I wrote for running Phylometa (software to do phylogenetic meta-analysis) from R.

I have been concerned about what exactly is the right penalty for including phylogeny in a meta-analysis. E.g.: AIC is calculated from Q in Phylometa, and Q increases with tree size.

So, I wrote some code to shuffle the tips of your tree N number of times, run Phylometa, and extract just the "Phylogenetic MA" part of the output. So, we compare the observed output (without tip shuffling) to the distribution of the tip shuffled output, and we can calculate a P-value from that. The code I wrote simply extracts the pooled effect size for fixed and also random-effects models. But you could change the code to extract whatever you like for the randomization.

I think the point of this code is not to estimate your pooled effects, etc., but may be an alternative way to compare traditional to phylogenetic MA where hopefully simply incorporating a tree is not penalizing the meta-analysis so much that you will always accept the traditional MA as better.

Get the code here, and also below. Get the example tree file and data file, named "phylogeny.txt" and "metadata_2g.txt", respectively below (or use your own data!). You need the file "phylometa_fxn.r" from my website, get here, but just call it using source as seen below.



As you can see, the observed values fall well within the distribution of values obtained from shuffling tips.  P-values were 0.64 and 0.68 for fixed- and random-effects MA's, respectively. This suggests, to me at least, that the traditional (distribution of tip shuffled analyses, the histograms below) and phylogenetic (red lines) MA's are not different. The way I would use this is as an additional analysis to the actual Phylometa output.

Sunday, April 10, 2011

Adjust branch lengths with node ages: comparison of two methods

Here is an approach for comparing two methods of adjusting branch lengths on trees: bladj in the program Phylocom and a fxn written by Gene Hunt at the Smithsonian.

Get the code and example files (tree and node ages) here
Get phylocom here: http://www.phylodiversity.net/phylocom/

Gene Hunt's method has many options you can mess with, including setting tip ages (not available in bladj), setting node ages, and minimum branch length imposed. You will notice that Gene's method may be not the appropriate if you only have extant taxa.

The function AdjBrLens uses as input a newick tree file and a text file of node ages, and uses functions you can simply run by "source" the R file bladjing_twomethods.R file from here.

Note that blad does not like numbers for node names, so you have to put a character in front of a number of just character names for nodes.



# This is where the work happens... 
# Directory below needs to have at least three items:
#  1. phylocom executable for windows or mac
#  2. tree newick file
#  3. node ages file as required by phylocom, see their manual
# Output: trees_out is a list of three trees, the original, bladj, and Gene Hunt's method
# Also, within the function all three trees are written to file as PDFs
setwd("/Mac/R_stuff/Blog_etc/Bladjing") # set working directory
source("bladjing_twomethods.R") # run functions from source file
trees_out <- AdjBrLens("tree.txt", "nodeages.txt")
 
# plot trees of three methods together, 
# with nodes with age estimates labeled
jpeg("threeplots.jpeg", quality=100)
layout(matrix(1:3, 1, 3))
plot(trees_out[[1]])
nodelabels(trees_out[[1]]$node.label, cex = 0.6)
title("original tree")
plot(trees_out[[2]])
nodelabels(trees_out[[2]]$node.label, cex = 0.6)
title("bladj method")
plot(trees_out[[3]])
nodelabels(trees_out[[3]]$node.label, cex = 0.6)
title("gene hunt method, scalePhylo")
dev.off()
Created by Pretty R at inside-R.org


It is sort of hard to see the differences in the branch length changes here, but the individual output trees will reveal the differences better.

Friday, April 1, 2011

Phylometa from R - UDPATE

THIS BLOG HAS MOVED: New version of this post at http://recology.info/2011/04/phylometa-from-r-udpate/

A while back I posted some messy code to run Phylometa from R, especially useful for processing the output data from Phylometa which is not easily done. The code is still quite messy, but it should work now. I have run the code with tens of different data sets and phylogenies so it should work.

I fixed errors when parentheses came up against numbers in the output, and other things. You can use the code for up to 4 levels of your grouping variable. In addition, there are some lines of code to plot the effect sizes with confidence intervals, comparing random and fixed effects models and phylogenetic and traditional models. 

Get the code at my website:
- Use the first file to do the work, calling the second file using source().
- This new code works with Marc's new version of Phylometa, so please update: http://lajeunesse.myweb.usf.edu/publications.html

Again, please let me know if it doesn't work, if it's worthless, what changes could make it better.

Some notes on tree formatting for Phylometa.
1. Trees cannot have node labels - remove them (e.g., tree$node.label < NULL).
2. Trees cannot have zero length branches. This may seem like a non-problem, but it might be for example if you have resolved polytomies and zero length branches are added to resolve the polytomy.
3. I think you cannot have a branch length on the root branch.