Saturday, April 16, 2011

Phylometa from R: Randomization via Tip Shuffle

---UPDATE: I am now using code formatting from gist.github, so I replaced the old prettyR code (sorry guys). The github way is much easier and prettier. I hope readers like the change.




I wrote earlier about some code I wrote for running Phylometa (software to do phylogenetic meta-analysis) from R.

I have been concerned about what exactly is the right penalty for including phylogeny in a meta-analysis. E.g.: AIC is calculated from Q in Phylometa, and Q increases with tree size.

So, I wrote some code to shuffle the tips of your tree N number of times, run Phylometa, and extract just the "Phylogenetic MA" part of the output. So, we compare the observed output (without tip shuffling) to the distribution of the tip shuffled output, and we can calculate a P-value from that. The code I wrote simply extracts the pooled effect size for fixed and also random-effects models. But you could change the code to extract whatever you like for the randomization.

I think the point of this code is not to estimate your pooled effects, etc., but may be an alternative way to compare traditional to phylogenetic MA where hopefully simply incorporating a tree is not penalizing the meta-analysis so much that you will always accept the traditional MA as better.

Get the code here, and also below. Get the example tree file and data file, named "phylogeny.txt" and "metadata_2g.txt", respectively below (or use your own data!). You need the file "phylometa_fxn.r" from my website, get here, but just call it using source as seen below.

#####################################################################
# Created by Scott Chamberlain
# Ecology and Evolutionary Biology Dept., Rice University
# Houston, TX 77005, USA
# myrmecocystus@gmail.com
# http://schamber.wordpress.com/
#####################################################################
##### Directions
#Put your phylogeny, in format required by phylometa (see Phylometa manual), in your working directory
#Put your meta-analysis dataset, in format required by phylometa (see Phylometa manual), in your working directory
#Set working directory
#Use below functions
#Beware: only use a moderator variable with up to 4 groups
# Install packages
install.packages(c("plyr","ggplot2","ape","stringr")) # if not already installed
# Load packages
require(ggplot2); require(ape); require(stringr)
# Call and run functions (used below) in the working directory [NOTE:CHANGE TO YOUR WORKING DIRECTORY]
source("Z:/Mac/R_stuff/Blog_etc/Phylometa_randomization/phylometa_fxn.r")
####### Fxn to do randomization equivalent of Phylometa analysis
# phy = name of phylogeny file
# dat = name of data file in Phylometa format
# reps = number of randomized trees per real tree
# groups = number of levels in grouping/moderator variable
# Note that trees are written to directory so that Phylometa can call them
# Output is a jpeg of the histograms of randomized results, and vertical line for observed value
# Also, p-values for fixed and random-effects are output in the console
PhylogMetaRand <- function(phy, dat, reps, groups) {
# Observed data
obsout <- phylometa.process(system(paste("\"new_phyloMeta_1.2b.exe\"", phy, dat), intern=T), groups)
subobsout <- subset(obsout[4][[1]], Group == "All studies")
obsfixedeffect <- subobsout[1,3]
obsrandeffect <- subobsout[2,3]
# Randomization of data by shuffling tips on tree
phylo <- read.tree(phy)
rtree_list <- list()
for(i in 1:reps) {
temp <- phylo
randtips <- sample(temp$tip.label, length(temp$tip.label))
temp$tip.label <- randtips
rtree_list[[i]] <- temp
}
treenames <- list()
for(i in 1:reps){
treenames[[i]] <- paste("tree",i,".txt","",sep="")
}
WriteTrees2(rtree_list)
fixedrandeffects <- data.frame(matrix(NA, reps, 2))
for(i in 1:length(treenames)) {
cat(i, "\n")
out <- system(paste("\"new_phyloMeta_1.2b.exe\"", treenames[[i]], dat), intern=T)
fixrandout <- subset(phylometa.process(out, groups)[4][[1]], Group == "All studies")
fixedrandeffects[i,] <- c(fixrandout[1,3], fixrandout[2,3])
}
names(fixedrandeffects) <- c("fixed", "random")
# Plot results
fixed_ <- ggplot(fixedrandeffects, aes(x = fixed)) +
geom_histogram() +
geom_vline(xintercept = obsfixedeffect, colour="red", size=1)
random_ <- ggplot(fixedrandeffects, aes(x = random)) +
geom_histogram() +
geom_vline(xintercept = obsrandeffect, colour="red", size=1)
jpeg(paste(dat, ".jpeg", sep=""))
arrange_(fixed_, random_)
dev.off()
# Calculate randomization P-value
num_greater_than_fixed <- length(which(fixedrandeffects$fixed >= obsfixedeffect))
pval_fixed <- num_greater_than_fixed / reps
num_greater_than_random <- length(which(fixedrandeffects$random >= obsrandeffect))
pval_random <- num_greater_than_random / reps
pvals <- c(pval_fixed, pval_random)
names(pvals) <- c("pval_fixed","pval_random")
pvals
}
# E.g.
setwd("Z:/Mac/R_stuff/Blog_etc/Phylometa_randomization/")
PhylogMetaRand(phy = "phylogeny.txt", dat = "metadata_2g.txt", reps = 100, groups = 2)


As you can see, the observed values fall well within the distribution of values obtained from shuffling tips.  P-values were 0.64 and 0.68 for fixed- and random-effects MA's, respectively. This suggests, to me at least, that the traditional (distribution of tip shuffled analyses, the histograms below) and phylogenetic (red lines) MA's are not different. The way I would use this is as an additional analysis to the actual Phylometa output.

No comments:

Post a Comment