Thursday, July 14, 2011

CRdata vs. Cloudnumbers

Cloudnumbers and CRdata are two new cloud computing services.


I tested the two services with a very simple script. It creates a data frame of 10,000 random numbers via rnorm and assigns each one to one of two factor levels (a or b), 5,000 per level. I then take the mean within each level using the aggregate function.


In CRdata you need to add some extra code so the output is formatted for a browser window. For example, the last line below needs '<crdata_object>' tags on both sides of the output object so it can be rendered in a browser. The same goes for anything else you would normally print to a console. Cloudnumbers doesn't need this extra code.

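# 10,000 standard-normal draws, split evenly between levels 'a' and 'b'
dat <- data.frame(n = rnorm(10000), p = rep(c('a','b'), each=5000))

# mean of n within each level of p
out <- aggregate(n ~ p, data = dat, mean)

# CRdata only: the output object wrapped in tags so it renders in the browser
#<crdata_object>out</crdata_object>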


Here is a screenshot of the output from CRdata with the simple script above.

This simple script took about 20 seconds from starting the job to finishing. However, it seems like the only output option is HTML. Can this be right? If so, that is a terrible thing to have as the only option.
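For a rough local baseline to set against that 20 seconds, the same script can be timed on your own machine with system.time. This is only a sketch: the set.seed call and the timing wrapper are my additions and not part of the script I ran on either service, and a local timing says nothing about queueing or startup on the cloud side.

```r
# Local baseline: time the same script on your own machine.
set.seed(42)  # assumption: seed added only so reruns give the same numbers

timing <- system.time({
  dat <- data.frame(n = rnorm(10000), p = rep(c('a', 'b'), each = 5000))
  out <- aggregate(n ~ p, data = dat, mean)
})

print(out)                  # one row per factor level, with the mean of n
print(timing[["elapsed"]])  # wall-clock seconds for the computation alone
```

The elapsed time covers only the computation, so the comparison with the cloud figure mostly shows how much of those 20 seconds is overhead rather than R doing work.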


In Cloudnumbers you have to start a workspace and upload your R code file.
Then, start a session...
choose your software platform...
choose packages (one at a time, very slow)...
then choose number of clusters, etc.
Then finally start the job.
Then it initializes, and finally you can open the console.
From here it is like running R as you normally would, except on the web.


Who wins (at least for our very minimal example above)?

  1. Speed of entire process (not just running code): CRdata
  2. Ease of use: CRdata
  3. Cost: CRdata (free only)
  4. Least annoying: Cloudnumbers (you don't have to add in extra code to run your own code)
  5. Open source: CRdata (you can use publicly available code on the site)
  6. Long-term use: Cloudnumbers (more powerful, flexible, etc.)

I imagine Cloudnumbers could be faster for larger jobs, but you would have to pay for the speed of course. 

What I really want to see is a cloud computing service that accepts code run directly from R or RStudio. Hmmm...that would be so tasty indeed. I think Cloudnumbers may be able to do this, but I haven't tested it yet.

Perhaps using the server version of RStudio along with Amazon's EC2 is a better option than both of these. See Karthik Ram's post about using RStudio server along with Amazon's EC2. Even just running RStudio server on your Ubuntu machine or virtual machine is a pretty cool option, even without EC2 (works like a charm on my Parallels Ubuntu VM on my Mac).
